llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	b172b8884a	[BypassSlowDivision] Teach bypass slow division not to interfere with div by constant where constants have been constant hoisted, but not moved from their basic block DAGCombiner doesn't pay attention to whether constants are opaque before doing the div by constant optimization. So BypassSlowDivision shouldn't introduce control flow that would make DAGCombiner unable to see an opaque constant. This can occur when a div and rem of the same constant are used in the same basic block. It will be hoisted, but not leave the block. Longer term we probably need to look into the X86 immediate cost model used by constant hoisting and maybe not mark div/rem immediates for hoisting at all. This fixes the case from PR38649. Differential Revision: https://reviews.llvm.org/D51000 llvm-svn: 340303	2018-08-21 17:15:33 +00:00
Farhana Aleen	3528c80378	[AMDGPU] Support idot2 pattern. Summary: Transform add (mul ((i32)S0.x, (i32)S1.x), add( mul ((i32)S0.y, (i32)S1.y), (i32)S3) => i/udot2((v2i16)S0, (v2i16)S1, (i32)S3) Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D50024 llvm-svn: 340295	2018-08-21 16:21:15 +00:00
Simon Pilgrim	43cf2c20ab	[X86] Add SSE2 and XOP udiv combine tests llvm-svn: 340282	2018-08-21 15:21:45 +00:00
Tim Renouf	bb5ee41ab4	[AMDGPU] Allow int types for MUBUF vdata Summary: Previously the new llvm.amdgcn.raw/struct.buffer.load/store intrinsics only allowed float types for the data to be loaded or stored, which sometimes meant the frontend needed to generate a bitcast. In this, the new intrinsics copied the old buffer intrinsics. This commit extends the new intrinsics to allow int types as well. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D50315 Change-Id: I8202af2d036455553681dcbb3d7d32ae273f8f85 llvm-svn: 340270	2018-08-21 11:08:12 +00:00
Tim Renouf	4f703f5e11	[AMDGPU] New buffer intrinsics Summary: This commit adds new intrinsics llvm.amdgcn.raw.buffer.load llvm.amdgcn.raw.buffer.load.format llvm.amdgcn.raw.buffer.load.format.d16 llvm.amdgcn.struct.buffer.load llvm.amdgcn.struct.buffer.load.format llvm.amdgcn.struct.buffer.load.format.d16 llvm.amdgcn.raw.buffer.store llvm.amdgcn.raw.buffer.store.format llvm.amdgcn.raw.buffer.store.format.d16 llvm.amdgcn.struct.buffer.store llvm.amdgcn.struct.buffer.store.format llvm.amdgcn.struct.buffer.store.format.d16 llvm.amdgcn.raw.buffer.atomic.* llvm.amdgcn.struct.buffer.atomic.* with the following changes from the llvm.amdgcn.buffer.* intrinsics: * there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set; * there is a combined cachepolicy arg (glc+slc) * there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field. The AMDISD::BUFFER_* SD nodes always have an index operand, all three offset operands, combined cachepolicy operand, and an extra idxen operand. The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50306 Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205 llvm-svn: 340269	2018-08-21 11:07:10 +00:00
Tim Renouf	35484c9d50	[AMDGPU] New tbuffer intrinsics Summary: This commit adds new intrinsics llvm.amdgcn.raw.tbuffer.load llvm.amdgcn.struct.tbuffer.load llvm.amdgcn.raw.tbuffer.store llvm.amdgcn.struct.tbuffer.store with the following changes from the llvm.amdgcn.tbuffer.* intrinsics: * there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set; * there is a combined format arg (dfmt+nfmt) * there is a combined cachepolicy arg (glc+slc) * there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field. The AMDISD::TBUFFER_* SD nodes always have an index operand, all three offset operands, combined format operand, combined cachepolicy operand, and an extra idxen operand. The tbuffer pseudo- and real instructions now also have a combined format operand. The obsolescent llvm.amdgcn.tbuffer.* and llvm.SI.tbuffer.store intrinsics continue to work. V2: Separate raw and struct intrinsics. V3: Moved extract_glc and extract_slc defs to a more sensible place. V4: Rebased on D49995. V5: Only two separate offset args instead of three. V6: Pseudo- and real instructions have joint format operand. V7: Restored optionality of dfmt and nfmt in assembler. V8: Addressed minor review comments. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49026 Change-Id: If22ad77e349fac3a5d2f72dda53c010377d470d4 llvm-svn: 340268	2018-08-21 11:06:05 +00:00
Bjorn Pettersson	d378a39603	Change how finalizeBundle selects debug location for the BUNDLE instruction Summary: Previously a BUNDLE instruction inherited the DebugLoc from the first instruction in the bundle, even if that DebugLoc had no DILocation. With this commit this is changed into selecting the first DebugLoc that has a DILocation, by searching among the bundled instructions. The idea is to reduce amount of bundles that are lacking debug locations. Reviewers: #debug-info, JDevlieghere Reviewed By: JDevlieghere Subscribers: JDevlieghere, mattd, llvm-commits Differential Revision: https://reviews.llvm.org/D50639 llvm-svn: 340267	2018-08-21 10:59:50 +00:00
Simon Pilgrim	8e15b43092	[X86] Add SSE2 sdiv combine tests llvm-svn: 340264	2018-08-21 10:44:06 +00:00
Sam Parker	597811e7a7	[DAGCombiner] Reduce load widths of shifted masks During combining, ReduceLoadWidth is used to combine AND nodes that mask loads into narrow loads. This patch allows the mask to be a shifted constant. This results in a narrow load which is then left shifted to compensate for the new offset. Differential Revision: https://reviews.llvm.org/D50432 llvm-svn: 340261	2018-08-21 10:26:59 +00:00
Simon Pilgrim	72b324de4d	[TargetLowering] Add BuildSDiv support for division by one or negone. This reduces most of the sdiv stages (the MULHS, shifts etc.) to just zero/identity values and use the numerator scale factor to multiply by +1/-1. llvm-svn: 340260	2018-08-21 10:20:36 +00:00
Petar Jovanovic	3b953c37f8	[MIPS GlobalISel] Select bitwise instructions Select bitwise instructions for i32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D50183 llvm-svn: 340258	2018-08-21 08:15:56 +00:00
Bjorn Pettersson	880f291577	[RegisterCoalescer] Do not assert when trying to remat dead values Summary: RegisterCoalescer::reMaterializeTrivialDef used to assert that the input register was live in. But as shown by the new coalesce-dead-lanes.mir test case that seems to be a valid scenario. We now return false instead of the assert, simply avoiding to remat the dead def. Normally a COPY of an undef value is eliminated by eliminateUndefCopy(). Although we only do that when the destination isn't a physical register. So the situation above should be limited to the case when we copy an undef value to a physical register. Reviewers: kparzysz, wmi, tpr Reviewed By: kparzysz Subscribers: MatzeB, qcolombet, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D50842 llvm-svn: 340255	2018-08-21 07:49:05 +00:00
Heejin Ahn	487992cc09	[WebAssembly] Revert type of wake count in atomic.wake to i32 Summary: We decided to revert this from i64 to i32 in Nov 28 CG meeting. Fixes PR38632. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D51010 llvm-svn: 340234	2018-08-20 23:49:29 +00:00
Craig Topper	9c57ba0dc3	[X86] Add test command line to expose PR38649. Bypass slow division and constant hoisting are conspiring to break div+rem of large constants. llvm-svn: 340217	2018-08-20 21:51:35 +00:00
Craig Topper	210ccfe3db	[X86] Prevent lowerVectorShuffleByMerging128BitLanes from creating cycles Due to some splat handling code in getVectorShuffle, it's possible for NewV1/NewV2 to have their mask modified from what is requested. This can lead to cycles being created in the DAG. This patch examines the returned mask and makes sure it's different. Long term we may need to look closer at that splat code in getVectorShuffle, or add more splat awareness to getVectorShuffle. Fixes PR38639 Differential Revision: https://reviews.llvm.org/D50981 llvm-svn: 340214	2018-08-20 21:08:35 +00:00
Craig Topper	7dcb2c4b0a	[X86] Teach combineTruncatedArithmetic to handle some cases of ISD::SUB We can safely avoid interfering with the subus combine if both inputs are freely truncatable. Either both extends, or an extend and a constant vector. Differential Revision: https://reviews.llvm.org/D50878 llvm-svn: 340212	2018-08-20 20:57:35 +00:00
Craig Topper	08e7e04998	[X86] Pre-commit test cases for D50878. llvm-svn: 340211	2018-08-20 20:57:32 +00:00
Krzysztof Parzyszek	cc3f630252	Consistently use MemoryLocation::UnknownSize to indicate unknown access size 1. Change the software pipeliner to use unknown size instead of dropping memory operands. It used to do it before, but MachineInstr::mayAlias did not handle it correctly. 2. Recognize UnknownSize in MachineInstr::mayAlias. 3. Print and parse UnknownSize in MIR. Differential Revision: https://reviews.llvm.org/D50339 llvm-svn: 340208	2018-08-20 20:37:57 +00:00
Vitaly Buka	30b5ed3eb7	Revert "AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space" As it introduces out of bound access. This reverts commit r340172 and r340171 llvm-svn: 340202	2018-08-20 19:31:03 +00:00
Cameron McInally	94b9029be9	[FPEnv] Support constrained FREM intrinsic Differential Revision: https://reviews.llvm.org/D50975 llvm-svn: 340201	2018-08-20 19:28:56 +00:00
Aditya Nandakumar	2a08285cf3	Revert "Revert r339977: [GISel]: Add Opcodes for a few LLVM Intrinsics" This reverts commit 7debc334e6421bb5251ef8f18e97166dfc7dd787. I missed updating legalizer-info-validation.mir as I had assertions turned off in my build and that specific test requires asserts. Fixed it now. llvm-svn: 340197	2018-08-20 18:43:19 +00:00
Simon Pilgrim	6ac905926f	[TargetLowering] Disable BuildSDiv division by one or negone. Fuzz tests have detected an issue, currently working on a fix. llvm-svn: 340195	2018-08-20 18:23:54 +00:00
Samuel Pitoiset	c95ef77d37	AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space 32-bit constant address space is declared as 6, so the maximum number of address spaces is 6, not 5. Fixes "LLVM ERROR: Pointer address space out of range". v3: use static_assert() v2: add a very simple test for 32-bit addr space Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 340171	2018-08-20 13:18:59 +00:00
Simon Pilgrim	5b78c9d58d	[SelectionDAG] Add partial sign-bit support to ComputeNumSignBits for BITCAST nodes Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts. Handle the case where the sign bit extends to only part of the small elements. llvm-svn: 340169	2018-08-20 13:05:48 +00:00
Simon Pilgrim	11bec5b80c	[X86][SSE] Fix PACKSS bitcast test from rL340166 We need the sign bits to extend to the lower 16 bits of the even elements llvm-svn: 340167	2018-08-20 11:47:15 +00:00
Simon Pilgrim	cee9c64838	[X86][SSE] Add PACKSS test showing ComputeNumSignBits failure to handle a partial sign bits extension through a bitcast llvm-svn: 340166	2018-08-20 11:10:12 +00:00
Simon Pilgrim	686090a45f	[X86] Drop unnecessary exact qualifier from packss test llvm-svn: 340165	2018-08-20 11:01:51 +00:00
QingShan Zhang	f8f9af7ba5	[PowerPC] Add a peephole post RA to transform the inst that is fed by add If the arch is P8, we will select XFLOAD to load the floating point, and then expand it to a vsx or non-vsx X-form instruction post RA. This patch tries to convert the X-form to D-form if it meets the requirement that one operand of the x-form inst is the special Zero register, and the other operand is fed by an add inst. i.e. y = add imm, reg LFDX. 0, y --> LFD imm(reg) Reviewers: Nemanjai Differential Revision: https://reviews.llvm.org/D49007 llvm-svn: 340149	2018-08-20 02:52:55 +00:00
Simon Pilgrim	5b936ec89e	[SelectionDAG] Add basic demanded elements support to ComputeNumSignBits for BITCAST nodes Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts. The next step would be to support cases where the large elements aren't all sign bits, and determine the small element equivalent based on the demanded elements. llvm-svn: 340143	2018-08-19 17:47:50 +00:00
Simon Pilgrim	0fd72ab44f	[X86][SSE] Add PACKSS test showing ComputeNumSignBits failure to handle demanded elts through a bitcast llvm-svn: 340139	2018-08-19 16:01:47 +00:00
Craig Topper	803912ea57	[X86] Fix an issue in the matching for ADDUS. We were basically assuming only one operand of the compare could be an ADD node and using that to swap operands. But we can have a normal add followed by a saturating add. This rewrites the canonicalization to just be based on the condition code. llvm-svn: 340134	2018-08-19 04:26:31 +00:00
Craig Topper	a85d7e927b	[X86] Add a test case showing an issue in our addusw pattern matching. We are unable to handle a normal add followed by a saturating add with certain operand orders on the icmp. llvm-svn: 340133	2018-08-19 04:26:29 +00:00
Craig Topper	40c9559b74	[X86] Add support for using 512-bit PSUBUS to combineSelect. The code already supports 128 and 256 and even knows to split 256 for AVX1. So we really just needed to stop looking for specific VTs and subtarget features and just look for legal VTs with i8/i16 elements. While there, add some curly braces around outer if statement bodies that contain only another if. It makes all the closing curly braces look more regular. llvm-svn: 340128	2018-08-18 18:51:03 +00:00
Craig Topper	b40a1d5f84	[X86] Add test cases to show missed opportunities to use 512-bit PSUBUS. llvm-svn: 340127	2018-08-18 18:50:59 +00:00
Craig Topper	911efbb926	[X86] Add a signed test case for PR38622. Use nounwind to reduce the output on the unsigned test case. llvm-svn: 340121	2018-08-18 06:00:16 +00:00
Craig Topper	cc5dbbf759	[DAGCombiner] Allow divide by constant optimization on opaque constants. Summary: I believe this restores the behavior we had before r339147. Fixes PR38622. Reviewers: RKSimon, chandlerc, spatel Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50936 llvm-svn: 340120	2018-08-18 05:52:42 +00:00
Matt Arsenault	25e51540e1	DAG: Fix isKnownNeverNaN for basic non-sNaN cases fadd/fsub/fmul need to worry about infinities as well as fdiv. llvm-svn: 340085	2018-08-17 21:19:22 +00:00
Simon Pilgrim	2f48122cc9	[X86][SSE] Lower constant vXi8 ISD::SRL/ISD::SRA using PMULLW Extending the concept introduced in D49562, this patch lowers constant vXi8 ISD::SRL/ISD::SRA by zero/sign extending to vXi16 and using PMULLW and then truncating the high 8 bits of the result. Differential Revision: https://reviews.llvm.org/D50781 llvm-svn: 340062	2018-08-17 18:03:11 +00:00
Stefan Pintilie	39869ccf51	[PowerPC] Generate lxsd instead of the ld->mtvsrd sequence for vector loads This patch addresses: - Implementation within PPCISelLowering.cpp to check if we should use direct load into vector instructions (such as lxsd/lfd ) when the scalar_to_vector function is used; which will allow us to catch as many cases of the scalar_to_vector uses as possible to translate the ld->mtvsrd sequence into lxsd. - Test cases to exhibit the behaviour of emitting lxsd/lfd. Patch by amyk Differential revision: https://reviews.llvm.org/D49698 llvm-svn: 340037	2018-08-17 15:15:26 +00:00
Francis Visoiu Mistrih	f006b491bd	[x86] Fix test breaking on Darwin after r339962 * -march=x86-64 -> -mtriple=x86_64-unknown-linux to avoid _ prefixes to symbols * add -start-before to avoid running the whole codegen on the IR. I assumed it is meant to be running after X86SpeculativeLoadHardening. llvm-svn: 340034	2018-08-17 14:47:01 +00:00
Francis Visoiu Mistrih	8bff832534	[X86] Fix liveness information when expanding X86::EH_SjLj_LongJmp64 test/CodeGen/X86/shadow-stack.ll has the following machine verifier errors: ``` * Bad machine code: Using a killed virtual register * - function: bar - basic block: %bb.6 entry (0x7fdc81857818) - instruction: %3:gr64 = MOV64rm killed %2:gr64, 1, $noreg, 8, $noreg - operand 1: killed %2:gr64 * Bad machine code: Using a killed virtual register * - function: bar - basic block: %bb.6 entry (0x7fdc81857818) - instruction: $rsp = MOV64rm killed %2:gr64, 1, $noreg, 16, $noreg - operand 1: killed %2:gr64 * Bad machine code: Virtual register killed in block, but needed live out. * - function: bar - basic block: %bb.2 entry (0x7fdc818574f8) Virtual register %2 is used after the block. ``` The fix here is to only copy the machine operand's register without the kill flags for all the instructions except the very last one of the sequence. I had to insert dummy PHIs in the test case to force the NoPHI function property to be set to false. More on this here: https://llvm.org/PR38439 Differential Revision: https://reviews.llvm.org/D50260 llvm-svn: 340033	2018-08-17 14:46:56 +00:00
Krzysztof Parzyszek	39a979c838	[Hexagon] Expand vgather pseudos during packetization This will allow packetizing the vgather expansion with other instructions. llvm-svn: 340028	2018-08-17 14:24:24 +00:00
Luke Cheeseman	64dcdec60c	[AArch64] - Generate pointer authentication instructions - Generate pointer authentication instructions - The functions instrumented depend on function attributes: all (all functions instrumented) non-leaf (only those that spill LR) none - Function epilogues sign the LR before spilling to the stack and authenticate the LR once restored - If the target is v8.3a or greater then it can use the combined authenticate and return instruction Differential revision: https://reviews.llvm.org/D49793 llvm-svn: 340018	2018-08-17 12:53:22 +00:00
Nemanja Ivanovic	39751276b0	[PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction Add a DAG combine for the PowerPC code generator to generate the Power9 extswsli extend sign and shift immediate instruction. Patch by RolandF. Differential revision: https://reviews.llvm.org/D49879 llvm-svn: 340016	2018-08-17 12:35:44 +00:00
Simon Pilgrim	03e57521c0	[DAGCombiner] extractShiftForRotate - fix out of range shift issue Don't just check for negative shift amounts. Fixes OSS Fuzz #9935 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9935 llvm-svn: 340015	2018-08-17 12:25:18 +00:00
Simon Pilgrim	5113b48798	[DAGCombine] Improve (sra (sra x, c1), c2) -> (sra x, (add c1, c2)) folding Add support for cases where only some c1+c2 results exceed the max bitshift, clamping accordingly. Differential Revision: https://reviews.llvm.org/D35722 llvm-svn: 340010	2018-08-17 10:52:49 +00:00
Daniel Cederman	0c597ca223	[Sparc] Get sret arg size from CallLoweringInfo.getArgs() Summary: Looking at the callee argument list, as is done now, might not work if the function has been typecasted into one that is expected to return a struct. This change also simplifies the code. The isFP128ABICall() function can be removed as it is no longer needed. The test in fp128.ll has been updated to verify this. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48117 llvm-svn: 340008	2018-08-17 10:40:00 +00:00
Daniel Cederman	7d3e08ff8d	[Sparc] Flush register windows for @llvm.returnaddress(1) Summary: When @llvm.returnaddress is called with a value higher than 0 it needs to read from the call stack to get the return address. This means that the register windows need to be flushed to the stack to guarantee that the data read is valid. For values higher than 1 this is done indirectly by the call to getFRAMEADDR(), but not for the value 1. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48636 llvm-svn: 340003	2018-08-17 09:18:31 +00:00
Chandler Carruth	b898b86f49	Revert r339977: [GISel]: Add Opcodes for a few LLVM Intrinsics This is breaking ~all the bots. llvm-svn: 339982	2018-08-17 04:47:16 +00:00
Aditya Nandakumar	973a557338	[GISel]: Add Opcodes for a few LLVM Intrinsics https://reviews.llvm.org/D50401 Add opcodes for llvm.intrinsic.trunc, round, and update the IRTranslator for the same. Reviewed by: dsanders. llvm-svn: 339977	2018-08-17 01:41:56 +00:00
Heejin Ahn	e76fa9ecca	[WebAssembly] CFG stackify support for exception handling Summary: This adds support for exception handling to CFGStackify pass. This only adds TRY / END_TRY markers and DOES NOT yet fix unwind mismatches that can be created by the linearization of the CFG into the structural wasm format. The mismatch fix will be added by following patches. In detail, this patch - Added support for TRY / END_TRY markers to support EH - Changed many static functions into class member functions as they take too many arguments now - Added several more bookkeeping data structures - Refactored routines that decide where to insert markers, because without refactoring this got too complicated as we added support for new kinds of markers (TRY/END_TRY). - Rewrote rethrow instructions' BB arguments to relative depths in EH pad stack. Reviewers: dschuff, sunfish Subscribers: sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D48273 llvm-svn: 339967	2018-08-16 23:50:59 +00:00
Chandler Carruth	75ca6be1c1	[x86/MIR] Implement support for pre- and post-instruction symbols, as well as MIR parsing support for `MCSymbol` `MachineOperand`s. The only real way to test pre- and post-instruction symbol support is to use them in operands, so I ended up implementing that within the patch as well. I can split out the operand support if folks really want but it doesn't really seem worth it. The functional implementation of pre- and post-instruction symbols is now completely trivial. Two tiny bits of code in the (misnamed) AsmPrinter. It should be completely target independent as well. We emit these exactly the same way as we emit basic block labels. Most of the code here is to give full dumping, MIR printing, and MIR parsing support so that we can write useful tests. The MIR parsing of MC symbol operands still isn't 100%, as it forces the symbols to be non-temporary and non-local symbols with names. However, those names often can encode most (if not all) of the special semantics desired, and unnamed symbols seem especially annoying to serialize and de-serialize. While this isn't perfect or full support, it seems plenty to write tests that exercise usage of these kinds of operands. The MIR support for pre-and post-instruction symbols was quite straightforward. I chose to print them out in an as-if-operand syntax similar to debug locations as this seemed the cleanest way and let me use nice introducer tokens rather than inventing more magic punctuation like we use for memoperands. However, supporting MIR-based parsing of these symbols caused me to change the design of the symbol support to allow setting arbitrary symbols. Without this, I don't see any reasonable way to test things with MIR. Differential Revision: https://reviews.llvm.org/D50833 llvm-svn: 339962	2018-08-16 23:11:05 +00:00
Craig Topper	883ff69c93	[DAGCombiner] Don't reassociate operations that have the vector reduction flag set. When nodes are reassociated the vector-reduction flag gets lost. The test case here is what would happen if you had a sum of absolute differences loop that started with a non-zero but constant sum and that loop was unrolled. The vectorizer will generate a constant vector for the initial value. And DAGCombiner reassociate tries to move it down the addition tree erasing the vector-reduction flag. Interestingly this moves constants the opposite direction of the reassociate IR pass. I've chosen to just punt on the reassociate, but I suppose we could maybe preserve the flag if both nodes have it set. Differential Revision: https://reviews.llvm.org/D50827 llvm-svn: 339946	2018-08-16 21:54:05 +00:00
Craig Topper	bde2b43cb3	[X86] In EFLAGS copy pass, don't emit EXTRACT_SUBREG instructions since we're after peephole Normally the peephole pass converts EXTRACT_SUBREG to COPY instructions. But we're after peephole so we can't rely on it to clean these up. To fix this, the eflags pass now emits a COPY with a subreg input. I also noticed that in 32-bit mode we need to constrain the input to the copy to ensure the subreg is valid. Otherwise we'll fail verify-machineinstrs Differential Revision: https://reviews.llvm.org/D50656 llvm-svn: 339945	2018-08-16 21:54:02 +00:00
Krzysztof Parzyszek	bb1aede865	[SystemZ] Require asserts in subregliveness-06.mir The option -misched=shuffle is only available with !NDEBUG builds. llvm-svn: 339931	2018-08-16 20:12:15 +00:00
Craig Topper	3dfc5af178	[X86] Pre-commit test case for D50827. llvm-svn: 339926	2018-08-16 19:27:43 +00:00
Krzysztof Parzyszek	9af86a5e01	[MachineVerifier] Check if predecessor is jointly dominated by undefs Each use of a value should be jointly dominated by the union of defs and undefs. It can happen that it will only be jointly dominated by undefs, and that is still legal. Make sure that the verifier is aware of that. llvm-svn: 339924	2018-08-16 19:13:28 +00:00
Eli Friedman	73e8a784e6	[SelectionDAG] Improve the legalisation lowering of UMULO. There is no way in the universe, that doing a full-width division in software will be faster than doing overflowing multiplication in software in the first place, especially given that this same full-width multiplication needs to be done anyway. This patch replaces the previous implementation with a direct lowering into an overflowing multiplication algorithm based on half-width operations. Correctness of the algorithm was verified by exhaustively checking the output of this algorithm for overflowing multiplication of 16 bit integers against an obviously correct widening multiplication. Barring any oversights introduced by porting the algorithm to DAG, confidence in correctness of this algorithm is extremely high. The following table shows the change in both t = runtime and s = space. The change is expressed as a multiplier of original, so anything under 1 is “better” and anything above 1 is worse. +-------+-----------+-----------+-------------+-------------+ \| Arch \| u64u64 t \| u64u64 s \| u128u128 t \| u128u128 s \| +-------+-----------+-----------+-------------+-------------+ \| X64 \| - \| - \| ~0.5 \| ~0.64 \| \| i686 \| ~0.5 \| ~0.6666 \| ~0.05 \| ~0.9 \| \| armv7 \| - \| ~0.75 \| - \| ~1.4 \| +-------+-----------+-----------+-------------+-------------+ Performance numbers have been collected by running overflowing multiplication in a loop under `perf` on two x86_64 (one Intel Haswell, other AMD Ryzen) based machines. Size numbers have been collected by looking at the size of function containing an overflowing multiply in a loop. All in all, it can be seen that both performance and size have improved except in the case of armv7 where code size has regressed for 128-bit multiply. u128*u128 overflowing multiply on 32-bit platforms seems to benefit from this change a lot, taking only 5% of the time compared to original algorithm to calculate the same thing. The final benefit of this change is that LLVM is now capable of lowering the overflowing unsigned multiply for integers of any bit-width as long as the target is capable of lowering regular multiplication for the same bit-width. Previously, 128-bit overflowing multiply was the widest possible. Patch by Simonas Kazlauskas! Differential Revision: https://reviews.llvm.org/D50310 llvm-svn: 339922	2018-08-16 18:39:39 +00:00
Krzysztof Parzyszek	17143f6111	[RegisterCoalescer] Shrink to uses if needed after removeCopyByCommutingDef llvm-svn: 339912	2018-08-16 18:02:59 +00:00
Simon Pilgrim	87d0039a45	[TargetLowering] Add support for non-uniform vectors to BuildSDIV This patch refactors the existing TargetLowering::BuildSDIV base implementation to support non-uniform constant vector denominators. This is the last patch necessary to close PR36545 Differential Revision: https://reviews.llvm.org/D50765 llvm-svn: 339908	2018-08-16 17:44:33 +00:00
Matt Arsenault	7121bed210	AMDGPU: Custom lower fexp This will allow the library to just use __builtin_expf directly without expanding this itself. Note f64 still won't work because there is no exp instruction for it. llvm-svn: 339902	2018-08-16 17:07:52 +00:00
Simon Pilgrim	8b9e545477	[X86][SSE] Add sdiv by nonuniform constant vector test containing -1/+1 and all-bits style constants llvm-svn: 339901	2018-08-16 17:07:41 +00:00
Sam Parker	0d51197051	[ARM] Ignore GEPs in ARMCodeGenPrepare While searching through the use-def tree, ignore GetElementPtrInst instructions because they don't need promoting and neither do their indices. Otherwise, the wide indices prevent the transformation from happening. Differential Revision: https://reviews.llvm.org/D50762 llvm-svn: 339871	2018-08-16 12:24:40 +00:00
Sam Parker	0e2f0bd48e	[ARM] Allow zext in ARMCodeGenPrepare Treat zext instructions as roots, like we do for truncs. Differential Revision: https://reviews.llvm.org/D50759 llvm-svn: 339868	2018-08-16 11:54:09 +00:00
Alex Bradbury	fdc4647ca3	[RISCV][MC] Don't fold symbol differences if requiresDiffExpressionRelocations is true When emitting the difference between two symbols, the standard behavior is that the difference will be resolved to an absolute value if both of the symbols are offsets from the same data fragment. This is undesirable on architectures such as RISC-V where relaxation in the linker may cause the computed difference to become invalid. This caused an issue when compiling to object code, where the size of a function in the debug information was already calculated even though it could change as a consequence of relaxation in the subsequent linking stage. This patch inhibits the resolution of symbol differences to absolute values where the target's AsmBackend has declared that it does not want these to be folded. Differential Revision: https://reviews.llvm.org/D45773 Patch by Edward Jones. llvm-svn: 339864	2018-08-16 11:26:37 +00:00
Sam Parker	13567dbbd8	[ARM] Allow signed icmps in ARMCodeGenPrepare Originally committed in r339755 which was reverted in r339806 due to an asan issue. The issue was caused by my assumption that operands to a CallInst mapped to the FunctionType Params. CallInsts are now handled by iterating over their ArgOperands instead of Operands. Original Message: Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp. Differential Revision: https://reviews.llvm.org/D50067 llvm-svn: 339858	2018-08-16 10:05:39 +00:00
Craig Topper	9c1d9fdeaa	[X86] Remove masking from the 512-bit padds and psubs intrinsics. Use select in IR instead. llvm-svn: 339842	2018-08-16 06:20:24 +00:00
Craig Topper	9d6983c9fd	[X86] Remove the unused masked 128-bit and 256-bit padds/psubs intrinsics. Still need to remove masking from the 512-bit versions. llvm-svn: 339841	2018-08-16 06:20:22 +00:00
Craig Topper	054b8cce2d	[X86] Correct some bad FileCheck prefixes in tests. Add test cases for v64i8 padd/psub saturation intrinsics. For some reason we had the 128/256-bit tests, but not the 512-bit tests. llvm-svn: 339840	2018-08-16 06:20:19 +00:00
Chandler Carruth	00c35c7794	[x86] Actually initialize the SLH pass with the x86 backend and use a shorter name ('x86-slh') for the internal flags and pass name. Without this, you can't use the -stop-after or -stop-before infrastructure. I seem to have just missed this when originally adding the pass. The shorter name solves two problems. First, the flag names were ... really long and hard to type/manage. Second, the pass name can't be the exact same as the flag name used to enable this, and there are already some users of that flag name so I'm avoiding changing it unnecessarily. llvm-svn: 339836	2018-08-16 01:22:19 +00:00
Matt Arsenault	f533e6b0ed	AMDGPU: Fold fneg into fmed3 llvm-svn: 339821	2018-08-15 21:46:27 +00:00
Matt Arsenault	a816073764	AMDGPU: Improve extract_vector_elt reduction combine Handle fmul, fsub and preserve flags. Also really test minnum/maxnum reductions. The existing tests only checked minnum/maxnum matched from a fast-math compare and select, which is not the same. llvm-svn: 339820	2018-08-15 21:34:06 +00:00
Matt Arsenault	b3a80e5397	AMDGPU: Implement llvm.amdgcn.icmp/fcmp for i16/f16 Also support these on targets without support for these, since it will allow us to freely create these in instcombine. llvm-svn: 339819	2018-08-15 21:25:20 +00:00
Craig Topper	08e082619a	[X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation To lower this we now create a new V1 containing the low half of both sources and a new V2 containing the upper half of both sources. Then we create a repeated lane shuffle of those new sources to create the final result. This fixes PR35833 Differential Revision: https://reviews.llvm.org/D41794 llvm-svn: 339818	2018-08-15 21:21:52 +00:00
Matt Arsenault	6c7ba82900	AMDGPU: Address todo for handling 1/(2 pi) llvm-svn: 339814	2018-08-15 21:03:55 +00:00
Vitaly Buka	ed4239f482	Revert "[ARM] Allow signed icmps in ARMCodeGenPrepare" use-after-poison in check-llvm under asan This reverts commit r339755. llvm-svn: 339806	2018-08-15 20:09:35 +00:00
Sanjay Patel	49a8280f43	[AArch64] add tests for poor vector intrinsic lowering via legalization (PR38527); NFC These correspond to the x86 tests added with rL339790 / rL339791, but I widened the non-fsin tests to v3f32 to show the problem because AArch supports v2f32 ops. llvm-svn: 339793	2018-08-15 17:06:21 +00:00
Krzysztof Parzyszek	3b097b4d3e	[RegisterCoalescer] Ensure that both registers have subranges if one does llvm-svn: 339792	2018-08-15 17:04:58 +00:00
Sanjay Patel	712d42f53d	[x86] add fabs test for vector intrinsic to potential libcall bug; NFC This is a negative test for x86 because it has custom lowering for fabs. llvm-svn: 339791	2018-08-15 16:56:09 +00:00
Sanjay Patel	f9afee479f	[x86] add tests for poor vector intrinsic lowering via legalization (PR38527); NFC llvm-svn: 339790	2018-08-15 16:35:50 +00:00
Krzysztof Parzyszek	88d267d094	[RegisterCoalescer] Reset VNInfo def when copying segments over llvm-svn: 339788	2018-08-15 16:21:53 +00:00
Derek Schuff	82812fb986	[WebAssembly] SIMD replace_lane Implement and test replace_lane instructions. Patch by Thomas Lively Differential Revision: https://reviews.llvm.org/D50750 llvm-svn: 339786	2018-08-15 16:18:51 +00:00
Krzysztof Parzyszek	46ce441df6	[RegAlloc] Check that subreg liveness tracking applies to given virtual reg Subregister liveness applies selectively to register classes with certain properties. Make sure that when it's enabled, it applies to a given virtual register (in virtual register rewriter). llvm-svn: 339784	2018-08-15 16:07:47 +00:00
Krzysztof Parzyszek	4e06beb820	[SystemZ] Add testcase for r339778 llvm-svn: 339780	2018-08-15 15:43:13 +00:00
Nemanja Ivanovic	5b9a4f8ee5	[PowerPC] Enhance the selection(ISD::VSELECT) of vector type Make ISD::VSELECT legal whenever Altivec instructions are available; otherwise its default behavior is to expand. Use xxsel to match vselect if VSX is enabled, or vsel otherwise. To avoid writing many patterns in the td file, promote (for vectors, bitcast) all other types to v4i32 and only pattern-match vselect of v4i32 into vsel or xxsel. Patch by wuzish Differential revision: https://reviews.llvm.org/D49531 llvm-svn: 339779	2018-08-15 15:30:36 +00:00
Sam Parker	fabf7fe5f8	[ARM] TypeSize lower bound for ARMCodeGenPrepare We only try to promote types that are smaller than 16 bits, but we also need to check that the type is not smaller than 8 bits. Differential Revision: https://reviews.llvm.org/D50769 llvm-svn: 339770	2018-08-15 13:29:50 +00:00
Nemanja Ivanovic	8b4bd09e22	[PowerPC] Don't run BV DAG Combine before legalization if it assumes legal types When trying to combine a DAG that builds a vector out of sign-extensions of vector extracts, the code assumes legal input types. Due to that, we have to disable this combine prior to legalization. In some cases, the DAG will look slightly different after legalization so account for that in the matching code. This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087 Differential Revision: https://reviews.llvm.org/D49080 llvm-svn: 339769	2018-08-15 12:58:13 +00:00
Simon Pilgrim	51cee894da	[X86][SSE] Add sdiv by nonuniform constant vector tests Tests cover each TargetLowering::BuildSDIV path separately plus combos llvm-svn: 339761	2018-08-15 10:59:29 +00:00
Aleksandr Urakov	eb3735e425	[X86] Add sibling-call test cases This commit adds new sibling-call test cases, so it will be possible to see how these test cases will be changed after applying D45653. See D45653 for details. llvm-svn: 339760	2018-08-15 10:54:06 +00:00
Simon Pilgrim	a272fa9b0c	[TargetLowering] Add support for non-uniform vectors to BuildExactSDIV This patch refactors the existing BuildExactSDIV implementation to support non-uniform constant vector denominators. Differential Revision: https://reviews.llvm.org/D50392 llvm-svn: 339756	2018-08-15 09:35:12 +00:00
Sam Parker	6548cd3905	[ARM] Allow signed icmps in ARMCodeGenPrepare Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp. Differential Revision: https://reviews.llvm.org/D50067 llvm-svn: 339755	2018-08-15 08:23:03 +00:00
Sam Parker	7def86bbdb	[ARM] Allow pointer values in ARMCodeGenPrepare Add pointers to the list of allowed types, but don't try to promote them. Also fixed a bug with the promotion of undef values, so a new value is now created instead of mutating in place. We also now only promote if there's an instruction in the use-def chains other than the icmp, sinks and sources. Differential Revision: https://reviews.llvm.org/D50054 llvm-svn: 339754	2018-08-15 07:52:35 +00:00
Derek Schuff	4ec8bca13e	[WebAssembly] SIMD Splats Implement and test SIMD splat ops. Patch by Thomas Lively Differential Revision: https://reviews.llvm.org/D50741 llvm-svn: 339744	2018-08-15 00:30:27 +00:00
Heejin Ahn	283e1c11bd	[WebAssembly] Delete a specific push number from test expectations Summary: This shouldn't have been a specific number but rather a regex. This was a part of rL339474 which got reverted. Reviewers: aardappel Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D50728 llvm-svn: 339736	2018-08-14 22:14:51 +00:00
Cameron McInally	00b0658aae	[FPEnv] Scalarize StrictFP vector operations Add a helper function to scalarize constrained FP operations as needed. Differential Revision: https://reviews.llvm.org/D50720 llvm-svn: 339735	2018-08-14 22:13:11 +00:00
Heejin Ahn	c15a87848b	[WebAssembly] SIMD encoding tests Modifies existing SIMD tests to also check that SIMD instructions are lowered to the expected bytes. This CL depends on D50597. Reviewers: aheejin Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D50660 Patch by Thomas Lively (tlively) llvm-svn: 339712	2018-08-14 19:10:50 +00:00
Heejin Ahn	a0fd9c3e9a	[WebAssembly] SIMD extract_lane Implement instruction selection for all versions of the extract_lane instruction. Use explicit sext/zext to differentiate between extract_lane_s and extract_lane_u for applicable types, otherwise default to extract_lane_u. Reviewers: aheejin Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D50597 Patch by Thomas Lively (tlively) llvm-svn: 339707	2018-08-14 18:53:27 +00:00
Simon Pilgrim	2ce3d6e135	[X86][SSE] Avoid duplicate shuffle input sources in combineX86ShufflesRecursively rL339686 added the case where a faux shuffle might have repeated shuffle inputs coming from either side of the OR(). This patch improves the insertion of the inputs into the source ops lists to account for this, as well as making it trivial to add support for shuffles with more than 2 inputs in the future. llvm-svn: 339696	2018-08-14 17:22:37 +00:00
Simon Pilgrim	ed55138247	[X86][SSE] Add shuffle combine support for OR(PSHUFB,PSHUFB) style patterns. If each element is zero from one (or both) inputs then we can combine these into a single shuffle mask. llvm-svn: 339686	2018-08-14 16:00:05 +00:00
Simon Pilgrim	52c88a7c0e	[X86][SSE] Add shuffle combine tests for OR(PSHUFB,PSHUFB) style patterns. We generate these shuffle patterns but we fail to combine them. llvm-svn: 339684	2018-08-14 15:21:26 +00:00
Amara Emerson	30e61404a8	[GlobalISel][IRTranslator] Fix a bug in handling repeating struct types during argument lowering. Differential Revision: https://reviews.llvm.org/D49442 llvm-svn: 339674	2018-08-14 12:04:25 +00:00
Tomasz Krupa	86a63889f3	[X86] Lowering addus/subus intrinsics to native IR Summary: This revision improves previous version (rL330322) which has been reverted due to crashes. This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations. The patch also includes folding of previously missing saturation patterns so that IR emits the same machine instructions as the intrinsics. Reviewers: craig.topper, spatel, RKSimon Reviewed By: craig.topper Subscribers: mike.dvoretsky, DavidKreitzer, sroland, llvm-commits Differential Revision: https://reviews.llvm.org/D46179 llvm-svn: 339650	2018-08-14 08:00:56 +00:00
Wouter van Oortmerssen	a7be375586	Revert "[WebAssembly] Added default stack-only instruction mode for MC." This reverts commit 917a99b71ce21c975be7bfbf66f4040f965d9f3c. llvm-svn: 339630	2018-08-13 23:12:49 +00:00
Scott Linder	35213793bc	[CodeGen] Fix assert in SelectionDAG::computeKnownBits Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR when zero extending the demanded elements mask if it is already as long as the source vector. Differential Revision: https://reviews.llvm.org/D49574 llvm-svn: 339600	2018-08-13 18:44:21 +00:00
Daniel Cederman	dc3e4c6d95	Revert "[Sparc] Add support for the cycle counter available in GR740" It breaks when using EXPENSIVE_CHECKS with the error message "Bad machine code: Using an undefined physical register". llvm-svn: 339570	2018-08-13 14:18:09 +00:00
Simon Pilgrim	4aaf48013d	[X86] Add tests showing missing div/rem 0, X -> 0 combines llvm-svn: 339562	2018-08-13 13:29:54 +00:00
Simon Pilgrim	ee82a79041	[CGP] Fix GEP issue with out of range APInt constant values not fitting in int64_t Test case reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7173 llvm-svn: 339556	2018-08-13 12:10:09 +00:00
Daniel Cederman	1bfbc62022	[Sparc] Add support for the cycle counter available in GR740 Summary: The GR740 provides an up cycle counter in the registers ASR22 and ASR23. As these registers can not be read together atomically we only use the value of ASR23 for llvm.readcyclecounter(). The ASR23 register holds the 32 LSBs of the up-counter. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48638 llvm-svn: 339551	2018-08-13 10:49:48 +00:00
Luke Geeson	4ce41d2bb7	[ARM] Added FP16 VREV Vector Intrinsic CodeGen support llvm-svn: 339546	2018-08-13 08:37:41 +00:00
Craig Topper	cacf12a149	[SelectionDAG] In PromoteFloatOp_BITCAST, insert a bitcast after the fp_to_fp16 in case the result type isn't a scalar integer. This is another variation of PR38533. In this case, the result type of the bitcast is legal and 16-bits wide, but not a scalar integer. So we need to emit the convert to i16 and then bitcast it to the true result type. This new bitcast will be further type legalized if necessary. llvm-svn: 339536	2018-08-13 06:53:49 +00:00
Craig Topper	e42a159537	[SelectionDAG] In PromoteIntRes_BITCAST, when the input is TypePromoteFloat, make sure the output type is scalar. For vectors, use a store and load of temporary. Previously if the result type was a vector, we emitted a FP_TO_FP16 with a vector result type which isn't valid. This is basically the opposite case of the root cause of PR38533. llvm-svn: 339535	2018-08-13 06:53:47 +00:00
Lei Liu	901a0a9588	Restore correct x86_64 EH encodings in kernel code model Fixes PR37524. The exception handling encodings for x86_64 in the kernel code model were changed by r309884. Restore them to the correct ones. These encodings include PersonalityEncoding, LSDAEncoding and TTypeEncoding. Differential Revision: https://reviews.llvm.org/D50490 llvm-svn: 339534	2018-08-13 06:06:53 +00:00
Craig Topper	42e32117bb	[SelectionDAG] In PromoteFloatRes_BITCAST, insert a bitcast before the fp16_to_fp in case the input type isn't an i16. The bitcast can be further legalized as needed. Fixes PR38533. llvm-svn: 339533	2018-08-13 05:26:49 +00:00
Matt Arsenault	3763f307bd	AMDGPU: Cleanup min/max legacy tests Also add some more tests in preparation for a future patch. llvm-svn: 339526	2018-08-12 19:29:53 +00:00
Matt Arsenault	1201301b94	DAG: Check no-signed-zeros instead of unsafe-fp-math Addresses fixme, although this should still be checking individual operand flags. llvm-svn: 339525	2018-08-12 19:09:12 +00:00
Matt Arsenault	13b0db9285	AMDGPU: Check NSZ MI flag when folding omod I'm not sure the exact nsz flag combination that is OK. I think as long as it's on either, this is OK. For now just check it on the omod multiply. llvm-svn: 339513	2018-08-12 08:44:25 +00:00
Matt Arsenault	b5acec1f79	AMDGPU: Use splat vectors for undefs when folding canonicalize If one of the elements is undef, use the canonicalized constant from the other element instead of 0. Splat vectors are more useful for other optimizations, such as matching vector clamps. This was breaking on clamps of half3 from the undef 4th component. llvm-svn: 339512	2018-08-12 08:42:54 +00:00
Matt Arsenault	3ead7d7389	AMDGPU: Fix packing undef parts of build_vector llvm-svn: 339511	2018-08-12 08:42:46 +00:00
Craig Topper	570d47a010	[X86] Change the MOV32ri64 pseudo instruction to def a GR64 directly instead of wrapping it in a SUBREG_TO_REG. Now we switch to the subregister in expandPostRAPseudos where we already switched the opcode. This simplifies a few isel patterns that used the pseudo directly. And magically seems to have improved our ability to CSE it in the undef-label.ll test. llvm-svn: 339496	2018-08-11 05:33:00 +00:00
Tom Stellard	8adc86a7dc	AMDGPU/GlobalISel: Define instruction mapping for G_INSERT Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49625 llvm-svn: 339491	2018-08-11 00:51:54 +00:00
Wouter van Oortmerssen	ab26bd0647	[WebAssembly] Added default stack-only instruction mode for MC. Summary: Moved Explicit Locals pass to last. Made that pass obligatory. Made it convert from register to stack based instructions, and removed the registers. Fixes to related code that was expecting register based instructions. Added the correct testing flag to all tests, depending on what the format they were expecting so far. Translated one test to stack format as example: reg-stackify-stack.ll tested: llvm-lit -v `find test -name WebAssembly` unittests/MC/* Reviewers: dschuff, sunfish Subscribers: jfb, llvm-commits, aheejin, eraman, jgravelle-google, sbc100 Differential Revision: https://reviews.llvm.org/D50568 llvm-svn: 339474	2018-08-10 21:32:47 +00:00
Eli Friedman	e1687a89e8	[ARM] Adjust AND immediates to make them cheaper to select. LLVM normally prefers to minimize the number of bits set in an AND immediate, but that doesn't always match the available ARM instructions. In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer a two-instruction sequence movs+ands or movs+bics. Some potential improvements outlined in ARMTargetLowering::targetShrinkDemandedConstant, but seems to work pretty well already. The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX instruction due to a larger-than-expected mask. (It's orthogonal, in some sense, but as far as I can tell it's either impossible or nearly impossible to reproduce the bug without this change.) According to my testing, this seems to consistently improve codesize by a small amount by forming bic more often for ISD::AND with an immediate. Differential Revision: https://reviews.llvm.org/D50030 llvm-svn: 339472	2018-08-10 21:21:53 +00:00
Matt Arsenault	940e6075e4	AMDGPU: More canonicalized operations llvm-svn: 339464	2018-08-10 19:20:17 +00:00
Matt Arsenault	3dcf4ce435	AMDGPU: Combine and of seto/setuo and fp_class Clear the nan (or non-nan) test bits from the mask. llvm-svn: 339462	2018-08-10 18:58:56 +00:00
Matt Arsenault	8ad00d30fa	AMDGPU: Match isfinite pattern to class instructions llvm-svn: 339460	2018-08-10 18:58:41 +00:00
Sam Parker	8c4b964c5a	[ARM] Disallow zexts in ARMCodeGenPrepare Enabling ARMCodeGenPrepare by default caused a whole load of failures. This is due to zexts and truncs not being handled properly. ZExts are messy so it's just easier to disable them for now, and truncs are allowed only as 'sinks'. I still need to figure out why allowing them as 'sources' causes so many failures. The other main changes are that we are explicit in the types that we are converting to; it's now always 'TypeSize'. Type support is also now checked while checking for valid opcodes, as having the checks at different stages was unnecessarily complicated. I've moved the tests around too, so we have the zext and truncs in their own file as well as the overflowing opcode tests. Differential Revision: https://reviews.llvm.org/D50518 llvm-svn: 339432	2018-08-10 13:57:13 +00:00
Hans Wennborg	d4090be340	Rename the cfguard module flag to cfguardtable The previous name sounds like it inserts cfguard implementation, but it really just emits the table of address-taken functions. Change the name to better reflect that. Clang will be updated in the next commit. llvm-svn: 339419	2018-08-10 09:48:53 +00:00
Heejin Ahn	5831e9cc79	[WebAssembly] Gate i64x2 and f64x2 on -wasm-enable-unimplemented Summary: i64x2 and f64x2 operations are not implemented in V8, so we normally do not want to emit them. However, they are in the SIMD spec proposal, so we still want to be able to test them in the toolchain. This patch adds a flag to enable their emission. Reviewers: aheejin, dschuff Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D50423 Patch by Thomas Lively (tlively) llvm-svn: 339407	2018-08-09 23:58:51 +00:00
Craig Topper	9a8136f7b4	[X86] Qualify one of the heuristics in combineMul to only apply to positive multiply amounts. This seems to slightly help the performance of one of our internal benchmarks. We probably need better heuristics here. llvm-svn: 339406	2018-08-09 23:27:42 +00:00
Krzysztof Parzyszek	75c2ca3638	[Hexagon] Map ISD::TRAP to J2_trap0(#0 ) llvm-svn: 339365	2018-08-09 18:03:45 +00:00
Sanjay Patel	15d1501aae	[SelectionDAG] try harder to convert funnel shift to rotate Similar to rL337966 - if the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. AArch only goes right, and PPC only goes left. x86 has both, so no diffs there. Differential Revision: https://reviews.llvm.org/D50091 llvm-svn: 339359	2018-08-09 17:26:22 +00:00
Michael Berg	ca38254601	extend folding fsub/fadd to fneg for FMF Summary: This change provides a common optimization path for both Unsafe and FMF driven optimization for this fsub fold, adding reassociation, as it is the flag that most closely represents the translation. Reviewers: spatel, wristow, arsenm Reviewed By: spatel Subscribers: wdng Differential Revision: https://reviews.llvm.org/D50195 llvm-svn: 339357	2018-08-09 17:00:03 +00:00
Evandro Menezes	9a92fe0c9e	[ARM] Replace processor check with feature Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a processor check. Otherwise, NFC. llvm-svn: 339354	2018-08-09 16:13:24 +00:00
Sjoerd Meijer	806f70d229	[ARM] FP16: codegen support for VTRN Differential Revision: https://reviews.llvm.org/D50454 llvm-svn: 339340	2018-08-09 12:45:09 +00:00
Simon Pilgrim	511c3fc529	[X86][SSE] Remove PMULDQ/PMULUDQ by zero Exposed by D50328 Differential Revision: https://reviews.llvm.org/D50328 llvm-svn: 339337	2018-08-09 12:37:36 +00:00
Simon Pilgrim	01ae462fef	[X86][SSE] Combine (some) target shuffles with multiple uses As discussed on D41794, we have many cases where we fail to combine shuffles as the input operands have other uses. This patch permits these shuffles to be combined as long as they don't introduce additional variable shuffle masks, which should reduce instruction dependencies and allow the total number of shuffles to still drop without increasing the constant pool. However, this may mean that some memory folds may no longer occur, and on pre-AVX require the occasional extra register move. This also exposes some poor PMULDQ/PMULUDQ codegen which was doing unnecessary upper/lower calculations which will in fact fold to zero/undef - the fix will be added in a followup commit. Differential Revision: https://reviews.llvm.org/D50328 llvm-svn: 339335	2018-08-09 12:30:02 +00:00
Jonas Hahnfeld	20526bf483	[NVPTX] Select atomic loads and stores According to PTX ISA .volatile has the same memory synchronization semantics as .relaxed.sys, so it can be used to implement monotonic atomic loads and stores. This is important for OpenMP's atomic construct where - 'read's and 'write's are lowered to atomic loads and stores, and - an update of float or double types is lowered into a cmpxchg loop. (Note that PTX could do better because it has atom.add.f{32,64} but LLVM's atomicrmw instruction only allows integer types.) Higher levels of atomicity (like acquire and release) need additional synchronization properties which were added with PTX ISA 6.0 / sm_70. So using these instructions still results in an error. Differential Revision: https://reviews.llvm.org/D50391 llvm-svn: 339316	2018-08-09 07:45:49 +00:00
Sanjay Patel	f9a80fe87a	[x86] add test for commuted variant for fsub fold; NFC llvm-svn: 339300	2018-08-08 23:06:59 +00:00
Sanjay Patel	e47dc1a405	[DAGCombiner] loosen constraints for fsub+fadd fold isNegatibleForFree() should not matter here (as the test diffs show) because it's always a win to replace an fsub+fadd with fneg. The problem in D50195 persists because either (1) we are doing these folds in the wrong order or (2) we're missing another fold for fadd. llvm-svn: 339299	2018-08-08 23:04:43 +00:00
Petr Hosek	7b27454477	[ADT] Normalize empty triple components LLVM triple normalization is handling "unknown" and empty components differently; for example given "x86_64-unknown-linux-gnu" and "x86_64-linux-gnu" which should be equivalent, triple normalization returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's config.sub returns "x86_64-unknown-linux-gnu" for both "x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the triple normalization to behave the same way, replacing empty triple components with "unknown". This addresses PR37129. Differential Revision: https://reviews.llvm.org/D50219 llvm-svn: 339294	2018-08-08 22:23:57 +00:00
Sanjay Patel	f8937c8406	[x86] add tests for fsub+fadd with FMF; NFC These are related to the block of code under review in D50195. llvm-svn: 339293	2018-08-08 22:18:16 +00:00
Jonas Devlieghere	49ff4d9041	[DWARF] Unclamp line table version on Darwin for v5 and later. On Darwin we pin the DWARF line tables to version 2. Stop doing so for DWARF v5 and later. Differential revision: https://reviews.llvm.org/D49381 llvm-svn: 339288	2018-08-08 21:16:50 +00:00
Eli Friedman	5b45a39056	[ARM] Avoid spilling lr with Thumb1 tail calls. Normally, if any registers are spilled, we prefer to spill lr on Thumb1 so we can fold the "bx lr" into the "pop". However, if there are tail calls involved, restoring lr is expensive, so skip the optimization in that case. The spill of r7 in the new test also isn't necessary, but that's mostly orthogonal to this patch. (It's the same code in ARMFrameLowering, but it's not related to tail calls.) Differential Revision: https://reviews.llvm.org/D49459 llvm-svn: 339283	2018-08-08 20:03:10 +00:00
Ties Stuij	0244aa67d6	revert tests of '[CodeGen] emit inline asm clobber list warnings for reserved' llvm-svn: 339276	2018-08-08 17:19:32 +00:00
Krzysztof Parzyszek	1df7059150	[Hexagon] Diagnose misaligned absolute loads and stores Differential Revision: https://reviews.llvm.org/D50405 llvm-svn: 339272	2018-08-08 17:00:09 +00:00
Matt Arsenault	935f3b70fe	AMDGPU: Error more gracefully on libcalls I think this is the only situation where the callsite will have a null instruction. llvm-svn: 339271	2018-08-08 16:58:39 +00:00
Matt Arsenault	e719139b10	AMDGPU: Fix shifts for i128 llvm-svn: 339270	2018-08-08 16:58:33 +00:00
Zaara Syeda	b2595b988b	[PowerPC] Improve codegen for vector loads using scalar_to_vector This patch aims to improve the codegen for vector loads involving the scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X) to utilize: LXSD and LXSDX for i64 and f64 LXSIWAX for i32 (sign extension to i64) LXSIWZX for i32 and f64 Committing on behalf of Amy Kwan. Differential Revision: https://reviews.llvm.org/D48950 llvm-svn: 339260	2018-08-08 15:20:43 +00:00
Ties Stuij	52f3631f4b	[CodeGen] emit inline asm clobber list warnings for reserved Summary: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: ``` extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } ``` Compiled for thumb, this gives: ``` $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @APP mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... ``` r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. 
If we find a reserved register, we print a warning: ``` repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ ``` Reviewers: eli.friedman, olista01, javed.absar, efriedma Reviewed By: efriedma Subscribers: efriedma, eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49727 llvm-svn: 339257	2018-08-08 15:15:59 +00:00
Simon Pilgrim	164e8b0b5c	[TargetLowering] BuildUDIV - Add support for divide by one (PR38477) Provide a pass-through of the numerator for divide by one cases - this is the same approach we take in DAGCombiner::visitSDIVLike. I investigated whether we could achieve this by magic MULHU/SRL values but nothing appeared to work as we don't have a way for MULHU(x,c) -> x llvm-svn: 339254	2018-08-08 14:51:19 +00:00
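
The divide-by-one remark in r339254 (no constant c gives MULHU(x,c) -> x) can be checked with a little arithmetic. The following is a minimal Python sketch, not LLVM code: the helper names `mulhu`, `best_mulhu` and `udiv_by_one` are hypothetical, and a 32-bit element width is assumed purely for illustration. Because every 32-bit constant c is smaller than 2^32, the high half of x * c is always smaller than x for nonzero x, and any following SRL can only shrink it further, which is presumably why a pass-through of the numerator is used instead.

```python
BITS = 32                      # assumed element width, for illustration only
MASK = (1 << BITS) - 1         # largest unsigned value that fits in BITS bits


def mulhu(x: int, c: int) -> int:
    """Return the high BITS bits of the unsigned 2*BITS-bit product x * c."""
    assert 0 <= x <= MASK and 0 <= c <= MASK
    return (x * c) >> BITS


def best_mulhu(x: int) -> int:
    """Largest result MULHU(x, c) can reach over all BITS-wide constants c.

    mulhu is monotonic in c, so the maximum is attained at c == MASK.
    """
    return mulhu(x, MASK)


def udiv_by_one(x: int) -> int:
    """Divide by one by passing the numerator through unchanged."""
    return x


if __name__ == "__main__":
    for x in (1, 2, 12345, MASK):
        # Even the largest constant falls one short of reproducing x.
        print(x, best_mulhu(x), udiv_by_one(x))
```

For any nonzero x the best achievable multiply-high result is x - 1, so the identity cannot be expressed as a magic multiplier at this width.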

25672 Commits