llvm-project

Commit Graph

Author	SHA1	Message	Date
Krzysztof Parzyszek	918e6d70bd	[Hexagon] Handle cases when the aligned stack pointer is missing llvm-svn: 306288	2017-06-26 14:17:58 +00:00
Jonas Paulsson	8c33647ba1	[SystemZ] Add a check against zero before calling getTestUnderMaskCond() Csmith discovered that this function can be called with a zero argument, in which case an assert for this triggered. This patch also adds a guard before the other call to this function since it was missing, although the test only covers the case where it was discovered. Reduced test case attached as CodeGen/SystemZ/int-cmp-54.ll. Review: Ulrich Weigand llvm-svn: 306287	2017-06-26 13:38:27 +00:00
Matt Arsenault	8bcf2f20a7	AMDGPU: Whitespace fixes llvm-svn: 306265	2017-06-26 03:01:36 +00:00
Matt Arsenault	10fc062b2b	AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and requires the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also encovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264	2017-06-26 03:01:31 +00:00
Simon Pilgrim	c338ba48fc	[X86][SSE] Remove unused memopfsf32_128/memopfsf64_128 scalar memops The 'scalar' simd bitops were dropped a while ago llvm-svn: 306248	2017-06-25 17:04:58 +00:00
Simon Pilgrim	bed1fa1ac1	Strip trailing whitespace. NFCI. llvm-svn: 306247	2017-06-25 16:57:46 +00:00
Igor Breger	f5035d6ee5	[GlobalISel][X86] Support vector type G_EXTRACT selection. Summary: Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33957 llvm-svn: 306240	2017-06-25 11:42:17 +00:00
Dorit Nuzman	e0e0f1ddb0	[AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2 The cost of an interleaved access was only implemented for AVX512. For other X86 targets an overly conservative Base cost was returned, resulting in avoiding vectorization where it is actually profitable to vectorize. This patch starts to add costs for AVX2 for most prominent cases of interleaved accesses (stride 3,4 chars, for now). Note1: Improvements of up to ~4x were observed in some of EEMBC's rgb workloads; There is also a known issue of 15-30% degradations on some of these workloads, associated with an interleaved access followed by type promotion/widening; the resulting shuffle sequence is currently inefficient and will be improved by a series of patches that extend the X86InterleavedAccess pass (such as D34601 and more to follow). Note 2: The costs in this patch do not reflect port pressure penalties which can be very dominant in the case of interleaved accesses since most of the shuffle operations are restricted to a single port. Further tuning, that may incorporate these considerations, will be done on top of the upcoming improved shuffle sequences (that is, along with the abovementioned work to extend X86InterleavedAccess pass). Differential Revision: https://reviews.llvm.org/D34023 llvm-svn: 306238	2017-06-25 08:26:25 +00:00
Hiroshi Inoue	a85d24b73d	fix trivial typos in comment, NFC llvm-svn: 306211	2017-06-24 16:00:26 +00:00
Rafael Espindola	6418856127	Simplify the processFixupValue interface. NFC. llvm-svn: 306202	2017-06-24 05:22:28 +00:00
Rafael Espindola	daaee7151b	Remove a processFixupValue hack. The intention of processFixupValue is not to redefine the semantics of MCExpr. It is odd enough that a expression lowers to a PCRel MCExpr or not depending on what it looks like. At least it is a local hack now. I left a fix for anyone trying to figure out what producers should be producing a different expression. llvm-svn: 306200	2017-06-24 05:12:29 +00:00
Derek Schuff	d2c9ec7bc7	[WebAssembly] Fix build after r306177 llvm-svn: 306190	2017-06-24 01:00:43 +00:00
Rafael Espindola	f351292141	Remove redundant argument. llvm-svn: 306189	2017-06-24 00:26:57 +00:00
Rafael Espindola	86c664f9d7	Move Value adjustment to applyFixup. NFC. llvm-svn: 306178	2017-06-23 23:05:15 +00:00
Rafael Espindola	801b42de31	ARM: move some logic from processFixupValue to applyFixup. processFixupValue is called on every relaxation iteration. applyFixup is only called once at the very end. applyFixup is then the correct place to do last minute changes and value checks. While here, do proper range checks again for fixup_arm_thumb_bl. We used to do it, but dropped because of thumb2. We now do it again, but use the thumb2 range. llvm-svn: 306177	2017-06-23 22:52:36 +00:00
Petar Jovanovic	53dbfb3798	Reland r306095: [mips] Fix reg positions in the aui/daui instructions After fixing (r306173) a failing test in the lld test suite (r306173), reland r306095. Original commit message: [mips] Fix register positions in the aui/daui instructions Swapped the position of the rt and rs register in the aui/daui instructions for mips32r6 and mips64r6. With this change, the format of the generated instructions complies with specifications and GCC. Patch by Milos Stojanovic. llvm-svn: 306174	2017-06-23 22:37:19 +00:00
Geoff Berry	dd239718bd	[AArch64][Falkor] Remove some non-existent opcodes from sched detail regexes. NFC. llvm-svn: 306170	2017-06-23 21:59:09 +00:00
Vadzim Dambrouski	9e0d3878fb	[MSP430] Fix data layout string. Summary: Without this patch some types have incorrect size and/or alignment according to the MSP430 EABI. Reviewers: asl, awygle Reviewed By: asl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34561 llvm-svn: 306159	2017-06-23 21:11:45 +00:00
Krzysztof Parzyszek	717021772b	Revert "[Hexagon] Handle decreasing of stack alignment in frame lowering" This breaks passing of aligned function arguments. llvm-svn: 306145	2017-06-23 19:47:04 +00:00
Chad Rosier	6db9ff64a8	[AArch64] Prefer Bcc to CBZ/CBNZ/TBZ/TBNZ when NZCV flags can be set for "free". This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a conditional branch (Bcc), when the NZCV flags can be set for "free". This is preferred on targets that have more flexibility when scheduling Bcc instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are equal). This can reduce register pressure and is also the default behavior for GCC. A few examples: add w8, w0, w1 -> cmn w0, w1 ; CMN is an alias of ADDS. cbz w8, .LBB_2 -> b.eq .LBB0_2 ; single def/use of w8 removed. add w8, w0, w1 -> adds w8, w0, w1 ; w8 has multiple uses. cbz w8, .LBB1_2 -> b.eq .LBB1_2 sub w8, w0, w1 -> subs w8, w0, w1 ; w8 has multiple uses. tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2 In looking at all current sub-target machine descriptions, this transformation appears to be either positive or neutral. Differential Revision: https://reviews.llvm.org/D34220. llvm-svn: 306144	2017-06-23 19:20:12 +00:00
whitequark	00ede4dcc1	[X86] Fix SP adjustment in stack probes emitted on 32-bit Windows. Commit r306010 adjusted the condition as follows: - if (Is64Bit) { + if (!STI.isTargetWin32()) { The intent was to preserve the behavior on all Windows platforms but extend the behavior on 64-bit Windows platforms to every other one. (Before r306010, emitStackProbeCall only ever executed when emitting code for Windows triples.) Unfortunately, if (Is64Bit && STI.isOSWindows()) is not the same as if (!STI.isTargetWin32()) because of the way isTargetWin32() is defined: bool isTargetWin32() const { return !In64BitMode && (isTargetCygMing() \|\| isTargetKnownWindowsMSVC()); } In practice this broke the JIT tests on 32-bit Windows, which did not satisfy the new condition: LLVM :: ExecutionEngine/MCJIT/2003-01-15-AlignmentTest.ll LLVM :: ExecutionEngine/MCJIT/2003-08-15-AllocaAssertion.ll LLVM :: ExecutionEngine/MCJIT/2003-08-23-RegisterAllocatePhysReg.ll LLVM :: ExecutionEngine/MCJIT/test-loadstore.ll LLVM :: ExecutionEngine/OrcMCJIT/2003-01-15-AlignmentTest.ll LLVM :: ExecutionEngine/OrcMCJIT/2003-08-15-AllocaAssertion.ll LLVM :: ExecutionEngine/OrcMCJIT/2003-08-23-RegisterAllocatePhysReg.ll LLVM :: ExecutionEngine/OrcMCJIT/test-loadstore.ll because %esp was not updated correctly. The failures are only visible on a MSVC 2017 Debug build, for which we do not have bots. llvm-svn: 306142	2017-06-23 18:58:10 +00:00
Krzysztof Parzyszek	c0a102f505	[Hexagon] Remove call to printAndVerify from HexagonPassConfig It causes an extra pass of the machine verifier to be added to the pass manager, and causes test/CodeGen/Generic/llc-start-stop.ll to fail. llvm-svn: 306140	2017-06-23 18:47:55 +00:00
Sanjay Patel	3de6bad65f	[x86] fix value types for SBB transform (PR33560) I'm not sure yet why this wouldn't fail in the simple case, but clearly I used the wrong value type with: https://reviews.llvm.org/rL306040 ...and the bug manifests with: https://bugs.llvm.org/show_bug.cgi?id=33560 llvm-svn: 306139	2017-06-23 18:42:15 +00:00
Krzysztof Parzyszek	bb2fcd1921	[Hexagon] Handle decreasing of stack alignment in frame lowering llvm-svn: 306124	2017-06-23 16:53:59 +00:00
Simon Pilgrim	6e85e92b6c	Remove trailing whitespace. NFCI. llvm-svn: 306121	2017-06-23 16:35:32 +00:00
Tim Northover	4b4eec7009	GlobalISel: remove G_SEQUENCE instruction. It was trying to do too many things. The basic lumping together of values for legalization purposes is now handled by G_MERGE_VALUES. More complex things involving gaps and odd sizes are handled by G_INSERT sequences. llvm-svn: 306120	2017-06-23 16:15:55 +00:00
Ulrich Weigand	eaf0051ba3	[SystemZ] Remove unnecessary serialization before volatile loads This reverts the use of TargetLowering::prepareVolatileOrAtomicLoad introduced by r196905. Nothing in the semantics of the "volatile" keyword or the definition of the z/Architecture actually requires that volatile loads are preceded by a serialization operation, and no other compiler on the platform actually implements this. Since we've now seen a use case where this additional serialization causes noticable performance degradation, this patch removes it. The patch still leaves in the serialization before atomic loads, which is now implemented directly in lowerATOMIC_LOAD. (This also seems overkill, but that can be addressed separately.) llvm-svn: 306117	2017-06-23 15:56:14 +00:00
Tom Stellard	af552dc352	AMDGPU/GlobalISel: Mark 32-bit G_AND as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34349 llvm-svn: 306112	2017-06-23 15:17:17 +00:00
Jonas Paulsson	82f15a7168	[SystemZ] Fix trap issue and enable expensive checks. The isBarrier/isTerminator flags have been removed from the SystemZ trap instructions, so that tests do not fail with EXPENSIVE_CHECKS. This was just an issue at -O0 and did not affect code output on benchmarks. (Like Eli pointed out: "targets are split over whether they consider their "trap" a terminator; x86, AArch64, and NVPTX don't, but ARM, MIPS, PPC, and SystemZ do. We should probably try to be consistent here.". This is still the case, although SystemZ has switched sides). SystemZ now returns true in isMachineVerifierClean() :-) These Generic tests have been modified so that they can be run with or without EXPENSIVE_CHECKS: CodeGen/Generic/llc-start-stop.ll and CodeGen/Generic/print-machineinstrs.ll Review: Ulrich Weigand, Simon Pilgrim, Eli Friedman https://bugs.llvm.org/show_bug.cgi?id=33047 https://reviews.llvm.org/D34143 llvm-svn: 306106	2017-06-23 14:30:46 +00:00
Petar Jovanovic	78811c2c07	Revert r306095: [mips] Fix reg positions in the aui/daui instructions ELF/mips-plt-r6.s in lld-test is failing. Reverting the change. Original commit message: [mips] Fix register positions in the aui/daui instructions Swapped the position of the rt and rs register in the aut/daui instructions for mips32r6 and mips64r6. With this change, the format of the generated instructions complies with specifications and GCC. Patch by Milos Stojanovic. llvm-svn: 306099	2017-06-23 13:33:46 +00:00
Petar Jovanovic	d5f7711ebb	[mips] Fix register positions in the aui/daui instructions Swapped the position of the rt and rs register in the aut/daui instructions for mips32r6 and mips64r6. With this change, the format of the generated instructions complies with specifications and GCC. Patch by Milos Stojanovic. Differential Revision: https://reviews.llvm.org/D33988 llvm-svn: 306095	2017-06-23 12:47:18 +00:00
Stefan Maksimovic	b794c0a5ca	[mips][msa] Splat.d endianness check Before this change, it was always the first element of a vector that got splatted since the lower 6 bits of vshf.d $wd were always zero for little endian. Additionally, masking has been performed for vshf via which splat.d is created. Vshf has a property where if its first operand's elements have either bit 6 or 7 set, destination element is set to zero. Initially masked with 63 to avoid this property, which would result in generation of and.v + vshf.d in all cases. Masking with one results in generating a single splati.d instruction when possible. Differential Revision: https://reviews.llvm.org/D32216 llvm-svn: 306090	2017-06-23 09:09:31 +00:00
Rafael Espindola	58173b9720	COFF: Produce an error on invalid pcrel relocs. X86_64 COFF only has support for 32 bit pcrel relocations. Produce an error on all others. Note that gnu as has extended the relocation values to support this. It is not clear if we should support the gnu extension. llvm-svn: 306082	2017-06-23 04:07:44 +00:00
Farhana Aleen	9bd593e0d7	Fixed a (product) build error that was due to an unused variable Details: There was a use but it was in the assert which was not exercised during product build. Reviewers: Andrew Kaylor Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32658 llvm-svn: 306073	2017-06-22 23:56:31 +00:00
Sanjay Patel	359ae44fb4	[x86] add/sub (X==0) --> sbb(cmp X, 1) This is very similar to the transform in: https://reviews.llvm.org/rL306040 ...but in this case, we use cmp X, 1 to set the carry bit as needed. Again, we can show that all of these are logically equivalent (although InstCombine currently canonicalizes to a form not seen here), and if we believe IACA, then this is the smallest/fastest code. Eg, with SNB: \| Num Of \| Ports pressure in cycles \| \| \| Uops \| 0 - DV \| 1 \| 2 - D \| 3 - D \| 4 \| 5 \| \| --------------------------------------------------------------------- \| 1 \| 1.0 \| \| \| \| \| \| \| cmp edi, 0x1 \| 2 \| \| 1.0 \| \| \| \| 1.0 \| CP \| sbb eax, eax The larger motivation is to clean up all select-of-constants combining/lowering because we're missing some common cases. llvm-svn: 306072	2017-06-22 23:47:15 +00:00
Farhana Aleen	4b652a5335	Supported lowerInterleavedStore() in X86InterleavedAccess. Reviewers: RKSimon, DavidKreitzer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32658 llvm-svn: 306068	2017-06-22 22:59:04 +00:00
Jacob Gravelle	a31ec61c46	[WebAssembly] WebAssemblyFastISel getelementptr variable index support Summary: Previously -fast-isel getelementptr would constant-fold non-constant i8 load/stores. Reviewers: sunfish Subscribers: jfb, dschuff, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D34044 llvm-svn: 306060	2017-06-22 21:26:08 +00:00
Krzysztof Parzyszek	9b7c1d2dcf	[Hexagon] Properly update kill flags in HexagonNewValueJump The feeder instruction will be moved to right before the compare, so the updating code should not be looking for kills past the compare. llvm-svn: 306059	2017-06-22 21:11:44 +00:00
Krzysztof Parzyszek	1a0da8d5a3	[Hexagon] Use LivePhysRegs to fix up kills in HexagonGenMux Remove the previous, manual shuffling of the kill flags. llvm-svn: 306054	2017-06-22 20:43:02 +00:00
Craig Topper	792fc92be2	[AVX-512] Remove and autoupgrade the masked integer compare intrinsics Summary: These intrinsics aren't used by clang and haven't been for a while. There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said these intrinsics aren't used by clang even before this patch so this codegen reflects our clang behavior today. Reviewers: spatel, RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34389 llvm-svn: 306047	2017-06-22 20:11:01 +00:00
Sanjay Patel	41a34e4111	[x86] add/sub (X==0) --> sbb(neg X) Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480), lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb' codegen in 1 out of the 4 tests. This is a step towards smoothing that out. First, show that all of these IR forms are equivalent: http://rise4fun.com/Alive/mx Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge (later Intel and AMD chips are similar based on Agner's tables): This is the "obvious" x86 codegen (what gcc appears to produce currently): \| Num Of \| Ports pressure in cycles \| \| \| Uops \| 0 - DV \| 1 \| 2 - D \| 3 - D \| 4 \| 5 \| \| --------------------------------------------------------------------- \| 1* \| \| \| \| \| \| \| \| xor eax, eax \| 1 \| 1.0 \| \| \| \| \| \| CP \| test edi, edi \| 1 \| \| \| \| \| \| 1.0 \| CP \| setnz al \| 1 \| \| 1.0 \| \| \| \| \| CP \| neg eax This is the adc version: \| 1* \| \| \| \| \| \| \| \| xor eax, eax \| 1 \| 1.0 \| \| \| \| \| \| CP \| cmp edi, 0x1 \| 2 \| \| 1.0 \| \| \| \| 1.0 \| CP \| adc eax, 0xffffffff And this is sbb: \| 1 \| 1.0 \| \| \| \| \| \| \| neg edi \| 2 \| \| 1.0 \| \| \| \| 1.0 \| CP \| sbb eax, eax If IACA is trustworthy, then sbb became a single uop in Broadwell, so this will be clearly better than the alternatives going forward. llvm-svn: 306040	2017-06-22 18:11:19 +00:00
Rafael Espindola	8a261c2565	Add a common error checking for some invalid expressions. This refactors a bit of duplicated code and fixes an assertion failure on ELF. llvm-svn: 306035	2017-06-22 17:25:35 +00:00
David Stuttard	f677966e2e	[AMDGPU] Add intrinsics for tbuffer load and store - build error fix Variable was unused in non-debug build (used in assert) causing compile time warning and eventual build failure llvm-svn: 306034	2017-06-22 17:15:49 +00:00
David Stuttard	70e8bc1bf3	[AMDGPU] Add intrinsics for tbuffer load and store Intrinsic already existed for llvm.SI.tbuffer.store Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.* Added CodeGen tests for the 2 new variants added. Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr Differential Revision: https://reviews.llvm.org/D30687 llvm-svn: 306031	2017-06-22 16:29:22 +00:00
Krzysztof Parzyszek	f63ad39e7d	[Hexagon] Handle a global operand to A2_addi when creating duplexes llvm-svn: 306012	2017-06-22 15:53:31 +00:00
whitequark	cebe8241ca	[X86] Add support for "probe-stack" attribute This commit adds prologue code emission for stack probe function calls. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D34387 llvm-svn: 306010	2017-06-22 15:42:53 +00:00
Florian Hahn	5991b5be74	[ARM] Create relocations for beq.w branches to ARM function syms. Summary: The ARM ELF ABI requires the linker to do interworking for wide conditional branches from Thumb code to ARM code. That was pointed out by @peter.smith in the comments for D33436. Reviewers: rafael, peter.smith, echristo Reviewed By: peter.smith Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, peter.smith Differential Revision: https://reviews.llvm.org/D34447 llvm-svn: 306009	2017-06-22 15:32:41 +00:00
Petar Jovanovic	636851b845	[mips] Allow $AT to be used as a register name This patch allows $AT to be used as a register name in assembly files. Currently only $at is recognized as a valid register name. Patch by Stanislav Ocovaj. Differential Revision: https://reviews.llvm.org/D34348 llvm-svn: 306007	2017-06-22 15:24:16 +00:00
Krzysztof Parzyszek	69ffba4595	[Hexagon] Recognize potential offset overflow for store-imm to stack Reserve an extra scavenging stack slot if the offset field in store- -immediate instructions may overflow. llvm-svn: 306004	2017-06-22 14:11:23 +00:00
Sam Kolton	ca5a30ed74	[AMDGPU] SDWA: remove support for VOP2 instructions that have only 64-bit encoding Summary: Despite that this instructions are listed in VOP2, they are treated as VOP3 in specs. They should not support SDWA. There are no real instructions for them, but there are pseudo instructions. Reviewers: arsenm, vpykhtin, cfang Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34403 llvm-svn: 305999	2017-06-22 12:42:14 +00:00
Kristof Beyls	9665249fd8	Don't conditionalize Neon instructions, even in IT blocks. This has been deprecated since ARMARM v7-AR, release C.b, published back in 2012. This also removes test/CodeGen/Thumb2/ifcvt-neon.ll that originally was introduced to check that conditionalization of Neon instructions did happen when generating Thumb2. However, the test had evolved and was no longer testing that. Rather than trying to adapt that test, this commit introduces test/CodeGen/Thumb2/ifcvt-neon-deprecated.mir, since we can now use the MIR framework to write nicer/more maintainable tests. llvm-svn: 305998	2017-06-22 12:11:38 +00:00
Simon Dardis	1c73fcc131	[mips] Implement the ".rdata" MIPS assembly directive. Rather than creating a separate ".rdata" section distinct from the customary ".rodata" in ELF, ".rdata" switches to the ".rodata" section. This patch relands r305949 and r305950 with the correct commit message and addresses nit raised during review. Patch By: John Baldwin! Differential Revision: https://reviews.llvm.org/D34452 llvm-svn: 305995	2017-06-22 10:41:51 +00:00
John Brawn	ed78aaf093	[ARM] Add .w aliases of MOV with shifted operand These appear to have been simply missing. Differential Revision: https://reviews.llvm.org/D34461 llvm-svn: 305993	2017-06-22 10:30:53 +00:00
John Brawn	192f74a84d	[ARM] Clean up choice of narrow instructions in ARMAsmParser, NFC This patch makes a couple of changes to how we decide whether to use the narrow or wide encoding of thumb2 instructions: * Common out the detection of the .w qualifier * Check for the CPSR operand in a consistent way Differential Revision: https://reviews.llvm.org/D34460 llvm-svn: 305992	2017-06-22 10:29:31 +00:00
Igor Breger	1c29be7e4f	[GlobalISel][X86] Support vector type G_INSERT legalization/selection. Summary: Support vector type G_INSERT legalization/selection. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33956 llvm-svn: 305989	2017-06-22 09:43:35 +00:00
Florian Hahn	b489e56ae2	[ARM] Add macro fusion for AES instructions. Summary: This patch adds a macro fusion using CodeGen/MacroFusion.cpp to pair AES instructions back to back and adds FeatureFuseAES to enable the feature. Reviewers: evandro, javed.absar, rengolin, t.p.northover Reviewed By: javed.absar Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34142 llvm-svn: 305988	2017-06-22 09:39:36 +00:00
Elena Demikhovsky	2dac0b4d58	AVX-512: Lowering Masked Gather intrinsic - fixed a bug Masked gather for vector length 2 is lowered incorrectly for element type i32. The type <2 x i32> was automatically extended to <2 x i64> and we generated VPGATHERQQ instead of VPGATHERQD. The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence may be more optimal. In this patch I'm fixing <2 x i32>bug and optimizing <2 x float> sequence for GATHERs only. The same fix should be done for Scatters as well. Differential revision: https://reviews.llvm.org/D34343 llvm-svn: 305987	2017-06-22 06:47:41 +00:00
Sam Kolton	3c4933fcc6	[AMDGPU] SDWA: add support for GFX9 in peephole pass Summary: Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers. Added several subtarget features for GFX9 SDWA. This diff also contains changes from D34026. Depends D34026 Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34241 llvm-svn: 305986	2017-06-22 06:26:41 +00:00
Hiroshi Inoue	1d5693c915	[PowerPC] fix potential verification errors This patch fixes trivial mishandling of 32-bit/64-bit instructions that may cause verification errors with -verify-machineinstrs. llvm-svn: 305984	2017-06-22 04:33:44 +00:00
Reid Kleckner	ef5817579b	[wasm] Fix WebAssembly asm backend after r305968 llvm-svn: 305978	2017-06-22 01:07:05 +00:00
Davide Italiano	7a6c5c12ad	Revert "[Target] Implement the ".rdata" MIPS assembly directive." This reverts commit r305949 and r305950 as they didn't have the correct commit message. llvm-svn: 305973	2017-06-22 00:11:41 +00:00
Stanislav Mekhanoshin	3ed38c601a	[AMDGPU] Add FP_CLASS to the add/setcc combine This is one of the nodes which also compile as v_cmp_*. Differential Revision: https://reviews.llvm.org/D34485 llvm-svn: 305970	2017-06-21 23:46:22 +00:00
Rafael Espindola	88d9e37ec8	Use a MutableArrayRef. NFC. llvm-svn: 305968	2017-06-21 23:06:53 +00:00
Rafael Espindola	6da25f4fc4	Fix build. llvm-svn: 305967	2017-06-21 23:02:57 +00:00
Stanislav Mekhanoshin	a8b26936d0	[AMDGPU] Combine add and adde, sub and sube If one of the arguments of adde/sube is zero we can fold another add/sub into it. Differential Revision: https://reviews.llvm.org/D34374 llvm-svn: 305964	2017-06-21 22:30:01 +00:00
Sam Clegg	705f798bff	Mark dump() methods as const. NFC Add const qualifier to any dump() method where adding one was trivial. Differential Revision: https://reviews.llvm.org/D34481 llvm-svn: 305963	2017-06-21 22:19:17 +00:00
Stanislav Mekhanoshin	e3eb42cef6	[AMDGPU] simplify add x, *ext (setcc) => addc\|subb x, 0, setcc This simplification allows to avoid generating v_cndmask_b32 to serialize condition code between compare and use. Differential Revision: https://reviews.llvm.org/D34300 llvm-svn: 305962	2017-06-21 22:05:06 +00:00
Krzysztof Parzyszek	5b933fee3c	[Hexagon] Use MachineInstrBuilder instead of changing instruction in place llvm-svn: 305953	2017-06-21 21:03:34 +00:00
Davide Italiano	75ed943def	[Target] Implement the ".rdata" MIPS assembly directive. Patch by John Baldwin < jhb at freebsd dot org >! Differential Revision: https://reviews.llvm.org/D34452 llvm-svn: 305949	2017-06-21 20:40:27 +00:00
Davide Italiano	9b8e3d308f	[Solaris] emit .init_array instead of .ctors on Solaris (Sparc/x86) Patch by Fedor Sergeev. Differential Revision: https://reviews.llvm.org/D33868 llvm-svn: 305948	2017-06-21 20:36:32 +00:00
Krzysztof Parzyszek	fd048cc0ec	[Hexagon] Handle more types of immediate operands in expand-condsets llvm-svn: 305943	2017-06-21 19:21:30 +00:00
Lei Huang	84dbbfdeb9	[PowerPC] define target hook isReallyTriviallyReMaterializable() Define target hook isReallyTriviallyReMaterializable() to explicitly specify PowerPC instructions that are trivially rematerializable. This will allow the MachineLICM pass to accurately identify PPC instructions that should always be hoisted. Differential Revision: https://reviews.llvm.org/D34255 llvm-svn: 305932	2017-06-21 17:17:56 +00:00
Dmitry Preobrazhensky	851a3d9f05	[AMDGPU][MC][GFX9] Corrected VOP3P relevant code to fix disassembler failures See Bug 33509: https://bugs.llvm.org//show_bug.cgi?id=33509 Reviewers: Sam Kolton, Artem Tamazov, Valery Pykhtin Differential Revision: https://reviews.llvm.org/D34360 llvm-svn: 305923	2017-06-21 16:00:54 +00:00
Dmitry Preobrazhensky	dc4ac823ec	[AMDGPU][MC] Corrected V_QSAD instructions to check that dest register is different than any of the src See Bug 33279: https://bugs.llvm.org//show_bug.cgi?id=33279 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D34003 llvm-svn: 305915	2017-06-21 14:41:34 +00:00
Sanjay Patel	cec6a500a8	[x86] fix formatting; NFC llvm-svn: 305914	2017-06-21 14:27:11 +00:00
Christof Douma	c1c28051d2	[AARCH64][LSE] Preliminary support for ARMv8.1 LSE Atomics. Implemented support to AArch64 codegen for ARMv8.1 Large System Extensions atomic instructions. Where supported, these instructions can provide atomic operations with higher performance. Currently supported operations include: fetch_add, fetch_or, fetch_xor, fetch_smin, fetch_min/max (signed and unsigned), swap, and compare_exchange. This implementation implies sequential-consistency ordering, more relaxed ordering is under development. Subtarget->hasLSE is currently supported for Cavium ThunderX2T99. Patch by Ananth Jasty. Differential Revision: https://reviews.llvm.org/D33586 Change-Id: I82f6d3d64255622791ceb0715b7ab9f4dc4d4b2c llvm-svn: 305893	2017-06-21 10:58:31 +00:00
Florian Hahn	8552e591a1	[AArch64] Add early exit to promoteLoadFromStore. There should be at most a single kill flag for the promoted operand between the store/load pair. Discussed in https://reviews.llvm.org/D34402. llvm-svn: 305889	2017-06-21 09:51:52 +00:00
Strahinja Petrovic	d280ea4f76	[MIPS] Fix for selecting of DINS/INS instruction This patch adds one more condition in selection DINS/INS instruction, which fixes MultiSource/Applications/JM/ldecod/ for mips32r2 (and mips64r2 n32 abi). Differential Revision: https://reviews.llvm.org/D33725 llvm-svn: 305888	2017-06-21 09:25:51 +00:00
Sam Kolton	549c89d2c9	[AMDGPU] SDWA: merge VI and GFX9 pseudo instructions Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9. Reviewers: dp, arsenm, vpykhtin Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov Differential Revision: https://reviews.llvm.org/D34026 llvm-svn: 305886	2017-06-21 08:53:38 +00:00
Florian Hahn	80e485179e	[AArch64] Preserve register flags when promoting a load from store. Summary: This patch updates promoteLoadFromStore to use the store MachineOperand as the source operand of the of the new instruction instead of creating a new register MachineOperand. This way, the existing register flags are preserved. This fixes PR33468 (https://bugs.llvm.org/show_bug.cgi?id=33468). Reviewers: MatzeB, t.p.northover, junbuml Reviewed By: MatzeB Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34402 llvm-svn: 305885	2017-06-21 08:47:23 +00:00
Rafael Espindola	3ac4c09daf	clang-format a region. It will make a followup patch easier to read. llvm-svn: 305865	2017-06-20 22:53:29 +00:00
Matt Arsenault	67cd347e93	AMDGPU: Allow vectorization of packed types llvm-svn: 305844	2017-06-20 20:38:06 +00:00
Stanislav Mekhanoshin	a9d846c6ef	[AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32 If there is an immediate operand we shall not shrink V_SUBB_U32 and V_ADDC_U32, it does not fit e32 encoding. Differential Revison: https://reviews.llvm.org/D34291 llvm-svn: 305840	2017-06-20 20:33:44 +00:00
Matt Arsenault	9698f1c862	AMDGPU: Start adding global_* instructions llvm-svn: 305838	2017-06-20 19:54:14 +00:00
Matt Arsenault	ff3f912e74	AMDGPU: Do operand folding in program order Before it was possible to partially fold use instructions before the defs. After the xor is folded into a copy, the same mov can end up in the fold list twice, so on the second attempt it will fail expecting to see a register to fold. llvm-svn: 305821	2017-06-20 18:56:32 +00:00
Matt Arsenault	76858f5a1d	AMDGPU: Preserve undef when folding register operands If the source was a copy of an undef register, this would produce a read of an undefined register which is a verifier error. llvm-svn: 305816	2017-06-20 18:41:31 +00:00
Stanislav Mekhanoshin	465a1ff193	[AMDGPU] Eliminate SGPR to VGPR copy when possible SGPRs are generally cheaper, so try to use them over VGPRs. Differential Revision: https://reviews.llvm.org/D34130 llvm-svn: 305815	2017-06-20 18:32:42 +00:00
Matt Arsenault	7f67b35901	AMDGPU: Fix crash with undef vreg input operand llvm-svn: 305814	2017-06-20 18:28:02 +00:00
Hiroshi Inoue	e7a35539c5	[PowerPC] fix trivial typos in comment, NFC llvm-svn: 305813	2017-06-20 17:53:33 +00:00
Sanjay Patel	0656629b87	[x86] enable CGP memcmp() expansion for 2/4/8 byte sizes There are a couple of potential improvements as seen in the IR and asm: 1. We're unnecessarily extending to a larger type to compare values. 2. The codegen for (select cond, 1, -1) could avoid a cmov. (or we could change the order of the compares, so we have a select with 0 operand) llvm-svn: 305802	2017-06-20 15:58:30 +00:00
Simon Pilgrim	4822b5b649	[X86][SSE] Relax 0/-1 vector element insertion to work for any vector with >=16bit elements Shuffle lowering/combining now does a good job for 256/512-bit vectors - we don't need to prevent this llvm-svn: 305801	2017-06-20 15:19:02 +00:00
Simon Pilgrim	b233c0a5d2	[X86][SSE] Dropped old INSERT_VECTOR_ELT lowering TODO Target shuffle combining now supports the matching of INSERT_VECTOR_ELT/PINSRW/PINSRB for merging multiple insertions into shuffles/bitmasks. llvm-svn: 305788	2017-06-20 10:33:34 +00:00
Igor Breger	0dee0f458e	[GlobalISel][X86] fix compilation error ( -Werror=unused-function ) llvm-svn: 305786	2017-06-20 09:40:57 +00:00
Igor Breger	1dcd5e8dc8	[GlobalISel][X86] Get correct RegClass for given RegBank. Summary: In some cases RegClass depends on target feature. Hight (16-31) vector registers exist only if AVX512f available. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: t.p.northover, guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33952 Conflicts: test/CodeGen/X86/GlobalISel/select-memop-scalar.mir llvm-svn: 305784	2017-06-20 09:15:10 +00:00
Alexandros Lamprineas	2b2b420563	[ARM] Support constant pools in data when generating execute-only code. Resubmission of r305387, which was reverted at r305390. The Address Sanitizer caught a stack-use-after-scope of a Twine variable. This is now fixed by passing the Twine directly as a function parameter. The ARM backend asserts against constant pool lowering when it generates execute-only code in order to prevent the generation of constant pools in the text section. It appears that target independent optimizations might generate DAG nodes that represent constant pools. By lowering such nodes as global addresses we don't violate the semantics of execute-only code and also it is guaranteed that execute-only behaves correct with the position-independent addressing modes that support execute-only code. Differential Revision: https://reviews.llvm.org/D33773 llvm-svn: 305776	2017-06-20 07:20:52 +00:00
Matt Arsenault	c595185f8f	AMDGPU: Fix scratch wave offset relative FI expansion The offset may not be an inline immediate, so this needs to be materialized into a register. The post-RA run of SIShrinkInstructions is able to fold it later if it can. llvm-svn: 305761	2017-06-19 23:47:21 +00:00
Stanislav Mekhanoshin	50c2f251f5	[AMDGPU] Add infer address spaces pass before SROA It adds it for the target after inlining but before SROA where we can get most out of it. Differential Revision: https://reviews.llvm.org/D34366 llvm-svn: 305759	2017-06-19 23:17:36 +00:00
Eugene Zelenko	8361b0a9bb	[Target] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 305757	2017-06-19 22:43:19 +00:00
Geoff Berry	5e46600e3a	[AArch64][Falkor] Fix MOVZ sched predicate to not assert on non-imm operands (e.g. blockaddress). llvm-svn: 305752	2017-06-19 21:57:44 +00:00
Geoff Berry	e9972cabbd	[AArch64][Kryo] Add missing write latency for LDAXP, LDXP second destination. Fixes PR33491 and PR33512. llvm-svn: 305751	2017-06-19 21:57:42 +00:00
Geoff Berry	3cc4b9f780	[AArch64][Falkor] Refine load/store increment latencies. Also fix LDXP & LDAXP write latency to avoid similar assert as PR33491 and PR33512. llvm-svn: 305750	2017-06-19 21:56:21 +00:00
Matt Arsenault	e0e68a757e	AMDGPU: Cleanup CreateLiveInRegister llvm-svn: 305748	2017-06-19 21:52:45 +00:00
Nico Weber	4c5c02a448	Revert r305382, it caused PR33513. llvm-svn: 305735	2017-06-19 19:48:59 +00:00
Hans Wennborg	ca69fc1cb7	Revert r304824 "Fix PR23384 (part 3 of 3)" This seems to be interacting badly with ASan somehow, causing false reports of heap-buffer overflows: PR33514. > Summary: > The patch makes instruction count the highest priority for > LSR solution for X86 (previously registers had highest priority). > > Reviewers: qcolombet > > Differential Revision: http://reviews.llvm.org/D30562 > > From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 305720	2017-06-19 17:57:15 +00:00
Florian Hahn	fd44ca6c76	[AArch64] Fix order of checks in shouldScheduleAdjacent. We need to check the opcode of FirstMI before accessing the operands. This caused a buildbot failure during bootstrapping on AArch64. llvm-svn: 305694	2017-06-19 13:45:41 +00:00
Tom Stellard	ff63ee0db5	AMDGPU/GlobalISel: Mark G_BITCAST s32 <--> <2 x s16> legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34129 llvm-svn: 305692	2017-06-19 13:15:45 +00:00
Igor Breger	bd2dedaa38	[GlobalISel][X86] Fold FI/G_GEP into LDR/STR instruction addressing mode. Summary: Implement some of the simplest addressing modes.It should help to test ABI. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33888 llvm-svn: 305691	2017-06-19 13:12:57 +00:00
Florian Hahn	5f746c8e27	Recommit rL305677: [CodeGen] Add generic MacroFusion pass Use llvm::make_unique to avoid ambiguity with MSVC. This patch adds a generic MacroFusion pass, that is used on X86 and AArch64, which both define target-specific shouldScheduleAdjacent functions. This generic pass should make it easier for other targets to implement macro fusion and I intend to add macro fusion for ARM shortly. Differential Revision: https://reviews.llvm.org/D34144 llvm-svn: 305690	2017-06-19 12:53:31 +00:00
Diana Picus	78aaf7db04	[ARM] GlobalISel: Support G_ICMP for s8 and s16 Widen to s32 (like all other binary ops). llvm-svn: 305683	2017-06-19 11:47:28 +00:00
Florian Hahn	e16d3106f3	Revert r305677 [CodeGen] Add generic MacroFusion pass. This causes Windows buildbot failures do an ambiguous call. llvm-svn: 305681	2017-06-19 11:26:15 +00:00
Florian Hahn	ee1b096f8a	[CodeGen] Add generic MacroFusion pass. Summary: This patch adds a generic MacroFusion pass, that is used on X86 and AArch64, which both define target-specific shouldScheduleAdjacent functions. This generic pass should make it easier for other targets to implement macro fusion and I intend to add macro fusion for ARM shortly. Reviewers: craig.topper, evandro, t.p.northover, atrick, MatzeB Reviewed By: MatzeB Subscribers: atrick, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34144 llvm-svn: 305677	2017-06-19 10:51:38 +00:00
Diana Picus	621894ac76	[ARM] GlobalISel: Support G_ICMP for i32 and pointers Add support throughout the pipeline: - mark as legal for s32 and pointers - map to GPRs - lower to a sequence of instructions, which moves 0 or 1 into the result register based on the flags set by a CMPrr We have copied from FastISel a helper function which maps CmpInst predicates into ARMCC codes. Ideally, we should be able to move it somewhere that both FastISel and GlobalISel can use. llvm-svn: 305672	2017-06-19 09:40:51 +00:00
Eric Christopher	c70d07b7ea	Rework logic and comment out the default relocation models for PPC. llvm-svn: 305630	2017-06-17 02:25:56 +00:00
Eric Christopher	5ec30ef4e4	Turn a large if block into a smaller early return for clarity. llvm-svn: 305629	2017-06-17 02:25:55 +00:00
Eric Christopher	ded727c5a8	Remove the old and unused PPC32 and PPC64TargetMachine classes. llvm-svn: 305628	2017-06-17 02:25:53 +00:00
Eric Christopher	3b78693b9a	Remove unused forward declaration. llvm-svn: 305627	2017-06-17 02:25:51 +00:00
Eric Christopher	a5b50cd645	Tidy up some calls to getRegister for readability. llvm-svn: 305626	2017-06-17 02:25:49 +00:00
Tim Shen	d123c194e0	[PPC] Remove isBarrier from CFENCE8's definition. Summary: This is my misunderstanding on isBarrier. It's not for memory barriers, but for other control flow purposes. lwsync doesn't have it either. This fixes a simple crash with -verify-machineinstrs like below: define void @Foo() { entry: %tmp = load atomic i64, i64* undef acquire, align 8 unreachable } I deliberately don't want to check in the test, since there is little chance to regress on such a mistake. Such a test adds noise to the code base. I plan to check in first, since it fixes a crash, and the fix is obvious. Reviewers: kbarton, echristo Subscribers: sanjoy, nemanjai, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D34314 llvm-svn: 305624	2017-06-17 01:25:34 +00:00
Sam Clegg	9d24fb7ff3	[WebAssembly] Use __stack_pointer global when writing wasm binary This ensures that symbolic relocations are generated for stack pointer manipulations. These relocations are of type R_WEBASSEMBLY_GLOBAL_INDEX_LEB. This change also adds support for reading relocations of this type in WasmObjectFile.cpp. Since its a globally imported symbol this does mean that the get_global/set_global instruction won't be valid until the objects are linked that global used in no longer an imported global. Differential Revision: https://reviews.llvm.org/D34172 llvm-svn: 305616	2017-06-16 23:59:10 +00:00
Yonghong Song	a63178f756	bpf: fix a strict-aliasing issue Davide Italiano reported the following issue if llvm is compiled with gcc -Wstrict-aliasing -Werror: ..... lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BPFISelDAGToDAG.cpp.o ../lib/Target/BPF/BPFISelDAGToDAG.cpp: In member function ‘virtual void {anonymous}::BPFDAGToDAGISel::PreprocessISelDAG()’: ../lib/Target/BPF/BPFISelDAGToDAG.cpp:264:26: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] val = (uint16_t )new_val; ..... The error is caused by my previous commit (revision 305560). This patch fixed the issue by introducing an union to avoid type casting. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 305608	2017-06-16 23:28:04 +00:00
Yonghong Song	ac2e25026f	bpf: avoid load from read-only sections If users tried to have a structure decl/init code like below struct test_t t = { .memeber1 = 45 }; It is very likely that compiler will generate a readonly section to hold up the init values for variable t. Later load of t members, e.g., t.member1 will result in a read from readonly section. BPF program cannot handle relocation. This will force users to write: struct test_t t = {}; t.member1 = 45; This is just inconvenient and unintuitive. This patch addresses this issue by implementing BPF PreprocessISelDAG. For any load from a global constant structure or an global array of constant struct, it attempts to translate it into a constant directly. The traversal of the constant struct and other constant data structures are similar to where the assembler emits read-only sections. Four different unit test cases are also added to cover different scenarios. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 305560	2017-06-16 15:41:16 +00:00
Yonghong Song	5c65439617	bpf: set missing types in insn tablegen file o This is discovered during my study of 32-bit subregister support. o This is no impact on current functionality since we only support 64-bit registers. o Searching the web, looks like the issue has been discovered before, so fix it now. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 305559	2017-06-16 15:30:55 +00:00
Simon Dardis	5852c4c108	Revert "[mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP" This reverts commit r305455. This commit was reported as breaking one of the sanitizer buildbots. Reverting until lab.llvm.org comes back online. llvm-svn: 305557	2017-06-16 14:00:33 +00:00
Krzysztof Parzyszek	3a40b34123	[Hexagon] Don't kill live registers when creating mux out of tfr The second part of r305300: when placing the mux at the later location, make sure that it won't use any register that was killed between the two original instructions. Remove any such kills and transfer them to the mux. llvm-svn: 305553	2017-06-16 12:24:03 +00:00
Alfred Huang	f9b521fdaf	[AMDGPU] Testing commit access only, no real change llvm-svn: 305523	2017-06-15 23:02:55 +00:00
Alexander Timofeev	0f9c84cd93	DivergencyAnalysis patch for review llvm-svn: 305494	2017-06-15 19:33:10 +00:00
Lei Huang	b4733ca8c5	[MachineLICM] Hoist TOC-based address instructions Add condition for MachineLICM to safely hoist instructions that utilize non constant registers that are reserved. On PPC, global variable access is done through the table of contents (TOC) which is always in register X2. The ABI reserves this register in any functions that have calls or access global variables. A call through a function pointer involves saving, changing and restoring this register around the call and thus MachineLICM does not consider it to be invariant. We can however guarantee the register is preserved across the call and thus is invariant. Differential Revision: https://reviews.llvm.org/D33562 llvm-svn: 305490	2017-06-15 18:29:59 +00:00
Hiroshi Inoue	7a08bb1458	[PowerPC] fix potential verification errors on CFENCE8 This patch fixes a potential verification error (64-bit register operands for cmpw) with -verify-machineinstrs. Differential Revision: https://reviews.llvm.org/D34208 llvm-svn: 305479	2017-06-15 16:51:28 +00:00
Simon Dardis	24ca9da2de	[mips] Fix documentation of member variable. NFCI. llvm-svn: 305478	2017-06-15 16:28:28 +00:00
Simon Pilgrim	07cfc80186	Remove trailing whitespace. NFCI. llvm-svn: 305476	2017-06-15 16:20:27 +00:00
Simon Pilgrim	4d432b2c6b	[X86][AVX2] Fix issue in lowerV8I16GeneralSingleInputVectorShuffle that was assuming v8i16 vectors We can use this with v16i16/v32i16 as well. Found during fuzz testing. llvm-svn: 305472	2017-06-15 14:52:30 +00:00
Nirav Dave	85e92223b4	[AArch64] Add indexed check to splitStores. NFC. Add explicit check for unhandled cases in preparation for delaying splitStores to post-legalization. llvm-svn: 305471	2017-06-15 14:47:44 +00:00
Simon Pilgrim	b98cb3808c	Revert r305465: [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). This is causing windows buildbot failures llvm-svn: 305470	2017-06-15 14:39:34 +00:00
Ayman Musa	56912cda71	[X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class. When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction. Differential Revision: https://reviews.llvm.org/D33188 llvm-svn: 305465	2017-06-15 13:02:37 +00:00
Diana Picus	02e11010b2	[ARM] GlobalISel: Add support for i32 modulo Add support for modulo for targets that have hardware division and for those that don't. When hardware division is not available, we have to choose the correct libcall to use. This is generally straightforward, except for AEABI. The AEABI variant is trickier than the other libcalls because it returns { quotient, remainder }, instead of just one value like the other libcalls that we've seen so far. Therefore, we need to use custom lowering for it. However, we don't want to have too much special code, so we refactor the target-independent code in the legalizer by adding a helper for replacing an instruction with a libcall. This helper is used by the legalizer itself when dealing with simple calls, and also by the custom ARM legalization for the more complicated AEABI divmod calls. llvm-svn: 305459	2017-06-15 10:53:31 +00:00
Diana Picus	8fd1601d32	[ARM] GlobalISel: Lower only homogeneous struct args Lowering mixed struct args, params and returns used G_INSERT, which is a bit more convoluted to support through the entire pipeline. Since they don't occur that often in practice, it's probably wiser to leave them out until later. Meanwhile, we can lower homogeneous structs using G_MERGE_VALUES, which has good support in the legalizer. These occur e.g. as the return of __aeabi_idivmod, so it's nice to be able to support them. llvm-svn: 305458	2017-06-15 09:42:02 +00:00
Florian Hahn	0a26d2c298	[AArch64] Enable FeatureFuseAES for the generic processor model. Summary: Scheduling AESE/AESMC and AESD/AESIMC instruction pairs back-to-back gives a double digit speedup on benchmarks using those instructions on Cortex-A processors. In GCC, this optimization is part of the generic processor model as well. This change should not have a major performance impact on processors that do not optimize AES instruction pairs, although I only had access to Cortex-A processors for benchmarking. Reviewers: rengolin, kristof.beyls, javed.absar, evandro, silviu.baranga, MatzeB, mcrosier, joelkevinjones, joel_k_jones, bmakam, t.p.northover Reviewed By: evandro Subscribers: sbaranga, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D33836 llvm-svn: 305457	2017-06-15 09:31:23 +00:00
Zoran Jovanovic	d9299293ad	[mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP Author: milena.vujosevic.janicic Reviewers: sdardis The patch extends size reduction pass for MicroMIPS. The following instructions are examined and transformed, if possible: ADDIU instruction is transformed into 16-bit instruction ADDIUSP ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP Differential Revision: https://reviews.llvm.org/D33887 llvm-svn: 305455	2017-06-15 09:14:33 +00:00
Sanjay Patel	83cb007940	[x86] avoid unnecessary shuffle mask math in combineX86ShufflesRecursively() This is a follow-up to https://reviews.llvm.org/D34174 / https://reviews.llvm.org/rL305398. We mentioned replacing the multiplies with shifts, but the real win seems to be in bypassing the extra ops in the common case when the RootRatio and OpRatio are one. This gives us another 1-2% overall win for the test in PR32037: https://bugs.llvm.org/show_bug.cgi?id=32037 llvm-svn: 305414	2017-06-14 20:37:11 +00:00
Lei Huang	f689f69fea	Test commit - NFC. Modified a comment to confirm commit access functionality. llvm-svn: 305402	2017-06-14 17:25:55 +00:00
Sanjay Patel	ce0b99563a	[x86] replace div/rem with shift/mask for better shuffle combining perf We know that shuffle masks are power-of-2 sizes, but there's no way (?) for LLVM to know that, so hack combineX86ShufflesRecursively() to be much faster by replacing div/rem with shift/mask. This makes the motivating compile-time bug in PR32037 ( https://bugs.llvm.org/show_bug.cgi?id=32037 ) about 9% faster overall. Differential Revision: https://reviews.llvm.org/D34174 llvm-svn: 305398	2017-06-14 17:00:57 +00:00
Alexandros Lamprineas	1c15ee2631	Revert "[ARM] Support constant pools in data when generating execute-only code." This reverts commit 3a204faa093c681a1e96c5e0622f50649b761ee0. I've upset a buildbot which runs the address sanitizer: ERROR: AddressSanitizer: stack-use-after-scope lib/Target/ARM/ARMISelLowering.cpp:2690 That Twine variable is used illegally. llvm-svn: 305390	2017-06-14 15:00:08 +00:00
Simon Dardis	9790e39f45	[mips] Fix multiprecision arithmetic. For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC, get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs. For MIPS, only the DSP ASE has a carry flag, so in the general case it is not useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes. Also improve the generation code in such cases for targets with TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the comparison node rather than using it in selects. Similarly for ISD::SUBE / ISD::SUBC. Address optimization breakage by moving the generation of MIPS specific integer multiply-accumulate nodes to before legalization. This revolves PR32713 and PR33424. Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue! Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D33494 llvm-svn: 305389	2017-06-14 14:46:30 +00:00
Alexandros Lamprineas	c582d6e133	[ARM] Support constant pools in data when generating execute-only code. The ARM backend asserts against constant pool lowering when it generates execute-only code in order to prevent the generation of constant pools in the text section. It appears that target independent optimizations might generate DAG nodes that represent constant pools. By lowering such nodes as global addresses we don't violate the semantics of execute-only code and also it is guaranteed that execute-only behaves correct with the position-independent addressing modes that support execute-only code. Differential Revision: https://reviews.llvm.org/D33773 llvm-svn: 305387	2017-06-14 13:22:41 +00:00
Simon Dardis	941a49b6d6	[mips] Fix machine verifier errors in the long branch pass This patch fixes two systemic machine verifier errors in the long branch pass. The first is the incorrect basic block successors and the second was the incorrect construction of several jump instructions. This partially resolves PR27458 and the associated PR32146. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D33378 llvm-svn: 305382	2017-06-14 12:16:47 +00:00
Nemanja Ivanovic	7855185bbb	Revert r304907 as it is causing some failures that I cannot reproduce. Reverting this until a test case can be provided to aid the investigation. llvm-svn: 305372	2017-06-14 07:05:42 +00:00
Davide Italiano	36559b2527	[AMDGPU] Remove now dead defaultOffsetS13(). NFCI. Fixes the GCC7 build with -Werror. llvm-svn: 305329	2017-06-13 22:24:24 +00:00
Sam Clegg	ae03c1e724	[WebAssembly] Cleanup WebAssemblyWasmObjectWriter Differential Revision: https://reviews.llvm.org/D34131 llvm-svn: 305316	2017-06-13 18:51:50 +00:00
Geoff Berry	13d5dcb093	[AArch64][Falkor] Fix sched details for FDIV, FSQRT, SDIV, UDIV llvm-svn: 305310	2017-06-13 17:43:39 +00:00
Kit Barton	0b216305db	Test commit - NFC. Modified a comment to confirm commit access functionality. llvm-svn: 305309	2017-06-13 17:35:29 +00:00
Krzysztof Parzyszek	b3a8d20e27	[Hexagon] Generate store-immediate instructions for stack objects Store-immediate instructions have a non-extendable offset. Since the actual offset for a stack object is not known until much later, only generate these stores when the stack size (at the time of instruction selection) is small. llvm-svn: 305305	2017-06-13 17:10:16 +00:00
Krzysztof Parzyszek	c83c267b84	[Hexagon] Generate multiply-high instruction in isel llvm-svn: 305302	2017-06-13 16:21:57 +00:00
Yonghong Song	7e9d2cb553	bpf: clang-format on BPFAsmPrinter.cpp Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 305301	2017-06-13 16:17:20 +00:00
Krzysztof Parzyszek	de2ac17b7b	[Hexagon] Don't kill live registers when creating mux out of tfr When a mux instruction is created from a pair of complementary conditional transfers, it can be placed at the location of either the earlier or the later of the transfers. Since it will use the operands of the original transfers, putting it in the earlier location may hoist a kill of a source register that was originally further down. Make sure the kill flag is removed if the register is still used afterwards. llvm-svn: 305300	2017-06-13 16:07:36 +00:00
Simon Dardis	c38d391f56	[MIPS] BuildCondBr should preserve MO flags While simplifying branches in the MachineInstr representation, the routine BuildCondBr must preserve flags on register MachineOperands. In particular, it must preserve the <undef> flag. This fixes a bug that is unlikely to occur in any real scenario, but which bugpoint is likely to introduce. Patch By Nick Johnson! Reviewers: ahatanak, sdardis Differential Revision: https://reviews.llvm.org/D34041 llvm-svn: 305290	2017-06-13 14:11:29 +00:00
Krzysztof Parzyszek	9bd4d91037	[Hexagon] Stop pmpy recognition when shift conversion fails The conversion of shifts from right shifts to left shifts may fail. In such case, the pmpy recognition cannot proceed. llvm-svn: 305289	2017-06-13 13:51:49 +00:00
Oliver Stannard	852fbd2fea	[ARM] Add scheduling classes for VFNM[AS] The VFNM[AS] instructions did not have scheduling information attached, which was causing assertion failures with the Cortex-A57 scheduling model and -fp-contract=fast, because the Cortex-A57 sched model claims to be complete. Differential Revision: https://reviews.llvm.org/D34139 llvm-svn: 305288	2017-06-13 13:04:32 +00:00
Simon Pilgrim	9ff06a0c7e	Strip UTF8 BOM that got added in rL305091 Seems my recent move to VS2017 has resulted in a few text editor issues..... llvm-svn: 305285	2017-06-13 10:17:57 +00:00
Simon Pilgrim	2b3b717768	[X86][SSE] Refactor getTargetConstantBitsFromNode to avoid large APInts (PR32037) Much of PR32037's compile time regression is due to getTargetConstantBitsFromNode always creating large (>64bit) APInts during the bitcasting from the source data to the destination bitwidth. This commit avoids this bitcast stage if the data is already the correct bitwidth. llvm-svn: 305284	2017-06-13 10:13:48 +00:00
NAKAMURA Takumi	3807ab24c6	PPCISelLowering.cpp: Fix warnings in r305214. [-Wdocumentation] llvm-svn: 305277	2017-06-13 07:34:32 +00:00
Craig Topper	8b8767662c	[AVX-512] Mark masked VPCMP instructions as commutable. llvm-svn: 305276	2017-06-13 07:13:50 +00:00
Craig Topper	e1d8103d8f	[AVX-512] Mark masked version of vpcmpeq as being commutable. llvm-svn: 305275	2017-06-13 07:13:47 +00:00
Craig Topper	42d0339257	[X86] Add masked integer compare instructions to load folding tables. llvm-svn: 305274	2017-06-13 07:13:44 +00:00
Sam Clegg	7736855dee	[WebAssembly] Fix symbol type for addresses of external functions These symbols were previously not being marked as functions so were appearing as globals instead, and with the incorrect relocation type. Without this fix, objects that take address of external functions include them as global imports rather than function imports which then fails at link time. Differential Revision: https://reviews.llvm.org/D34068 llvm-svn: 305263	2017-06-13 01:42:21 +00:00
Tom Stellard	ee6e6452df	AMDGPU/GlobalISel: Mark 32-bit G_ADD as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D33992 llvm-svn: 305232	2017-06-12 20:54:56 +00:00
Tim Northover	7a61316e89	AArch64: don't try to emit an add (shifted reg) for SP. The "Add/sub (shifted reg)" instructions use the 31 encoding for xzr and wzr rather than the SP, so we need to use different variants. Situations where this actually comes up are rare enough (see test-case) that I think falling back to DAG is fine. llvm-svn: 305230	2017-06-12 20:49:53 +00:00
Tony Jiang	1a8eec141a	[PowerPC] Match vec_revb builtins to P9 instructions. Power9 has instructions that will reverse the bytes within an element for all sizes (half-word, word, double-word and quad-word). These can be used for the vec_revb builtins in altivec.h. However, we implement these to match vector shuffle nodes as that will cover both the builtins and vector shuffles that occur in the SDAG through other means. Differential Revision: https://reviews.llvm.org/D33690 llvm-svn: 305214	2017-06-12 18:24:36 +00:00
Tony Jiang	30a49d1a3d	[Power9] Added support for the modsw, moduw, modsd, modud hardware instructions. Note that if we need the result of both the divide and the modulo then we compute the modulo based on the result of the divide and not using the new hardware instruction. Commit on behalf of STEFAN PINTILIE. Differential Revision: https://reviews.llvm.org/D33940 llvm-svn: 305210	2017-06-12 17:58:42 +00:00
Matt Arsenault	05c26472fa	AMDGPU: Don't add same implicit use multiple times For the last component, the same register use was added as an implicit use and another implicit kill use. llvm-svn: 305205	2017-06-12 17:19:20 +00:00
Matt Arsenault	d9b77848f2	AMDGPU: Teach isLegalAddressingMode about flat offsets Also fix reporting r+r as a valid addressing mode without offsets. llvm-svn: 305203	2017-06-12 17:06:35 +00:00
Matt Arsenault	db7c6a8731	AMDGPU: Start selecting flat instruction offsets llvm-svn: 305201	2017-06-12 16:53:51 +00:00
Matt Arsenault	89ad17ce4c	AMDGPU: Verify that flat offsets aren't used pre-GFX9 For convenience the operand is always present in the instruction, but it isn't valid to use except on GFX9. llvm-svn: 305200	2017-06-12 16:37:55 +00:00
Haicheng Wu	ef790ffd56	[Falkor] Enable SW Prefetch. SW prefetch is good for Falkor. Differential Revision: http://reviews.llvm.org/D34084 llvm-svn: 305199	2017-06-12 16:34:19 +00:00
Matt Arsenault	fd02314113	AMDGPU: Start adding offset fields to flat instructions llvm-svn: 305194	2017-06-12 15:55:58 +00:00
Sanjay Patel	d4765a38b4	[DAG] add helper to bind memop chains; NFCI This step is just intended to reduce code duplication rather than change any functionality. A follow-up would be to replace PPCTargetLowering::spliceIntoChain() usage with this new helper. Differential Revision: https://reviews.llvm.org/D33649 llvm-svn: 305192	2017-06-12 14:41:48 +00:00
Daniel Neilson	c0112ae8da	Const correctness for TTI::getRegisterBitWidth Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation. Reviewers: chandlerc, rnk, reames Reviewed By: reames Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D33903 llvm-svn: 305189	2017-06-12 14:22:21 +00:00
Simon Pilgrim	b079c8b35b	[X86][SSE] Change memop fragment to inherit from vec128load with local alignment controls First possible step towards merging SSE/AVX memory folding pattern fragments. Also allows us to remove the duplicate non-temporal load logic. Differential Revision: https://reviews.llvm.org/D33902 llvm-svn: 305184	2017-06-12 10:01:27 +00:00
Craig Topper	69fead95c7	[AVX-512] Add VPCONFLICT and VPLZCNT to load folding tables. llvm-svn: 305180	2017-06-12 04:57:31 +00:00
Sanjay Patel	dcbfbb11d9	[x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load I was looking closer at the x86 test diffs in D33866, and the first change seems like it shouldn't happen in the first place. So this patch will resolve that. Using Agner's tables and AMD docs, vperm2f128 and vinsertf128 have identical timing for any given CPU model, so we should be able to interchange those without affecting perf. But as we can see in some of the diffs here, using vperm2f128 allows load folding, so we should take that opportunity to reduce code size and register pressure. A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128 was introduced with AVX1, we should be selecting it in all of the same situations that we would with AVX2. If there's some reason that an AVX1 CPU would not want to use this instruction, that should be fixed up in a later pass. Differential Revision: https://reviews.llvm.org/D33938 llvm-svn: 305171	2017-06-11 21:18:58 +00:00
Wei Ding	7c3e5115a5	AMDGPU : Fix ISA Version Definitions. Differential Revision: http://reviews.llvm.org/D28531 llvm-svn: 305137	2017-06-10 03:53:19 +00:00
I-Jui (Ray) Sung	21fde385fa	[AArch64] Add fallback in FastISel fp16 conversions Summary: - Fix assertion failures on F16 to/from int types in FastISel by falling back to regular ISel - Add a testcase of various conversion cases with FastISel (-O0) Reviewers: kristof.beyls, jmolloy, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: SjoerdMeijer, llvm-commits, srhines, pirama, aemerson, rengolin, javed.absar, kristof.beyls Differential Revision: https://reviews.llvm.org/D33734 llvm-svn: 305127	2017-06-09 22:40:50 +00:00
Stanislav Mekhanoshin	1a61ab8172	[AMDGPU] Add intrinsics for alignbit and alignbyte instructions Differential Revision: https://reviews.llvm.org/D34046 llvm-svn: 305098	2017-06-09 19:03:00 +00:00
Simon Pilgrim	3d37b1a277	[X86][SSE] Add support for PACKSS nodes to faux shuffle extraction If the inputs won't saturate during packing then we can treat the PACKSS as a truncation shuffle llvm-svn: 305091	2017-06-09 17:29:52 +00:00
Krzysztof Parzyszek	7aca2fd830	[Hexagon] Fixes and updates to the selection patterns - Add some missing patterns. - Use C4_cmplte in branch patterns. - Fix signedness of immediate operand in M2_accii. llvm-svn: 305085	2017-06-09 15:26:21 +00:00
Simon Dardis	212cccb2f4	Reland "[SelectionDAG] Enable target specific vector scalarization of calls and returns" By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown, backends can request that LLVM to scalarize vector types for calls and returns. The MIPS vector ABI requires that vector arguments and returns are passed in integer registers. With SelectionDAG's new hooks, the MIPS backend can now handle LLVM-IR with vector types in calls and returns. E.g. 'call @foo(<4 x i32> %4)'. Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for calls and returns if vector types were not legal. If vector types were legal, a single 128bit vector argument would be assigned to a single 32 bit / 64 bit integer register. By teaching the MIPS backend to inspect the original types, it can now implement the MIPS vector ABI which requires a particular method of scalarizing vectors. Previously, the MIPS backend relied on clang to scalarize types such as "call @foo(<4 x float> %a) into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3, i32 inreg %4)". This patch enables the MIPS backend to take either form for vector types. The previous version of this patch had a "conditional move or jump depends on uninitialized value". Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur Differential Revision: https://reviews.llvm.org/D27845 llvm-svn: 305083	2017-06-09 14:37:08 +00:00
David Stuttard	82618baa0f	[AMDGPU] Fix for issue in alloca to vector promotion pass Summary: Alloca promotion pass not dealing with non-canonical input Added some additional checks so the pass simply backs-off forms it can't deal with (non-canonical) Also added some test cases in non-canonical form to check that it no longer crashes Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31710 llvm-svn: 305079	2017-06-09 14:16:22 +00:00
Javed Absar	9e1ff8654f	[ARM] Custom machine-scheduler. NFCI. This patch creates a customised machine-scheduler for ARM targets, so that subsequently DAG mutations etc can be added. Reviewed by: hahn, rengolin, rovka. Differential Revision: https://reviews.llvm.org/D34039 llvm-svn: 305078	2017-06-09 14:07:21 +00:00
Krzysztof Parzyszek	7881415510	[Hexagon] Add LLVM header to HexagonPatterns.td llvm-svn: 305074	2017-06-09 13:30:58 +00:00
Oliver Stannard	ad0973557c	[ARM] Add scheduling info for VFMS The scalar VFMS instructions did not have scheduling information attached (but VFMA did), which was causing assertion failures with the Cortex-A57 scheduling model and -fp-contract=fast. Differential Revision: https://reviews.llvm.org/D34040 llvm-svn: 305064	2017-06-09 09:19:09 +00:00
Stefan Maksimovic	add20f8f17	Test commit: remove whitespace llvm-svn: 305059	2017-06-09 07:57:05 +00:00
Rui Ueyama	365d4d0000	Fix -Wunused-variable. llvm-svn: 305051	2017-06-09 03:26:45 +00:00
Krzysztof Parzyszek	b1ada4e742	[Hexagon] Re-enable machine verifier after codegen passes Remove "false" from the arguments to "addPass" in Hexagon's target pass config. llvm-svn: 305015	2017-06-08 21:25:36 +00:00
Krzysztof Parzyszek	8a7fb0fe51	[Hexagon] Skip mux generation when predicate register is undefined llvm-svn: 305014	2017-06-08 20:56:36 +00:00
Matt Arsenault	f1202e650a	AMDGPU: Work around build special casing .inc files It complains because it assumes these were autogenerated files in the source directory. llvm-svn: 305005	2017-06-08 19:25:21 +00:00
Matt Arsenault	3c7581bbeb	AMDGPU: Use correct register names in inline assembly Fixes using physical registers in inline asm from clang. llvm-svn: 305004	2017-06-08 19:03:20 +00:00
Nirav Dave	6a38cc6d67	[Hexagon] Speedup NumNodesBlocking calculation. NFCI. llvm-svn: 305003	2017-06-08 18:49:25 +00:00
Guozhi Wei	f31c56df2a	[PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64 In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64. This patch fixed PR32442. Differential Revision: https://reviews.llvm.org/D31407 llvm-svn: 305001	2017-06-08 18:27:24 +00:00
Mark Searles	e5c7832311	[AMDGPU] Force qsads instrs to use different dest register than source registers The V_MQSAD_PK_U16_U8, V_QSAD_PK_U16_U8, and V_MQSAD_U32_U8 take more than 1 pass in hardware. For these three instructions, the destination registers must be different than all sources, so that the first pass does not overwrite sources for the following passes. Differential Revision: https://reviews.llvm.org/D33783 llvm-svn: 304998	2017-06-08 18:21:19 +00:00
Zaara Syeda	79acbbe513	[Power9] Exploit vector integer extend instructions This patch adds build vector patterns to exploit the vector integer extend instructions: vextsb2w - Vector Extend Sign Byte To Word vextsb2d - Vector Extend Sign Byte To Doubleword vextsh2w - Vector Extend Sign Halfword To Word vextsh2d - Vector Extend Sign Halfword To Doubleword vextsw2d - Vector Extend Sign Word To Doubleword Differential Revision: https://reviews.llvm.org/D33510 llvm-svn: 304992	2017-06-08 17:14:36 +00:00
Andrew V. Tischenko	8cb1d0931f	Add scheduler classes to integer/float horizontal operations. This patch will close PR32801. Differential Revision: https://reviews.llvm.org/D33203 llvm-svn: 304986	2017-06-08 16:44:13 +00:00

... 2 3 4 5 6 ...

43055 Commits