llvm-project

Commit Graph

Author	SHA1	Message	Date
Sam Clegg	13717bd54b	[WebAssembly] Remove expected failure of builtin-location.C test This seems to have been fixed by https://reviews.llvm.org/D61956 Yay Differential Revision: https://reviews.llvm.org/D62075 llvm-svn: 361071	2019-05-17 19:55:17 +00:00
Simon Pilgrim	065431c82b	[X86][SSE] Fold movmsk(not(x)) -> not(movmsk) Helps to improve folding of comparisons with movmsk results. llvm-svn: 361056	2019-05-17 17:56:25 +00:00
Simon Pilgrim	2c2f8e74b9	[X86][SSE] Match all-of bool scalar reductions into a bitcast/movmsk + cmp. Same as what we do for vector reductions in combineHorizontalPredicateResult, use movmsk+cmp for scalar (and(extract(x,0),extract(x,1)) reduction patterns. llvm-svn: 361052	2019-05-17 17:25:55 +00:00
Dmitry Preobrazhensky	198611b0ff	[AMDGPU][MC] Corrected parsing of NAME:VALUE modifiers See bug 41298: https://bugs.llvm.org/show_bug.cgi?id=41298 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61009 llvm-svn: 361045	2019-05-17 16:04:17 +00:00
Dmitry Preobrazhensky	5ae3113969	[AMDGPU][MC] Enabled labels with s_call_b64 and s_cbranch_i_fork See https://bugs.llvm.org/show_bug.cgi?id=41888 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D62016 llvm-svn: 361040	2019-05-17 14:57:04 +00:00
Simon Pilgrim	279314e81b	[X86][AVX] Remove LowerCTTZ's AVX1 custom vector handling. We can now rely on generic expansion to handle this. llvm-svn: 361038	2019-05-17 14:37:19 +00:00
Simon Pilgrim	62c7032c18	[X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) fold. Prep work for the removal of the remaining x86 CTTZ vector lowering. llvm-svn: 361035	2019-05-17 14:04:56 +00:00
Dmitry Preobrazhensky	43fcc79837	[AMDGPU][MC] Enabled expressions for most operands which accept integer values See bug 40873: https://bugs.llvm.org/show_bug.cgi?id=40873 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60768 llvm-svn: 361031	2019-05-17 13:17:48 +00:00
Matt Arsenault	1a02d30c87	AMDGPU: Fix unused variable warnings in release builds llvm-svn: 361030	2019-05-17 12:59:27 +00:00
Matt Arsenault	a510b570c2	AMDGPU/GlobalISel: Legalize G_FCEIL llvm-svn: 361028	2019-05-17 12:20:05 +00:00
Matt Arsenault	6aebcd5499	AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNC llvm-svn: 361027	2019-05-17 12:20:01 +00:00
Matt Arsenault	6aafc5e19d	AMDGPU/GlobalISel: Legalize G_FRINT llvm-svn: 361026	2019-05-17 12:19:57 +00:00
Matt Arsenault	1448f5689e	AMDGPU/GlobalISel: Legalize G_FCOPYSIGN llvm-svn: 361025	2019-05-17 12:19:52 +00:00
Matt Arsenault	568f193847	AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.load llvm-svn: 361023	2019-05-17 12:02:34 +00:00
Matt Arsenault	a3b5a386fa	AMDGPU/GlobalISel: Use subreg index instead of extra unmerge This saves instructions and extra steps, but I'm not sure about introducing subregister indexes at this point. llvm-svn: 361022	2019-05-17 12:02:31 +00:00
Matt Arsenault	b3dc73634c	AMDGPU/GlobalISel: Use waterfall loop for buffer_load This adds support for more complex waterfall loops that need to handle operands > 32-bits, and multiple operands. llvm-svn: 361021	2019-05-17 12:02:27 +00:00
Simon Pilgrim	a6d3bd486b	[X86] Pull out IsNOT helper. NFCI. Return the input value for the NOT pattern: (xor X, -1) -> X llvm-svn: 361012	2019-05-17 10:37:08 +00:00
Rhys Perry	c4bc61bad7	[AMDGPU] detect WaW hazards when moving/merging load/store instructions Summary: In order to combine memory operations efficiently, the load/store optimizer might move some instructions around. It's usually safe to move instructions down past the merged instruction because the pass checks if memory operations can be re-ordered. Though, the current logic doesn't handle Write-after-Write hazards. This fixes a reflection issue with Monster Hunter World and DXVK. v2: - rebased on top of master - clean up the test case - handle WaW hazards correctly Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130 Original patch by Samuel Pitoiset. Reviewers: tpr, arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D61313 llvm-svn: 361008	2019-05-17 09:32:23 +00:00
Cullen Rhodes	7f605c3550	[AArch64][SVE2] Asm: add saturating multiply-add long instructions Summary: Patch adds support for indexed and unpredicated vectors forms of the following instructions: * SQDMLALB, SQDMLALT, SQDMLSLB, SQDMLSLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D61997 llvm-svn: 361005	2019-05-17 09:29:43 +00:00
Cullen Rhodes	334130a199	[AArch64][SVE2] Asm: add integer multiply-add long instructions Summary: Patch adds support for indexed and unpredicated vectors forms of the following instructions: * SMLALB, SMLALT, UMLALB, UMLALT, SMLSLB, SMLSLT, UMLSLB, UMLSLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61951 llvm-svn: 361003	2019-05-17 09:19:41 +00:00
Cullen Rhodes	0d47f00821	[AArch64][SVE2] Asm: add integer multiply long instructions Summary: Patch adds support for indexed and unpredicated vectors forms of the following instructions: * SMULLB, SMULLT, UMULLB, UMULLT, SQDMULLB, SQDMULLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61936 llvm-svn: 361002	2019-05-17 09:04:44 +00:00
Craig Topper	ae1597d360	[X86] Add FeatureFastScalarShiftMasks and FeatureFastVectorShiftMasks to the ignore list for inlining compatibility. These are tuning flags and won't cause any codegen issue if we inline a function with a different value. llvm-svn: 360992	2019-05-17 06:40:21 +00:00
Fangrui Song	ad7199f3e6	[PowerPC] Support .reloc , R_PPC{,64}_NONE, This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. llvm-svn: 360990	2019-05-17 06:04:11 +00:00
Fangrui Song	e18a6ad0b8	[MC][PowerPC] Clean up PPCAsmBackend Replace the member variable Target with Triple Use Triple instead of TheTarget.getName() to dispatch on 32-bit/64-bit. Delete redundant parameters llvm-svn: 360986	2019-05-17 05:44:26 +00:00
Fangrui Song	2463239777	[X86] Support .reloc , R_{386,X86_64}_NONE, This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. See R_MIPS_NONE (D13659), R_ARM_NONE (D61992), R_AARCH64_NONE (D61973) for similar changes. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D62014 llvm-svn: 360983	2019-05-17 03:25:39 +00:00
Fangrui Song	aa6102ad8e	[AArch64] Support .reloc , R_AARCH64_NONE, Summary: This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D61973 llvm-svn: 360981	2019-05-17 03:05:07 +00:00
Fangrui Song	43ca0e9eb8	[ARM] Support .reloc , R_ARM_NONE, R_ARM_NONE can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. Add a generic MCFixupKind FK_NONE as this kind of no-op relocation is ubiquitous on ELF and COFF, and probably available on many other binary formats. See D62014. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D61992 llvm-svn: 360980	2019-05-17 02:51:54 +00:00
Jonas Paulsson	9427961c89	[SystemZ] Bugfix in SystemZTargetLowering::combineIntDIVREM() Make sure to not unroll a vector division/remainder (with a constant splat divisor) after type legalization, since the scalar type may then be illegal. Review: Ulrich Weigand https://reviews.llvm.org/D62036 llvm-svn: 360965	2019-05-17 00:50:35 +00:00
David L. Jones	4a5e01faa4	[X86][AsmParser] Add mnemonics missed in r360954. These are valid Jcc, but aren't based on the EFLAGS condition codes (Intel 64 and IA-32 Architetcures Software Developer's Manual Vol. 1, Appendix B). These are covered in clang/test, but not llvm/test. llvm-svn: 360960	2019-05-17 00:19:20 +00:00
David L. Jones	add7ed2281	[X86][AsmParser] Ignore "short" even harder in Intel syntax ASM. In Intel syntax, it's not uncommon to see a "short" modifier on Jcc conditional jumps, which indicates the offset should be a "short jump" (8-bit immediate offset from EIP, -128 to +127). This patch expands to all recognized Jcc condition codes, and removes the inline restriction. Clang already ignores "jmp short" in inline assembly. However, only "jmp" and a couple of Jcc are actually checked, and only inline (i.e., not when using the integrated assembler for asm sources). A quick search through asm-containing libraries at hand shows a pretty broad range of Jcc conditions spelled with "short." GAS ignores the "short" modifier, and instead uses an encoding based on the given immediate. MS inline seems to do the same, and I suspect MASM does, too. NASM will yield an error if presented with an out-of-range immediate value. Example of GCC 9.1 and MSVC v19.20, "jmp short" with offsets that do and do not fit within 8 bits: https://gcc.godbolt.org/z/aFZmjY Differential Revision: https://reviews.llvm.org/D61990 llvm-svn: 360954	2019-05-16 23:27:07 +00:00
David L. Jones	11305984d0	[X86][AsmParser] Rename "ConditionCode" variable to "ConditionPredicate". This better matches the verbiage in Intel documentation, and should help avoid confusion between these two different kinds of values, both of which are parsed from mnemonics. llvm-svn: 360953	2019-05-16 23:27:05 +00:00
Reid Kleckner	08c15df29f	[X86] Deduplicate symbol lowering logic, NFC Summary: This refactors four pieces of code that create SDNodes for references to symbols: - normal global address lowering (LEA, MOV, etc) - callee global address lowering (CALL) - external symbol address lowering (LEA, MOV, etc) - external symbol address lowering (CALL) Each of these pieces of code need to: - classify the reference - lower the symbol - emit a RIP wrapper if needed - emit a load if needed - add offsets if needed I think handling them all in one place will make the code easier to maintain in the future. Reviewers: craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61690 llvm-svn: 360952	2019-05-16 23:15:26 +00:00
Craig Topper	f09b9d419f	[X86] Use 0x9 instead of 0x1 as the immediate in some masked floor pattern. Similarly change 0x2 to 0xA for ceil. This suppresses exceptions which is what we should be doing for ceil and floor. We already use the correct immediate in patterns without masking. llvm-svn: 360915	2019-05-16 16:53:50 +00:00
Matt Arsenault	99e6f4d11a	AMDGPU: Introduce TokenFactor for ABI register copies in call sequence The call was missing chain dependencies on the pre-call copies. I don't think this was causing any real issues however. llvm-svn: 360906	2019-05-16 15:10:27 +00:00
Matt Arsenault	df24c92c0f	AMDGPU: Assume xnack is enabled by default This is the conservatively correct default. It is always safe to assume xnack is enabled, but not the converse. Introduce a feature to blacklist targets where xnack can never be meaningfully enabled. I'm not sure the targets this is applied to is 100% correct. llvm-svn: 360903	2019-05-16 14:48:34 +00:00
Adhemerval Zanella	2d28db6b9f	[AArch64] Handle ISD::LROUND and ISD::LLROUND This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas instruction. It currently only handles the scalar version. llvm-svn: 360894	2019-05-16 13:30:18 +00:00
Adhemerval Zanella	73643b5041	[CodeGen] Add lround/llround builtins This patch add the ISD::LROUND and ISD::LLROUND along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lround/llround generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. llvm-svn: 360889	2019-05-16 13:15:27 +00:00
Matt Arsenault	a8f88c388f	AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xor Bool values should use the scc/vcc regbank since r350611. llvm-svn: 360877	2019-05-16 12:06:41 +00:00
Cullen Rhodes	472c6ef8b0	[AArch64][SVE2] Asm: implement CMLA/SQRDCMLAH instructions Summary: This patch adds support for the indexed and unpredicated vectors forms of the CMLA and SQRDCMLAH instructions. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D61906 llvm-svn: 360871	2019-05-16 09:42:22 +00:00
Cullen Rhodes	07eba98dd7	[AArch64][SVE2] Asm: implement CDOT instruction Summary: The complex DOT instructions perform a dot-product on quadtuplets from two source vectors and the resuling wide real or wide imaginary is accumulated into the destination register. The instructions come in two forms: Vector form, e.g. cdot z0.s, z1.b, z2.b, #90 - complex dot product on four 8-bit quad-tuplets, accumulating results in 32-bit elements. The complex numbers in the second source vector are rotated by 90 degrees. cdot z0.d, z1.h, z2.h, #180 - complex dot product on four 16-bit quad-tuplets, accumulating results in 64-bit elements. The complex numbers in the second source vector are rotated by 180 degrees. Indexed form, e.g. cdot z0.s, z1.b, z2.b[3], #0 - complex dot product on four 8-bit quad-tuplets, with specified quadtuplet from second source vector, accumulating results in 32-bit elements. cdot z0.d, z1.h, z2.h[1], #0 - complex dot product on four 16-bit quad-tuplets, with specified quadtuplet from second source vector, accumulating results in 64-bit elements. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer, rovka Differential Revision: https://reviews.llvm.org/D61903 llvm-svn: 360870	2019-05-16 09:33:44 +00:00
Cullen Rhodes	064f6ab556	[AArch64][SVE2] Asm: add unpredicated integer multiply instructions Summary: Add support for the following instructions: * MUL (indexed and unpredicated vectors forms) * SQDMULH (indexed and unpredicated vectors forms) * SQRDMULH (indexed and unpredicated vectors forms) * SMULH (unpredicated, predicated form added in SVE) * UMULH (unpredicated, predicated form added in SVE) * PMUL (unpredicated) The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer, rovka Differential Revision: https://reviews.llvm.org/D61902 llvm-svn: 360867	2019-05-16 09:07:26 +00:00
Fangrui Song	3e92df3e39	Add Triple::isPPC64() llvm-svn: 360864	2019-05-16 08:31:22 +00:00
Craig Topper	e43bdf144c	[X86] Delay creating index register negations during address matching until after we know for sure the match will succeed If we're trying to match an LEA, its possible the LEA match will be deemed unprofitable. In which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG. Though I don't have an example of that. But it just seems like bad form to have dangling nodes in isel. Differential Revision: https://reviews.llvm.org/D61047 llvm-svn: 360823	2019-05-15 21:59:53 +00:00
Simon Atanasyan	0b0cc23fb6	[mips] Use range-based `for` loops. NFC llvm-svn: 360817	2019-05-15 21:26:25 +00:00
Mandeep Singh Grang	814435fe87	[AArch64] only indicate CFI on Windows if we emitted CFI Summary: Otherwise, we emit directives for CFI without any actual CFI opcodes to go with them, which causes tools to malfunction. The technique is similar to what the x86 backend already does. Fixes https://bugs.llvm.org/show_bug.cgi?id=40876 Patch by: froydnj (Nathan Froyd) Reviewers: mstorsjo, eli.friedman, rnk, mgrang, ssijaric Reviewed By: rnk Subscribers: javed.absar, kristof.beyls, llvm-commits, dmajor Tags: #llvm Differential Revision: https://reviews.llvm.org/D61960 llvm-svn: 360816	2019-05-15 21:23:41 +00:00
Craig Topper	439228727a	[X86] Strengthen type constraints on some specialized X86 ISD opcodes that don't have any flexibility. NFC These particular instructions only operate on 128-bit vectors and have no wider equivalents. And the element size is always known. One could argue that MOVSS/MOVSD could be merged, but that's probably disruptive to code in X86ISelLowering and probably low value. llvm-svn: 360815	2019-05-15 21:16:28 +00:00
Pete Couperus	1ca049959f	Uncomment LLVM_FALLTHROUGH. llvm-svn: 360798	2019-05-15 19:46:17 +00:00
Ryan Taylor	29257eb76c	[AMDGPU] Increases available SGPR for Calling Convention Summary: SGPR in CC can be either hw initialized or set by other chained shaders and so this increases the SGPR count availalbe to CC to 105. Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61261 llvm-svn: 360778	2019-05-15 14:43:55 +00:00
David Green	0582b22f10	[ARM] Don't use the Machine Scheduler for cortex-m at minsize The new cortex-m schedule in rL360768 helps performance, but can increase the amount of high-registers used. This, on average, ends up increasing the codesize by a fair amount (because less instructions are converted from T2 to T1). On cortex-m at -Oz, where we are quite size-paranoid, it is better to use the existing DAG scheduler with the RegPressure scheduling preference (at least until the issues around T2 vs T1 instructions can be improved). I have also made sure that the Sched::RegPressure dag scheduler is always chosen for MinSize. The test shows one case where we increase the number of registers used. Differential Revision: https://reviews.llvm.org/D61882 llvm-svn: 360769	2019-05-15 12:58:02 +00:00
David Green	d2d0f46cd2	[ARM] Cortex-M4 schedule This patch adds a simple Cortex-M4 schedule, renaming the existing M3 schedule to M4 and filling in the latencies as-per the Cortex-M4 TRM: https://developer.arm.com/docs/ddi0439/latest Most of these are 1, with the important exception being loads taking 2 cycles. A few others are also higher, but I don't believe they make a large difference. I've repurposed the M3 schedule as the latencies are mostly the same between the two cores, with the M4 having more FP and DSP instructions. We also turn on MISched and UseAA for the cores that now use this. It also adds some schedule Write's to various instruction to make things simpler. Differential Revision: https://reviews.llvm.org/D54142 llvm-svn: 360768	2019-05-15 12:41:58 +00:00
Simon Atanasyan	4c68c5ae71	[mips] LLVM and GAS now use same instructions for CFA Definition. NFCI LLVM previously used `DW_CFA_def_cfa` instruction in .eh_frame to set the register and offset for current CFA rule. We change it to `DW_CFA_def_cfa_register` which is the same one used by GAS that only changes the register but keeping the old offset. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D61899 llvm-svn: 360765	2019-05-15 12:05:27 +00:00
Craig Topper	384d46c0d5	[X86] Use OR32mi8Locked instead of LOCK_OR32mi8 in emitLockedStackOp. They encode the same way, but OR32mi8Locked sets hasUnmodeledSideEffects set which should be stronger than the mayLoad/mayStore on LOCK_OR32mi8. I think this makes sense since we are using it as a fence. This also seems to hide the operation from the speculative load hardening pass so I've reverted r360511. llvm-svn: 360747	2019-05-15 04:15:46 +00:00
Philip Reames	658cad1287	[NFC] Reuse a helper function to eliminate duplicate code llvm-svn: 360740	2019-05-15 01:39:07 +00:00
Richard Trieu	5f7d4ab5f9	[XCore] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360738	2019-05-15 01:28:30 +00:00
Richard Trieu	0116385452	[X86] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360736	2019-05-15 01:17:58 +00:00
Richard Trieu	c6c421379d	[WebAssembly] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360735	2019-05-15 01:03:00 +00:00
Richard Trieu	1e6f98b89d	[SystemZ] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360734	2019-05-15 00:46:18 +00:00
Richard Trieu	cf82d4a483	[Sparc] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360733	2019-05-15 00:35:37 +00:00
Richard Trieu	51fc56d603	[RISCV] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360732	2019-05-15 00:24:15 +00:00
Richard Trieu	ee6ced196d	[PowerPC] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360731	2019-05-15 00:09:58 +00:00
Richard Trieu	e8f83befd5	[NVPTX] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360729	2019-05-14 23:56:18 +00:00
Richard Trieu	a57ce32eff	[MSP430] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360728	2019-05-14 23:45:18 +00:00
Richard Trieu	313b78150c	[Mips] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360727	2019-05-14 23:34:37 +00:00
Richard Trieu	2e50dc78c5	[Lanai] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360726	2019-05-14 23:17:18 +00:00
Richard Trieu	7ef172998b	[Hexagon] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360724	2019-05-14 23:04:55 +00:00
Richard Trieu	a68ee931e6	[BPF] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360722	2019-05-14 22:54:06 +00:00
Richard Trieu	e982b42003	[AVR] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360721	2019-05-14 22:41:58 +00:00
Philip Reames	445f942fc4	Use an offset from TOS for idempotent rmw locked op lowering This was the portion split off D58632 so that it could follow the redzone API cleanup. Note that I changed the offset preferred from -8 to -64. The difference should be very minor, but I thought it might help address one concern which had been previously raised. Differential Revision: https://reviews.llvm.org/D61862 llvm-svn: 360719	2019-05-14 22:32:42 +00:00
Richard Trieu	f3011b9b10	[ARM] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360718	2019-05-14 22:29:50 +00:00
Richard Trieu	7f9a008a2d	[ARC] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360716	2019-05-14 22:06:04 +00:00
Richard Trieu	8ce2ee9d56	[AMDGPU] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360713	2019-05-14 21:54:37 +00:00
Richard Trieu	b26592e04d	[AArch64] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360709	2019-05-14 21:33:53 +00:00
Dmitry Preobrazhensky	ee51d851ea	[AMDGPU][GFX8][GFX9] Corrected predicate of v_*_co_u32 aliases Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D61905 llvm-svn: 360702	2019-05-14 19:16:24 +00:00
Stanislav Mekhanoshin	05791d90c9	[AMDGPU] Fixed handling of imemdiate i1 literals This bug was exposed by the rL360395. Differential Revision: https://reviews.llvm.org/D61812 llvm-svn: 360689	2019-05-14 16:18:00 +00:00
Tim Renouf	33cb8f5b54	[AMDGPU] Fixed +DumpCode The +DumpCode attribute is a horrible hack in AMDGPU to embed the disassembly of the generated code into the elf file. It is used by LLPC to implement an extension that allows the application to read back the disassembly of the code. Longer term, we should re-implement that by using the LLVM disassembler from the Vulkan driver. Recent LLVM changes broke +DumpCode. With -filetype=asm it crashed, and with -filetype=obj I think it did not include any instructions, only the labels. Fixed with this commit: now it has no effect with -filetype=asm, and works as intended with -filetype=obj. Differential Revision: https://reviews.llvm.org/D60682 Change-Id: I6436d86fe2ea220d74a643a85e64753747c9366b llvm-svn: 360688	2019-05-14 16:17:14 +00:00
Simon Pilgrim	c2d9cfd925	[X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD targets (PR40758) D61068 handled vector shifts, this patch does the same for scalars where there are similar number of pipes for shifts as bit ops - this is true almost entirely for AMD targets where the scalar ALUs are well balanced. This combine avoids AND immediate mask which usually means we reduce encoding size. Some tests show use of (slow, scaled) LEA instead of SHL in some cases, but thats due to particular shift immediates - shift+mask generate these just as easily. Differential Revision: https://reviews.llvm.org/D61830 llvm-svn: 360684	2019-05-14 15:21:28 +00:00
Cullen Rhodes	3b917019a5	[AArch64][SVE2] Asm: add SQRDMLAH/SQRDMLSH instructions Summary: This patch adds support for the indexed and unpredicated vectors forms of the SQRDMLAH and SQRDMLSH instructions. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61515 llvm-svn: 360683	2019-05-14 15:10:16 +00:00
Cullen Rhodes	e029da46e6	[AArch64][SVE2] Asm: add integer multiply-add/subtract (indexed) instructions Summary: This patch adds support for the following instructions: MLA mul-add, writing addend (Zda = Zda + Zn * Zm[idx]) MLS mul-sub, writing addend (Zda = Zda + -Zn * Zm[idx]) Predicated forms of these instructions were added in SVE. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61514 llvm-svn: 360682	2019-05-14 15:01:00 +00:00
Lei Huang	22561972af	[PowerPC] Custom lower known CR bit spills For known CRBit spills, CRSET/CRUNSET, it is more efficient to load and spill the known value instead of extracting the bit. eg. This sequence is currently used to spill a CRUNSET: crclr 4*cr5+lt mfocrf r3,4 rlwinm r3,r3,20,0,0 stw r3,132(r1) This patch custom lower it to: li r3,0 stw r3,132(r1) Differential Revision: https://reviews.llvm.org/D61754 llvm-svn: 360677	2019-05-14 14:27:06 +00:00
Simon Pilgrim	2747ee2c83	[X86] X86TargetLowering::LowerINTRINSIC_WO_CHAIN - ensure rounding control is initialized. NFCI. Fixes scan-build warnings llvm-svn: 360664	2019-05-14 11:30:39 +00:00
Tim Northover	ff6875acd9	AArch64: support binutils-like things on arm64_32. This adds support for the arm64_32 watchOS ABI to LLVM's low level tools, teaching them about the specific MachO choices and constants needed to disassemble things. llvm-svn: 360663	2019-05-14 11:25:44 +00:00
Philip Reames	3098e44daa	[X86] Prefer locked stack op over mfence for seq_cst 64-bit stores on 32-bit targets This is a follow on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't need to modify any particular address, prefer to use a locked stack op over an mfence. Differential Revision: https://reviews.llvm.org/D61863 llvm-svn: 360649	2019-05-14 04:43:37 +00:00
Sanjay Patel	3a13d970aa	[SDAG, x86] allow targets to override test for binop opcodes This follows the pattern of the existing isCommutativeBinOp(). x86 shows improvements from vector narrowing for the min/max opcodes. llvm-svn: 360639	2019-05-14 00:39:40 +00:00
Craig Topper	e2966473dd	[X86] Use ISD::MERGE_VALUES to return from lowerAtomicArith instead of calling ReplaceAllUsesOfValueWith and returning SDValue(). Returning SDValue() makes the caller think that nothing happened and it will end up executing the Expand path. This generates extra nodes that will need to be pruned as dead code. Returning an ISD::MERGE_VALUES will tell the caller that we'd like to make a change and it will take care of replacing uses. This will prevent falling into the Expand path. llvm-svn: 360627	2019-05-13 22:17:13 +00:00
Craig Topper	5f999c2bea	[X86] Various type corrections to the code that creates LOCK_OR32mi8/OR32mi8Locked to the stack for idempotent atomic rmw and atomic fence. These are updates to match how isel table would emit a LOCK_OR32mi8 node. -Use i32 for the immediate zero even though only 8 bits are encoded. -Use i16 for segment register. -Use LOCK_OR32mi8 for idempotent atomic operations in 32-bit mode to match 64-bit mode. I'm not sure why OR32mi8Locked and LOCK_OR32mi8 both exist. The only difference seems to be that OR32mi8Locked is marked as UnmodeledSideEffects=1. -Emit an extra i32 result for the flags output. I don't know if the types here really matter just noticed it was inconsistent with normal behavior. llvm-svn: 360619	2019-05-13 21:01:24 +00:00
Nikita Popov	323dc634b9	[WebAssembly] Don't assume that zext/sext result is i32/i64 in fast isel (PR41841) Usually this will abort fast-isel at the instruction using the non-legal result, but if the only use is in a different basic block, we'll incorrectly assume that the zext/sext is to i32 (rather than i128 in this case). Differential Revision: https://reviews.llvm.org/D61823 llvm-svn: 360616	2019-05-13 19:40:18 +00:00
Stanislav Mekhanoshin	79b2828b3f	[AMDGPU] Reorder includes per coding standard. NFC. llvm-svn: 360609	2019-05-13 18:05:10 +00:00
Stanislav Mekhanoshin	21088639ae	[AMDGPU] Remove now unused V2FP16_ONE constant def. NFC. llvm-svn: 360608	2019-05-13 17:52:57 +00:00
Robert Lougher	91a9d4ef4b	Revert [X86] Avoid SFB - Fix inconsistent codegen with/without debug info Revert r360436 as it is causing clang-x64-windows-msvc buildbot to fail. llvm-svn: 360606	2019-05-13 17:36:46 +00:00
Nick Desaulniers	c33f754e74	[TargetLowering] Handle multi depth GEPs w/ inline asm constraints Summary: X86TargetLowering::LowerAsmOperandForConstraint had better support than TargetLowering::LowerAsmOperandForConstraint for arbitrary depth getelementpointers for "i", "n", and "s" extended inline assembly constraints. Hoist its support from the derived class into the base class. Link: https://github.com/ClangBuiltLinux/linux/issues/469 Reviewers: echristo, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D61560 llvm-svn: 360604	2019-05-13 17:27:44 +00:00
Simon Pilgrim	73aee29095	[X86][SSE] LowerBuildVectorv4x32 - don't insert MOVQ for undef elts Fixes the regression noted in D61782 where a VZEXT_MOVL was being inserted because we weren't discriminating between 'zeroable' and 'all undef' for the upper elts. Differential Revision: https://reviews.llvm.org/D61782 llvm-svn: 360596	2019-05-13 16:10:11 +00:00
Simon Pilgrim	cf5a8eb7cd	[X86][SSE] Relax use limits for lowerAddSubToHorizontalOp (PR32433) Now that we can use HADD/SUB for scalar additions from any pair of extracted elements (D61263), we can relax the one use limit as we will be able to merge multiple uses into using the same HADD/SUB op. This exposes a couple of missed opportunities in LowerBuildVectorv4x32 which will be committed separately. Differential Revision: https://reviews.llvm.org/D61782 llvm-svn: 360594	2019-05-13 16:02:45 +00:00
Simon Pilgrim	d9aa928603	[X86] Add SimplifyDemandedBits support for PEXTRB/PEXTRW (PR39709) Test case will be included in a followup - its being used but its tricky to show a case that isn't caught at a later stage anyway. llvm-svn: 360588	2019-05-13 15:31:27 +00:00
Cullen Rhodes	6dcef8fc0c	[AArch64][SVE2] Add SVE2 target features to backend and TargetParser Summary: This patch adds the following features defined by Arm SVE2 architecture extension: sve2, sve2-aes, sve2-sm4, sve2-sha3, bitperm For existing CPUs these features are declared as unsupported to prevent scheduler errors. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewers: SjoerdMeijer, sdesmalen, ostannard, rovka Reviewed By: SjoerdMeijer, rovka Subscribers: rovka, javed.absar, tschuett, kristof.beyls, kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61513 llvm-svn: 360573	2019-05-13 10:10:24 +00:00
Ulrich Weigand	8e42f6ddc8	[SystemZ] Model floating-point control register This adds the FPC (floating-point control register) as a reserved physical register and models its use by SystemZ instructions. Note that only the current rounding modes and the IEEE exception masks are modeled. Changes of the FPC due to exceptions (in particular the IEEE exception flags and the DXC) are not modeled. At this point, this patch is mostly NFC, but it will prevent scheduling of floating-point instructions across SPFC/LFPC etc. llvm-svn: 360570	2019-05-13 09:47:26 +00:00
Sam Parker	a33e311a3b	[ARM][ParallelDSP] Relax alias checks When deciding the safety of generating smlad, we checked for any writes within the block that may alias with any of the loads that need to be widened. This is overly conservative because it only matters when there's a potential aliasing write to a location accessed by a pair of loads. Now we check for aliasing writes only once, during setup. If two loads are found to have an aliasing write between them, we don't add these loads to LoadPairs. This means that later during the transform, we can safely widened a pair without worrying about aliasing. However, to maintain correctness, we also need to change the way that wide loads are inserted because the order is now important. The MatchSMLAD method has also been changed, absorbing MatchReductions and AddMACCandidate to hopefully improve readability. Differential Revision: https://reviews.llvm.org/D6102 llvm-svn: 360567	2019-05-13 09:23:32 +00:00
Fangrui Song	f3be557159	[WebAssembly] Add dependency on WebAssemblyDesc to fix BUILD_SHARED_LIBS=on builds after rL360550 This fixes the link error ld.lld: error: undefined symbol: llvm::WebAssembly::anyTypeToString(unsigned int) >>> referenced by WebAssemblyDisassembler.cpp llvm-svn: 360558	2019-05-13 05:51:39 +00:00
Yonghong Song	98fe9c9869	[BPF] emit BTF sections only if debuginfo available Currently, without -g, BTF sections may still be emitted with data sections, e.g., for linux kernel bpf selftest test_tcp_check_syncookie_kern.c issue discovered by Martin as shown below. -bash-4.4$ bpftool btf dump file test_tcp_check_syncookie_kern.o [1] VAR 'results' type_id=0, linkage=global-alloc [2] VAR '_license' type_id=0, linkage=global-alloc [3] DATASEC 'license' size=0 vlen=1 type_id=2 offset=0 size=4 [4] DATASEC 'maps' size=0 vlen=1 type_id=1 offset=0 size=28 Let disable BTF generation if no debuginfo, which is the original design. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61826 llvm-svn: 360556	2019-05-13 05:00:23 +00:00
Craig Topper	61e556d2bd	Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" I've included a new fix in X86RegisterInfo to prevent PR41619 without reintroducing r359392. We might be able to improve that in the base class implementation of shouldRewriteCopySrc somehow. But this hopefully enables forward progress on SimplifyDemandedBits improvements for now. Original commit message: This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. llvm-svn: 360552	2019-05-13 04:03:35 +00:00
David L. Jones	a263aa25e1	[WebAssembly] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360550	2019-05-13 03:32:41 +00:00
Simon Pilgrim	a7fc763082	[X86][AVX] Split VZEXT_MOVL ymm/zmm if the upper elements are not demanded. Removes unnecessary vzeroupper noted in D61806 llvm-svn: 360543	2019-05-12 15:16:29 +00:00
Simon Pilgrim	fda6bffd3b	[X86][SSE] SimplifyDemandedBits - call PEXTRB/PEXTRW SimplifyDemandedVectorElts as well. See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits. This helps us with target shuffles and hops in particular. llvm-svn: 360535	2019-05-11 21:35:50 +00:00
Simon Pilgrim	6b10fde69b	[CostModel][X86] Add min/max reduction costs for all SSE targets The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference). I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1. llvm-svn: 360528	2019-05-11 17:12:52 +00:00
Simon Pilgrim	e4c5b6d9bd	[X86][SSE] Add SimplifyDemandedVectorElts HADD/HSUB handling. Still missing PHADDW/PHSUBW tests because PEXTRW doesn't call SimplifyDemandedVectorElts llvm-svn: 360526	2019-05-11 16:07:12 +00:00
Simon Pilgrim	5e0f92acad	FixupLEAPass::fixupIncDec - non-LEA opcodes should not happen here. NFCI. Matches what we do in other functions and fixes scan-build warning about uninitialized NewOpcode variable. llvm-svn: 360525	2019-05-11 16:02:34 +00:00
Craig Topper	c9d7484aa3	[X86] Add CMOV_FR32X/CMOV_FR64X pseudo instructions. Use them in fast isel to fix a machine verifier error after adding test cases. Fast isel picks the FR32X/FR64X register classes when lowering pseudo select, but it didn't have the right opcode to go with it. llvm-svn: 360524	2019-05-11 16:00:28 +00:00
Craig Topper	74a436596d	[X86] Sink some fast isel code into the only if that uses it. NFC llvm-svn: 360523	2019-05-11 16:00:19 +00:00
Craig Topper	26f2b13a65	[X86] Use TLI.getRegClassFor to simplify some more fast isel code. NFCI llvm-svn: 360522	2019-05-11 16:00:13 +00:00
Simon Pilgrim	e7c51137aa	HexagonConstEvaluator::evaluateHexExt - check incoming opcodes. NFCI. Only certain extension opcodes are supported - fixes scan build warning. llvm-svn: 360520	2019-05-11 15:24:34 +00:00
Craig Topper	682cc09675	[X86] Use getRegClassFor to simplify some code in fast isel. NFCI No need to select the register class based on type and features. It should already be setup by X86ISelLowering. llvm-svn: 360513	2019-05-11 05:18:58 +00:00
Craig Topper	31f7adb94f	[X86] Don't emit MOVNTDQA loads from fast-isel without SSE4.1. We were checking for SSE4.1 for FP types, but not integer 128-bit types. Fixes PR41837. llvm-svn: 360512	2019-05-11 04:19:33 +00:00
Craig Topper	bdef12df8d	[X86] Add a test case for idempotent atomic operations with speculative load hardening. Fix an additional issue found by the test. This test covers the fix from r360475 as well. llvm-svn: 360511	2019-05-11 04:00:27 +00:00
Richard Trieu	d0124bd762	[SystemZ] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360510	2019-05-11 03:36:16 +00:00
Richard Trieu	03fe9d82c4	[Sparc] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360506	2019-05-11 02:59:02 +00:00
Richard Trieu	00ecf67045	[RISCV] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure llvm-svn: 360505	2019-05-11 02:43:58 +00:00
Richard Trieu	4bdb136b0f	[PowerPC] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360502	2019-05-11 02:33:18 +00:00
Richard Trieu	4b620fcf0f	[NVPTX] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360500	2019-05-11 02:09:13 +00:00
Richard Trieu	61fb6700a5	[MSP430] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360498	2019-05-11 01:58:52 +00:00
Richard Trieu	fa29bee9d0	[Mips] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360497	2019-05-11 01:38:56 +00:00
Richard Trieu	4c3890ddbf	[Lanai] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360496	2019-05-11 01:25:58 +00:00
Richard Trieu	48803aa65c	[BPF] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360494	2019-05-11 01:13:21 +00:00
Richard Trieu	bf9e67b5b9	[AVR] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360493	2019-05-11 01:03:03 +00:00
Richard Trieu	5e3ee4b84e	[ARM] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360490	2019-05-11 00:34:07 +00:00
Richard Trieu	dcf1ea08e5	[ARC] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360488	2019-05-11 00:13:01 +00:00
Richard Trieu	c0bd7bd481	[AMDGPU] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360487	2019-05-11 00:03:35 +00:00
Richard Trieu	7ba0605511	[AArch64] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360486	2019-05-10 23:50:01 +00:00
Richard Trieu	f48ef2f2ba	[XCore] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360485	2019-05-10 23:36:49 +00:00
Richard Trieu	b28b8b7724	[X86] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360484	2019-05-10 23:24:38 +00:00
Philip Reames	849ef823df	Factor out redzone ABI checks [NFCI] As requested in D58632, cleanup our red zone detection logic in the X86 backend. The existing X86MachineFunctionInfo flag is used to track whether we use the redzone (via a particularly optimization?), but there's no common way to check whether the function has a red zone. I'd appreciate careful review of the uses being updated. I think they are NFC, but a careful eye from someone else would be appreciated. Differential Revision: https://reviews.llvm.org/D61799 llvm-svn: 360479	2019-05-10 22:55:42 +00:00
Craig Topper	df10cc6068	[X86] Disable speculative load hardening for operations with an explicit RSP base. After D58632, we can create idempotent atomic operations to the top of stack. This confused speculative load hardening because it thinks accesses should have virtual register base except for the cases it already excluded. This commit adds a new exclusion for this case. I'll try to reduce a test case for this, but this fix was verified to work by the reporter. This should avoid needing to revert D58632. llvm-svn: 360475	2019-05-10 22:03:33 +00:00
Mircea Trofin	ff3bed0e61	Skip over prefetches Summary: Skip over prefetches when assigning debug info to instructions with memory operands. This way, the debug info is stable after instrumenting a binary with prefetches, allowing for iterative profiling and instrumentation. Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61789 llvm-svn: 360471	2019-05-10 21:27:55 +00:00
Robert Lougher	986b6b86bb	[X86] Avoid SFB - Fix inconsistent codegen with/without debug info Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being trigerred, and in the latter results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 360436	2019-05-10 15:55:06 +00:00
Simon Pilgrim	a0b1518a4a	[X86][SSE] Add getHopForBuildVector vector splitting If we only use the lower xmm of a ymm hop, then extract the xmm's (for free), perform the xmm hop and then insert back into a ymm (for free). Fixes some of the regressions noted in D61782 llvm-svn: 360435	2019-05-10 15:46:04 +00:00
Lei Huang	1ac6e9636c	[PowerPC] custom lower `v2f64 fpext v2f32` Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32. eg. For the following IR %0 = load <2 x float>, <2 x float>* %Ptr, align 8 %1 = fpext <2 x float> %0 to <2 x double> ret <2 x double> %1 Pre custom lowering: ld r3, 0(r3) mtvsrd f0, r3 xxswapd vs34, vs0 xscvspdpn f0, vs0 xxsldwi vs1, vs34, vs34, 3 xscvspdpn f1, vs1 xxmrghd vs34, vs0, vs1 After custom lowering: lfd f0, 0(r3) xxmrghw vs0, vs0, vs0 xvcvspdp vs34, vs0 Differential Revision: https://reviews.llvm.org/D57857 llvm-svn: 360429	2019-05-10 14:04:06 +00:00
Sam Clegg	ea38ac5ba3	[WebAssembly] Don't assume that strongly defined symbols are DSO-local The current PIC model for WebAssembly is more like ELF in that it allows symbol interposition. This means that more functions end up being addressed via the GOT and fewer directly added to the wasm table. One effect is a reduction in the number of wasm table entries similar to the previous attempt in https://reviews.llvm.org/D61539 which was reverted. Differential Revision: https://reviews.llvm.org/D61772 llvm-svn: 360402	2019-05-10 01:52:08 +00:00
Sam Clegg	2147365484	[WebAssembly] Remove friend18.C from list of known gcc torture test failures. NFC. Differential Revision: https://reviews.llvm.org/D61775 llvm-svn: 360401	2019-05-10 01:45:34 +00:00
Mircea Trofin	5c31c05fbd	[llvm] X86DiscriminateMemOps: insert debug info when missing Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61735 llvm-svn: 360396	2019-05-10 00:12:51 +00:00
Stanislav Mekhanoshin	64196850f0	[AMDGPU] Pattern for v_xor3_b32 This also allows three op patterns to use increased constant bus limit of GFX10. Differential Revision: https://reviews.llvm.org/D61763 llvm-svn: 360395	2019-05-10 00:09:01 +00:00
Philip Reames	bd588dfd59	[X86] Improve lowering of idemptotent RMW operations The current lowering uses an mfence. mfences are substaintially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local. Differential Revision: https://reviews.llvm.org/D58632 llvm-svn: 360393	2019-05-09 23:23:42 +00:00
Bill Wendling	6ee7f31484	Add ".dword" directive Summary: The ".dword" directive is a synonym for ".xword" and is used used by klibc, a minimalistic libc subset for initramfs. Reviewers: t.p.northover, nickdesaulniers Reviewed By: nickdesaulniers Subscribers: nickdesaulniers, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61719 llvm-svn: 360381	2019-05-09 21:57:44 +00:00
Stanislav Mekhanoshin	a76da34b1d	[AMDGPU] gfx1010 v_interp_* instructions Differential Revision: https://reviews.llvm.org/D61703 llvm-svn: 360364	2019-05-09 18:38:55 +00:00
Simon Pilgrim	93bfa5af48	[X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920) As reported on PR39920, "slow horizontal ops" targets tend to internally expand to 2shuffle+add/sub - so if we can reduce 2shuffle+add/sub to a hadd/sub then we should do it - similar port usage but reduced instruction count. This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying - going from 2shuffle+add+shuffle to hadd+2shuffle - I've opened PR41813 to cover this. Differential Revision: https://reviews.llvm.org/D61308 llvm-svn: 360360	2019-05-09 17:45:01 +00:00
Stanislav Mekhanoshin	4d4c9e0757	[AMDGPU] gfx1010 changes for PAL metadata Differential Revision: https://reviews.llvm.org/D61704 llvm-svn: 360353	2019-05-09 16:34:13 +00:00
Roman Lebedev	9db0e72570	[X86] AMD Piledriver (BdVer2): major cleanup (mainly inverse throughput) I've started this cleanup more several times now, but got sidetracked elsewhere, e.g. by llvm-exegesis problems. Not this time, finally! This is mainly cleaning up the inverse throughput values, and a few latencies/uops, based on the llvm-exegesis measured values. Though this is not complete by any means, there's certainly more cleanup to be done. The performance numbers (i've only checked by RawSpeed benchmark) aren't really surprising - overall this slightly (< -1%) improves perf. llvm-svn: 360341	2019-05-09 13:54:51 +00:00
Sam Parker	d7b650cc72	[ARM][CGP] Guard against signext args and sitofp Add an Argument that has the SExtAttr attached, as well as SIToFP instructions, as values that generate sign bits. SIToFP doesn't strictly do this and could be treated as a sink to be sign-extended. Differential Revision: https://reviews.llvm.org/D61381 llvm-svn: 360331	2019-05-09 11:56:16 +00:00
Diana Picus	3531453371	[ARM GlobalISel] Map DBG_VALUE for types != s32 ...and make sure we fail elegantly for unsupported values. s64 goes into DPR, anything <= 32 into GPR. llvm-svn: 360321	2019-05-09 09:49:36 +00:00
Hans Wennborg	b1b09e5b55	X86WinAllocaExpander: Drop code looking through register copies (PR41786) This code was never covered by tests, in PR41786 it was pointed out that the deletion part doesn't work, and in a full Chrome build I was never able to hit the code path that looks through copies. It seems the situation it's supposed to handle doesn't actually come up in practice. Delete it to simplify the code. Differential revision: https://reviews.llvm.org/D61671 llvm-svn: 360320	2019-05-09 09:22:56 +00:00
Matt Arsenault	462403a5c8	AMDGPU: Mark scheduler classes as final llvm-svn: 360294	2019-05-08 22:10:04 +00:00
Matt Arsenault	01434f9377	AMDGPU: Select VOP3 form of add The VOP3 form should always be the preferred selection, to be shrunk later. This should only be an optimization issue, but this partially works around a problem from clobbering VCC when SIFixSGPRCopies rewrites an SCC defining operation directly to VCC. 3 of the testcases are regressions from failing to fold the immediate in cases it should. These can be avoided by improving the VCC liveness handling in SIFoldOperands. Simply increasing the threshold to computeRegisterLiveness works, although this is common enough that VCC liveness should probably be tracked throughout the pass. The hack of leaving behind an implicit_def instruction to avoid breaking iterator wastes instruction count, which inhibits finding the VCC def in long chains of adds. Doing this however exposes different, worse looking regressions from poor scheduling behavior. This could probably be avoided around by forcing the shrink of the addc here, but the scheduler should probably be fixed. The r600 add test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 360293	2019-05-08 22:09:57 +00:00
Stanislav Mekhanoshin	1dbf721315	[AMDGPU] gfx1010 exp modifications Differential Revision: https://reviews.llvm.org/D61701 llvm-svn: 360287	2019-05-08 21:23:37 +00:00
Changpeng Fang	73b7272e7a	AMDGPU: Fix a mis-placed bracket Differential Revision: https://reviews.llvm.org/D61430 llvm-svn: 360283	2019-05-08 19:46:04 +00:00
Alina Sbirlea	f31eba6494	[MemorySSA] Teach LoopSimplify to preserve MemorySSA. Summary: Preserve MemorySSA in LoopSimplify, in the old pass manager, if the analysis is available. Do not preserve it in the new pass manager. Update tests. Subscribers: nemanjai, jlebar, javed.absar, Prazek, kbarton, zzheng, jsji, llvm-commits, george.burgess.iv, chandlerc Tags: #llvm Differential Revision: https://reviews.llvm.org/D60833 llvm-svn: 360270	2019-05-08 17:05:36 +00:00
Simon Pilgrim	e461e9a77d	[AArch64] Remove scan-build "Value stored during its initialization is never read" warnings. NFCI. llvm-svn: 360268	2019-05-08 16:29:39 +00:00
Simon Pilgrim	12521b2d43	[AArch64] Fix scan-build null/uninitialized pointer warnings. NFCI. llvm-svn: 360267	2019-05-08 16:27:24 +00:00
Simon Pilgrim	e3eec06dde	[AMDGPU] Reapplied BFE canonicalization from D60462 This was committed in rL358887 but reverted in rL360066 due to a x86 regression, really it should be have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch. llvm-svn: 360263	2019-05-08 15:49:10 +00:00
Simon Pilgrim	ec58090491	[Hexagon] Fix cppcheck reduce variable scope warnings. NFCI. Also fixes a static analyzer "Value stored to 'S2' during its initialization is never read" warning. llvm-svn: 360244	2019-05-08 11:02:46 +00:00
Tim Northover	18adcf331b	ARM: disallow SP as Rn for Thumb2 TST & TEQ instructions Using SP in this position is unpredictable in ARMv7. CMP and CMN are not affected, and of course v8 relaxes this requirement, but that's handled elsewhere. llvm-svn: 360242	2019-05-08 10:59:08 +00:00
Simon Pilgrim	02937dad69	R600InstrInfo.cpp - Add getTransSwizzle assert for the swizzle op index. NFCI. Fixes static analyzer undefined value warning. llvm-svn: 360239	2019-05-08 10:39:56 +00:00
Simon Pilgrim	be9ade93d1	[SIMode] Fix typo in Status constructor As noted in https://www.viva64.com/en/b/0629/ (Snippet No. 36) and the scan-build CI reports (https://llvm.org/reports/scan-build/report-SIModeRegister.cpp-Status-1-1.html#EndPath), rL348754 introduced a typo in the Status constructor due to argument variable names shadowing the member variable names. Differential Revision: https://reviews.llvm.org/D61595 llvm-svn: 360236	2019-05-08 10:24:22 +00:00
Reid Kleckner	6bf108d77a	[COFF] Use COFF stubs for extern_weak functions Summary: A COFF stub indirects the reference to a symbol through memory. A .refptr.$sym global variable pointer is created to refer to $sym. Typically mingw uses these for external global variable declarations, but we can use them for weak function declarations as well. Updates the dso_local classification to add a special case for extern_weak symbols on COFF in both clang and LLVM. Fixes PR37598 Reviewers: smeenai, mstorsjo Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D61615 llvm-svn: 360207	2019-05-07 23:06:21 +00:00
Austin Kerbow	8a3d3a9af6	[AMDGPU] Check MI bundles for hazards Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually. Reviewers: arsenm, msearles, rampitec Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61564 llvm-svn: 360199	2019-05-07 22:12:15 +00:00
Eric Christopher	4727221734	Make sure that the DAG combiner doesn't merge stores that we explicitly asked not be greater than preferred vector width for the vectorizer. Test for both 128 and 256 with a skylake architecture. llvm-svn: 360183	2019-05-07 19:25:34 +00:00
Simon Pilgrim	debb2b2a1e	Fix local shadow variable warning. NFCI. llvm-svn: 360157	2019-05-07 14:56:34 +00:00
Nemanja Ivanovic	b4f028f0f3	[PowerPC] Use the two-constant NR algorithm for refining estimates The single-constant algorithm produces infinities on a lot of denormal values. The precision of the two-constant algorithm is actually sufficient across the range of denormals. We will switch to that algorithm for now to avoid the infinities on denormals. In the future, we will re-evaluate the algorithm to find the optimal one for PowerPC. Differential revision: https://reviews.llvm.org/D60037 llvm-svn: 360144	2019-05-07 13:48:03 +00:00
Diana Picus	0a47fb8884	[ARM GlobalISel] Widen G_SELECT operands ...except for the condition operand. llvm-svn: 360135	2019-05-07 11:39:30 +00:00
Simon Pilgrim	b0f51266b8	[X86][AVX] Fold concat(packus(),packus()) -> packus(concat(),concat()) (PR34773) Basic "revectorization" combine, we can probably do more opcodes here but it can be a tricky cost-benefit depending on where the subvectors came from - but this case helps shuffle combining. llvm-svn: 360134	2019-05-07 11:17:39 +00:00
Simon Pilgrim	a80abeea88	Fixed "Value stored to 'Opc' is never read" warning. NFCI. llvm-svn: 360133	2019-05-07 11:09:16 +00:00
Simon Pilgrim	3c975a0ab5	[X86] Reduce scope of variables where possible. NFCI. Fixes cppcheck warnings. llvm-svn: 360131	2019-05-07 10:50:11 +00:00
Diana Picus	d6d3808fa4	[ARM GlobalISel] Widen G_INTTOPTR/G_PTRTOINT We actually have a couple of G_PTRTOINT to s8 when building clang, so we should do something about them. llvm-svn: 360130	2019-05-07 10:48:01 +00:00
Simon Pilgrim	c5ac14eef8	Fix uninitialized variable warning. NFCI. This also fixes a scan-build "array subscript is undefined" warning. llvm-svn: 360128	2019-05-07 10:30:22 +00:00
Diana Picus	d18bac5d19	[ARM GlobalISel] Widen G_GEP index operand llvm-svn: 360127	2019-05-07 10:11:57 +00:00
Nicolai Haehnle	79ea85c6af	AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand Summary: No test case because I don't know of a way to trigger this, but I accidentally caused this to fail while working on a different change. Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61490 llvm-svn: 360123	2019-05-07 09:19:09 +00:00
Fangrui Song	da82ce99b7	[DebugInfo] Delete TypedDINodeRef TypedDINodeRef<T> is a redundant wrapper of Metadata * that is actually a T . Accordingly, change DI{Node,Scope,Type}Ref uses to DI{Node,Scope,Type} or their const variants. This allows us to delete many resolve() calls that clutter the code. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D61369 llvm-svn: 360108	2019-05-07 02:06:37 +00:00
Craig Topper	a75630302d	[X86] Use extended vector register classes in getRegForInlineAsmConstraint to support x/y/zmm16-31 when the type is mismatched. The FR32/FR64/VR128/VR256 register classes don't contain the upper 16 registers. For most cases we use the default implementation which will find any register class that contains the register in question if the VT is legal for the register class. But if the VT is i32 or i64, we won't find a matching register class and will instead up in the code modified in this patch. If the requested register is x/y/zmm16-31 we weren't returning a register class that contains those registers and will hit an assertion in the caller. To fix this, I've changed to use the extended register class instead. I don't believe we need a subtarget check to see if avx512 is enabled. The default implementation just pick whatever register class it finds first. I checked and we currently pick FR32X for XMM0 with an f32 type using the default implementation regardless of whether avx512 is enabled. So I assume its it is ok to do the same for i32. Differential Revision: https://reviews.llvm.org/D61457 llvm-svn: 360102	2019-05-06 23:57:42 +00:00
Eli Friedman	2ea088173d	[ARM] Glue register copies to tail calls. This generally follows what other targets do. I don't completely understand why the special case for tail calls existed in the first place; even when the code was committed in r105413, call lowering didn't work in the way described in the comments. Stack protector lowering breaks if the register copies are not glued to a tail call: we have to insert the stack protector check before the tail call, and we choose the location based on the assumption that all physical register dependencies of a tail call are adjacent to the tail call. (See FindSplitPointForStackProtector.) This is sort of fragile, but I don't see any reason to break that assumption. I'm guessing nobody has seen this before just because it's hard to convince the scheduler to actually schedule the code in a way that breaks; even without the glue, the only computation that could actually be scheduled after the register copies is the computation of the call address, and the scheduler usually prefers to schedule that before the copies anyway. Fixes https://bugs.llvm.org/show_bug.cgi?id=41417 Differential Revision: https://reviews.llvm.org/D60427 llvm-svn: 360099	2019-05-06 23:21:59 +00:00
Stanislav Mekhanoshin	491746a584	[AMDGPU] gfx1010 verifier changes Differential Revision: https://reviews.llvm.org/D61521 llvm-svn: 360095	2019-05-06 22:49:45 +00:00
Stanislav Mekhanoshin	971cb8b633	[AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32 GFX10 deprecates v_mul_lo_i32 instruction, so choose u32 form for all targets. Differential Revision: https://reviews.llvm.org/D61525 llvm-svn: 360094	2019-05-06 22:27:05 +00:00
Stanislav Mekhanoshin	1bc001dec4	[AMDGPU] gfx1010 memory legalizer Differential Revision: https://reviews.llvm.org/D61535 llvm-svn: 360087	2019-05-06 21:57:02 +00:00
Craig Topper	d10a200ceb	[X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing. We require d/q suffixes on the memory form of these instructions to disambiguate the memory size. We don't require it on the register forms, but need to support parsing both with and without it. Previously we always printed the d/q suffix on the register forms, but it's redundant and inconsistent with gcc and objdump. After this patch we should support the d/q for parsing, but not print it when its unneeded. llvm-svn: 360085	2019-05-06 21:39:51 +00:00
Martin Storsjo	899f3cd581	[AArch64] Default to SEH exception handling on MinGW The SEH implementation is pretty mature at this point. Differential Revision: https://reviews.llvm.org/D61590 llvm-svn: 360080	2019-05-06 21:18:15 +00:00
Amara Emerson	3d1128cc9e	[GlobalISel] Handle <1 x T> vector return types properly. After support for dealing with types that need to be extended in some way was added in r358032 we didn't correctly handle <1 x T> return types. These types don't have a GISel direct representation, instead we just see them as scalars. When we need to pad them into <2 x T> types however we need to use a G_BUILD_VECTOR instead of trying to do a G_CONCAT_VECTOR. This fixes PR41738. llvm-svn: 360068	2019-05-06 19:41:01 +00:00
Craig Topper	55a71b575c	Revert r359392 and r358887 Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead" Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list. Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases. Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that. This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again. llvm-svn: 360066	2019-05-06 19:29:24 +00:00
Guillaume Chatelet	edd69fca3e	Modernize repmovsb implementation of x86 memcpy and allow runtime sizes. Summary: This is a prerequisite to RFC http://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61593 Fix typo. Turn this patch into an NFC. Addressing comments llvm-svn: 360050	2019-05-06 15:10:19 +00:00
Simon Pilgrim	2a0ef0530b	[X86] Fix uninitialized members in constructor warnings. NFCI. Initialize all member variables in X86ATTInstPrinter and X86DAGToDAGISel constructors to fix cppcheck warning. llvm-svn: 360047	2019-05-06 14:48:02 +00:00
Alexandre Ganea	799d96ec39	Fix compilation warnings when compiling with GCC 7.3 Differential Revision: https://reviews.llvm.org/D61046 llvm-svn: 360044	2019-05-06 13:41:54 +00:00
Nemanja Ivanovic	70afe4f7e1	[PowerPC] Fix erroneous condition for converting uint-to-fp vector conversion A condition for exiting the legalization of v4i32 conversion to v2f64 through extract/convert/build erroneously checks for the extract having type i32. This is not adequate as smaller extracts are actually legalized to i32 as well. Furthermore, an early exit is missing which means that we only check that both extracts are from the same vector if that check fails. As a result, both cases in the included test case fail - the first gets a select error and the second generates incorrect code. The culprit commit is r274535. llvm-svn: 360043	2019-05-06 13:35:49 +00:00
Simon Pilgrim	d672d0e246	X86DAGToDAGISel::tryVPTESTM - fix uninitialized variable warning. NFCI. findBroadcastedOp should always initialize the value if it returns true but static-analyzer isn't great at recognising this. llvm-svn: 360037	2019-05-06 11:52:16 +00:00
Simon Pilgrim	04dad8f66d	[X86] X86InstrInfo::findThreeSrcCommutedOpIndices - fix unread variable warning. scan-build was reporting that CommutableOpIdx1 never used its original initialized value - move it down to where its first used to make the real initialization more obvious (and matches the comment that's there). llvm-svn: 360028	2019-05-06 10:15:34 +00:00
Simon Pilgrim	07d91cd98a	[X86] lowerVectorShuffle - use any_of to detect out of bounds shuffle indices. NFCI. Fixes cppcheck local shadow warning as well. llvm-svn: 360027	2019-05-06 10:11:24 +00:00
Luo, Yuanke	beec41c656	Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake Summary: 1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake; 2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision. VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data. VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data. VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision. For more details about BF16 isa, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference Author: LiuTianle Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel Reviewed By: craig.topper Subscribers: kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60550 llvm-svn: 360017	2019-05-06 08:22:37 +00:00
Simon Pilgrim	8462cc3c74	[X86] Pull out repeated Subtarget feature tests. NFCI. Avoids a scan-build "uninitialized value" warning in X86FastISel::X86SelectFPExtOrFPTrunc llvm-svn: 360001	2019-05-05 20:45:20 +00:00
Simon Pilgrim	addc90e4e8	[TTI][X86] Make getAddressComputationCost cost value const. NFCI. llvm-svn: 359999	2019-05-05 20:03:51 +00:00
Simon Pilgrim	5170c0e5fe	Move getOpcode() call into if statement. NFCI. Avoids a cppcheck "Local variable name shadows outer variable" warning. llvm-svn: 359991	2019-05-05 18:34:38 +00:00
Simon Pilgrim	70ee2def90	[X86] Make X86RegisterInfo(const Triple &TT) constructor explicit. Fixes cppcheck warning. llvm-svn: 359981	2019-05-05 12:51:47 +00:00
Simon Pilgrim	cbcd9b1b92	[X86] Fix some cppcheck "Local variable name shadows outer variable" warnings. NFCI. llvm-svn: 359976	2019-05-05 12:00:14 +00:00
Stanislav Mekhanoshin	5ddd564e19	[AMDGPU] Fixed asan error after D61536 llvm-svn: 359963	2019-05-04 06:40:20 +00:00
Stanislav Mekhanoshin	51d1415a16	AMDGPU] gfx1010 hazard recognizer Differential Revision: https://reviews.llvm.org/D61536 llvm-svn: 359961	2019-05-04 04:30:57 +00:00
Stanislav Mekhanoshin	28a1936f6d	[AMDGPU] gfx1010: use fmac instructions Differential Revision: https://reviews.llvm.org/D61527 llvm-svn: 359959	2019-05-04 04:20:37 +00:00
Jessica Paquette	910630c1e4	[AArch64][GlobalISel] Use fcsel instead of csel for G_SELECT on FPRs This saves us some unnecessary copies. If the inputs to a G_SELECT are floating point, we should use fcsel rather than csel. Changes here are... - Teach selectCopy about s1-to-s1 copies across register banks. - AArch64RegisterBankInfo about G_SELECT in general. - Teach the instruction selector about the FCSEL instructions. Also add two tests: - select-select.mir to show that we get the expected FCSEL - regbank-select.mir (unfortunately named) to show the register banks on G_SELECT are properly preserved And update fast-isel-select.ll to show that we do the same thing as other instruction selectors in these cases. llvm-svn: 359940	2019-05-03 22:37:46 +00:00
Stanislav Mekhanoshin	d9dcf392c7	[AMDGPU] gfx1010 wait count insertion Differential Revision: https://reviews.llvm.org/D61534 llvm-svn: 359938	2019-05-03 21:53:53 +00:00
Stanislav Mekhanoshin	41bbe101a2	[AMDGPU] gfx1010 s_code_end generation Also add some missing metadata in the streamer. Differential Revision: https://reviews.llvm.org/D61531 llvm-svn: 359937	2019-05-03 21:26:39 +00:00
Stanislav Mekhanoshin	93f15c922f	[AMDGPU] gfx1010 loop alignment Differential Revision: https://reviews.llvm.org/D61529 llvm-svn: 359935	2019-05-03 21:17:29 +00:00
Mandeep Singh Grang	5dc8aeb26d	[COFF, ARM64] Fix ABI implementation of struct returns Summary: Refer the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values Related clang patch: D60349 Reviewers: rnk, efriedma, TomTan, ssijaric Reviewed By: rnk, efriedma Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60348 llvm-svn: 359934	2019-05-03 21:12:36 +00:00
Brian Cain	3428c9daef	[hexagon] change AsmParser assertion to error For immediates that can't be evaluated in assembler-mapped instructions, we should return 'invalid operand' instead of assert. llvm-svn: 359905	2019-05-03 16:50:38 +00:00
Craig Topper	a8f3840c62	[X86] Allow assembly parser to accept x/y/z suffixes on non-memory vfpclassps/pd and on memory forms in intel syntax The x/y/z suffix is needed to disambiguate the memory form in at&t syntax since no xmm/ymm/zmm register is mentioned. But we should also allow it for the register and broadcast forms where its not needed for consistency. This matches gas. The printing code will still only use the suffix for the memory form where it is needed. llvm-svn: 359903	2019-05-03 16:15:15 +00:00
Simon Pilgrim	b323d5ec7c	[X86] LowerToHorizontalOp - Tidyup calls to getHopForBuildVector. NFCI. Merge the if() tests for the various HADD/SUB + Subtarget tests llvm-svn: 359901	2019-05-03 15:56:06 +00:00
Matt Arsenault	657ef48a88	AMDGPU: Select VOP3 form of sub The VOP3 form should always be the preferred selection form to be shrunk later. The r600 sub test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 359899	2019-05-03 15:37:07 +00:00
Matt Arsenault	cfd0ca38b0	AMDGPU: Support shrinking add with FI in SIFoldOperands Avoids test regression in a future patch llvm-svn: 359898	2019-05-03 15:21:53 +00:00
Matt Arsenault	344d68d3c9	AMDGPU: Remove redundant patterns for shifts llvm-svn: 359895	2019-05-03 15:08:36 +00:00
Matt Arsenault	ada33314a2	AMDGPU: Remove redundant patterns for sub There were 2 patterns for sub, one selecting to sub and one to subrev. Only one of these will succeed, so remove the reversed one. llvm-svn: 359894	2019-05-03 15:08:35 +00:00
Matt Arsenault	0446fbe45e	AMDGPU: Replace shrunk instruction with dummy implicit_def This was broken if the original operand was killed. The kill flag would appear on both instructions, and fail the verifier. Keep the kill flag, but remove the operands from the old instruction. This has an added benefit of really reducing the use count for future folds. Ideally the pass would be structured more like what PeepholeOptimizer does to avoid this hack to avoid breaking instruction iterators. llvm-svn: 359891	2019-05-03 14:40:10 +00:00
Simon Pilgrim	bfdd0f75a8	[X86] Remove repeated variables. NFCI. llvm-svn: 359889	2019-05-03 14:37:00 +00:00
Simon Pilgrim	aa49be4926	Avoid cppcheck operator precedence warnings. NFCI. Prefer ((X & Y) ? A : B) to (X & Y ? A : B) llvm-svn: 359884	2019-05-03 13:50:38 +00:00
Matt Arsenault	2c8936fd26	AMDGPU: Fix incorrect commute with sub when folding immediates When a fold of an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used. This was using the VOP2 equivalent of the original instruction, not the commuted instruction with the inverted opcode. llvm-svn: 359883	2019-05-03 13:42:56 +00:00
Simon Pilgrim	a359ef192b	[X86] LowerMULH - remove unused Lo/Hi vector indices. NFCI. Leftover from before we had the extract128BitVector helpers. llvm-svn: 359871	2019-05-03 10:32:07 +00:00
Simon Pilgrim	88f9117168	Reduce variable scope to just the if() block its actually used in. NFCI. llvm-svn: 359869	2019-05-03 10:13:41 +00:00
Craig Topper	d724360695	[X86] Add more one checks to masked compare patterns that were missed in r358358. This covers the patterns we use for widening 128/256 comparisons to 512-bit when AVX512VL isn't supported. llvm-svn: 359863	2019-05-03 07:14:05 +00:00
Eli Friedman	7238353848	[AArch64][MC] Reject "add x0, x1, w2, lsl #1" etc. Looks like just a minor oversight in the parsing code. Fixes https://bugs.llvm.org/show_bug.cgi?id=41504. Differential Revision: https://reviews.llvm.org/D60840 llvm-svn: 359855	2019-05-03 00:59:52 +00:00
Craig Topper	bf29238e1a	[X86] Remove LEA16r references from X86FixupLEAs. NFCI As far as I know, we never emit LEA16r llvm-svn: 359840	2019-05-02 22:46:23 +00:00
Craig Topper	e1e38d4248	[X86] Correct the register class for specific mask register constraints in getRegForInlineAsmConstraint when the VT is a scalar type The default impementation in the base class for TargetLowering::getRegForInlineAsmConstraint doesn't work for mask registers when the VT is a scalar type integer types since the only legal mask types are vXi1. So we end up just getting whatever the first register class that contains the register. Currently this appears to be VK1, but its really dependent on the order tablegen outputs the register classes. Some code in the caller ends up looking up the type for this register class and find v1i1 then generates a copyfromreg from the physical k-register with the v1i1 type. Then it generates an any_extend from v1i1 to the scalar VT which isn't legal. This bad any_extend sticks around until isel where it selects a MOVZX32rr8 with a v1i1 input or maybe a i8 input. Not sure but eventually we pick up a copy from VK1 to GR8 in MachineIR which isn't supported. This leads to a failure in physical register copying. This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. This causes a bitcast from vk16 to i16 to be generated instead of an any_extend. This will be properly iseled to a VK16 to GR32 copy and a GR32->GR16 extract_subreg. Fixes PR41678 Differential Revision: https://reviews.llvm.org/D61453 llvm-svn: 359837	2019-05-02 22:26:40 +00:00
Evandro Menezes	111df108e6	[AArch64] Update for Exynos Fix the forwarding of multiplication results for Exynos M4. llvm-svn: 359834	2019-05-02 22:01:39 +00:00
Craig Topper	47d8865a38	[X86] Remove string literal from an if. NFC This if used to be an assert that got refactored into an if, but left the string literal behind. Fixes PR41718 llvm-svn: 359833	2019-05-02 21:57:18 +00:00
Sanjay Patel	284472be6d	[SelectionDAG] remove constant folding limitations based on FP exceptions We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions. There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches. Real support for FP exceptions requires several changes to handle the constrained/strict FP ops. Differential Revision: https://reviews.llvm.org/D61331 llvm-svn: 359791	2019-05-02 14:47:59 +00:00
Simon Pilgrim	df8daf0ef4	[X86][SSE] lowerAddSubToHorizontalOp - enable ymm extraction+fold Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whatever, and we can remove a shuffle by using the hop which is inline with what shouldUseHorizontalOp expects to happen anyway. Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result: https://godbolt.org/z/0R-U-K Differential Revision: https://reviews.llvm.org/D61426 llvm-svn: 359786	2019-05-02 14:00:55 +00:00
Simon Pilgrim	9fa56f7829	[X86][SSE] Move shouldUseHorizontalOp inside isHorizontalBinOp. NFCI. Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921 llvm-svn: 359782	2019-05-02 12:18:24 +00:00
Diana Picus	1136ea2d44	[ARM GlobalISel] Fixup r359768 Get rid of local variable used only in assertion. llvm-svn: 359772	2019-05-02 10:08:29 +00:00
Diana Picus	06a61ccc42	[ARM GlobalISel] Select extensions to < 32 bits Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in the exact same way as 32 bits. This overwrites the higher bits, but that should be ok since all legal users of types smaller than 32 bits ignore those bits anyway. llvm-svn: 359768	2019-05-02 09:28:00 +00:00
Diana Picus	53bcf6f2e7	[ARM GlobalISel] Legalize extensions to < 32 bits Make it legal to extend from e.g. s1 to s8 or s16. llvm-svn: 359766	2019-05-02 09:21:46 +00:00
Kang Zhang	1a0d6d6899	[NFC][PowerPC] Return early if the element type is not byte-sized in combineBVOfConsecutiveLoads Summary: Based on the Eli Friedman's comments in https://reviews.llvm.org/D60811 , we'd better return early if the element type is not byte-sized in `combineBVOfConsecutiveLoads`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D61076 llvm-svn: 359764	2019-05-02 08:15:13 +00:00
Stanislav Mekhanoshin	64399da8b8	[AMDGPU] gfx1010 lost VOP2 forms of some add/sub Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32. Differential Revision: llvm-svn: 359757	2019-05-02 04:26:35 +00:00
Stanislav Mekhanoshin	5cf8167735	[AMDGPU] gfx1010 allows VOP3 to have a literal Differential Revision: https://reviews.llvm.org/D61413 llvm-svn: 359756	2019-05-02 04:01:39 +00:00
Stanislav Mekhanoshin	f2baae0abb	[AMDGPU] gfx1010 constant bus limit Constant bus limit has increased to 2 with GFX10. Differential Revision: https://reviews.llvm.org/D61404 llvm-svn: 359754	2019-05-02 03:47:23 +00:00
Craig Topper	b929a0062e	[X86] Remove the redundant suffix in vfpclassp[d,s]'s broadcasting variant The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template. Patch by Pengfei Wang Differential Revision: https://reviews.llvm.org/D61295 llvm-svn: 359753	2019-05-02 03:25:50 +00:00
Jessica Paquette	a3843fe6f4	[GlobalISel][AArch64] Use fmov for G_FCONSTANT when possible This adds support for using fmov rather than a standard mov to materialize G_FCONSTANT when it's safe to do so. Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the selection is correct. llvm-svn: 359734	2019-05-01 22:39:43 +00:00
Simon Pilgrim	9f04d97cd7	[X86][SSE] Fold scalar horizontal add/sub for non-0/1 element extractions We already perform horizontal add/sub if we extract from elements 0 and 1, this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector). Differential Revision: https://reviews.llvm.org/D61263 llvm-svn: 359707	2019-05-01 17:13:35 +00:00
Stanislav Mekhanoshin	3b7925f035	[AMDGPU] gfx1010 GCNRegBankReassign pass Reassign registers to reduce register bank conflicts. Differential Revision: https://reviews.llvm.org/D61344 llvm-svn: 359704	2019-05-01 16:49:31 +00:00
Stanislav Mekhanoshin	c29d491596	[AMDGPU] gfx1010 GCNNSAReassign pass Convert NSA into non-NSA images. Differential Revision: https://reviews.llvm.org/D61341 llvm-svn: 359700	2019-05-01 16:40:49 +00:00
Stanislav Mekhanoshin	692560dc98	[AMDGPU] gfx1010 MIMG implementation Differential Revision: https://reviews.llvm.org/D61339 llvm-svn: 359698	2019-05-01 16:32:58 +00:00
Stanislav Mekhanoshin	a224f68a10	[AMDGPU] gfx1010 DS implementation Differential Revision: https://reviews.llvm.org/D61332 llvm-svn: 359696	2019-05-01 16:11:11 +00:00
Simon Pilgrim	f5bdff7747	Fix 80 column violation. NFCI. llvm-svn: 359694	2019-05-01 16:01:49 +00:00
Simon Pilgrim	6711b9699a	[X86][SSE] Add demanded elts support X86ISD::PMULDQ\PMULUDQ Add to SimplifyDemandedVectorEltsForTargetNode and SimplifyDemandedBitsForTargetNode llvm-svn: 359686	2019-05-01 14:50:50 +00:00
Simon Pilgrim	3d6899e369	[X86][SSE] Add SSE vector shift support to SimplifyDemandedVectorEltsForTargetNode vector splitting llvm-svn: 359680	2019-05-01 13:51:09 +00:00
Simon Pilgrim	ba372c6e62	[X86][SSE] Split 512-bit -> 128-bit vector directly in SimplifyDemandedVectorEltsForTargetNode llvm-svn: 359678	2019-05-01 12:48:42 +00:00
Simon Pilgrim	951a6b4579	[X86][SSE] Add 512-bit vector support to SimplifyDemandedVectorEltsForTargetNode vector splitting llvm-svn: 359677	2019-05-01 12:37:41 +00:00
Simon Pilgrim	37c2419cc7	[X86][SSE] Add X86ISD::PACKSS\PACKUS to SimplifyDemandedVectorEltsForTargetNode vector splitting llvm-svn: 359673	2019-05-01 11:29:36 +00:00
Simon Pilgrim	3353cee06c	[X86][SSE] Add X86ISD::UNPCKL\UNPCK to SimplifyDemandedVectorEltsForTargetNode vector splitting llvm-svn: 359670	2019-05-01 11:08:03 +00:00
Simon Pilgrim	f7b978a71b	[X86][SSE] Move extract_subvector(pshufb) fold to SimplifyDemandedVectorEltsForTargetNode This lets us hit more cases than combineExtractSubvector and allows us reuse more code. llvm-svn: 359669	2019-05-01 10:58:38 +00:00
Simon Pilgrim	a7d107a3e0	[X86] SimplifyDemandedVectorEltsForTargetNode - pull out vector halving code. NFCI. Pull out the HADD/HSUB code to halve vector widths if the upper half isn't used - prep work to adding support for other opcodes. llvm-svn: 359667	2019-05-01 10:38:10 +00:00
Simon Pilgrim	99eefe94b5	[X86][SSE] Extract i1 elements from vXi1 bool vectors This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK. Differential Revision: https://reviews.llvm.org/D61189 llvm-svn: 359666	2019-05-01 10:02:22 +00:00
Craig Topper	dd66acef96	[X86FixupLEAs] Hoist the calls to isLEA out of the 3 separate functions and put it in the basic block instruction loop. NFC Now need to check it 3 different times. Just do it once at the top of the loop. llvm-svn: 359658	2019-05-01 06:53:03 +00:00
David L. Jones	fccb505f0f	Revert "[llvm] r359313 - [PowerPC] Update P9 vector costs for insert/extract element" This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313. llvm-svn: 359648	2019-05-01 05:01:03 +00:00
Sam Clegg	6898781d87	[WebAssembly] Update expectations for gcc torture tests This is needed to make the wasm waterfall green again after we land the update to WASI: https://github.com/WebAssembly/waterfall/pull/492 Differential Revision: https://reviews.llvm.org/D61351 llvm-svn: 359634	2019-04-30 23:10:28 +00:00
Stanislav Mekhanoshin	a6322941ff	[AMDGPU] gfx1010 VMEM and SMEM implementation Differential Revision: https://reviews.llvm.org/D61330 llvm-svn: 359621	2019-04-30 22:08:23 +00:00
Simon Pilgrim	07ab4e7db8	[X86][SSE] Fold extract_subvector(extend(x)) -> extend_vector_inreg(x) This adds any extend support - folding to zero_extend_vector_inreg (PMOVZX) for legality Minor improvement for PR39709 llvm-svn: 359608	2019-04-30 20:31:07 +00:00
Dan Gohman	3a7532e645	[WebAssembly] Support f16 libcalls Add support for f16 libcalls in WebAssembly. This entails adding signatures for the remaining F16 libcalls, and renaming gnu_f2h_ieee/gnu_h2f_ieee to truncsfhf2/extendhfsf2 for consistency between f32 and f64/f128 (compiler-rt already supports this). Differential Revision: https://reviews.llvm.org/D61287 Reviewer: dschuff llvm-svn: 359600	2019-04-30 19:17:59 +00:00
Craig Topper	cad318014e	[X86] Remove if that's always true It's been like this since it was added in a refactor of this code. Fixes PR41659 llvm-svn: 359597	2019-04-30 19:02:15 +00:00
Craig Topper	3958719dda	[X86] If PreprocessISelDAG reorders a load before a call, make sure we remove dead nodes from the graph The reordering can leave at least a dead TokenFactor in the graph. This cause the linearize scheduler to fail with something like the assert seen in PR22614. This is only one of many ways we can break the linearize scheduler today so I can't say for sure that any of the other failures in that bug were caused by this issue. This takes the heavy hammer approach of just running RemoveDeadNodes unconditionally at the end of the PreprocessISelDAG. If this turns out to be a compile time hit, we can try to refine it. Differential Revision: https://reviews.llvm.org/D61164 llvm-svn: 359582	2019-04-30 17:56:47 +00:00
Craig Topper	965d1306ae	[X86] Initial cleanups on the FixupLEAs pass. Separate Atom LEA creation from other LEA optimizations. This removes some of the class variables. Merge basic block processing into runOnMachineFunction to keep the flags local. Pass MachineBasicBlock around instead of an iterator. We can get the iterator in the few places that need it. Allows a range-based outer for loop. Separate the Atom optimization from the rest of the optimizations. This allows fixupIncDec to create INC/DEC and still allow Atom to turn it back into LEA when profitable by its heuristics. I'd like to improve fixupIncDec to turn LEAs into ADD any time the base or index register is equal to the destination register. This is profitable regardless of the various slow flags. But again we would want Atom to be able to undo that. Differential Revision: https://reviews.llvm.org/D60993 llvm-svn: 359581	2019-04-30 17:56:28 +00:00
Sjoerd Meijer	ea31ddb36f	[ARM] Implement TTI::getMemcpyCost This implements TargetTransformInfo method getMemcpyCost, which estimates the number of instructions to which a memcpy instruction expands to. Differential Revision: https://reviews.llvm.org/D59787 llvm-svn: 359547	2019-04-30 10:28:50 +00:00
Simon Pilgrim	22641cc194	Fix for bug 41512: lower INSERT_VECTOR_ELT(ZeroVec, 0, Elt) to SCALAR_TO_VECTOR(Elt) for all SSE flavors Current LLVM uses pxor+pinsrb on SSE4+ for INSERT_VECTOR_ELT(ZeroVec, 0, Elt) insead of much simpler movd. INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is idiomatic construct which is used e.g. for _mm_cvtsi32_si128(Elt) and for lowest element initialization in _mm_set_epi32. So such inefficient lowering leads to significant performance digradations in ceratin cases switching from SSSE3 to SSE4. https://bugs.llvm.org/show_bug.cgi?id=41512 Here INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is simply converted to SCALAR_TO_VECTOR(Elt) when applicable since latter is closer match to desired behavior and always efficiently lowered to movd and alike. Committed on behalf of @Serge_Preis (Serge Preis) Differential Revision: https://reviews.llvm.org/D60852 llvm-svn: 359545	2019-04-30 10:18:25 +00:00
Diana Picus	59a4c0481a	[ARM GlobalISel] Widen small shift operands The legalizer was already widening the shift amount. Add tests for that behaviour, and also support widening the shifted value. llvm-svn: 359542	2019-04-30 09:24:43 +00:00
Fangrui Song	7bce25cd7d	[AsmPrinter] Make AsmPrinter::HandlerInfo::Handler a unique_ptr Handlers.clear() in AsmPrinter::doFinalization() will destroy these handlers. A unique_ptr makes the ownership clearer. llvm-svn: 359541	2019-04-30 09:14:02 +00:00
Diana Picus	1e88ac213b	[ARM GlobalISel] Be more careful about bailing out Bail out on function arguments/returns with types aggregating an unsupported type. This fixes cases where we would happily and incorrectly lower functions taking e.g. [1 x i64] parameters, when we don't even support plain i64 yet. llvm-svn: 359540	2019-04-30 09:05:25 +00:00
Sjoerd Meijer	180f1ae57c	[TargetLowering] Change getOptimalMemOpType to take a function attribute list The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537	2019-04-30 08:38:12 +00:00
Dan Gohman	8d6e80f959	[WebAssembly] Make an assertion message prettier. NFC. This is a follow-up to https://reviews.llvm.org/D59521. llvm-svn: 359509	2019-04-29 22:37:08 +00:00
Dan Gohman	8306cb5702	[WebAssembly] Define the signature for __stack_chk_fail The WebAssembly backend needs to know the signatures of all runtime libcall functions. This adds the signature for __stack_chk_fail which was previously missing. Also, make the error message for a missing libcall include the name of the function. Differential Revision: https://reviews.llvm.org/D59521 Reviewed By: sbc100 llvm-svn: 359505	2019-04-29 21:09:44 +00:00
Roland Froese	728e139700	[PowerPC] Try harder to avoid load/move-to VSR for partial vector loads Change the PPCISelLowering.cpp function that decides to avoid update form in favor of partial vector loads to know about newer load types and to not be confused by the chain operand. Differential Revision: https://reviews.llvm.org/D60102 llvm-svn: 359504	2019-04-29 21:08:35 +00:00
Jessica Paquette	7f6fe7c02c	[GlobalISel][AArch64] Select llvm.aarch64.crypto.sha1h This was falling back and gives us a reason to create a selectIntrinsic function which we would need eventually anyway. Update arm64-crypto.ll to show that we correctly select it. Also factor out the code for finding an intrinsic ID. llvm-svn: 359501	2019-04-29 20:58:17 +00:00
Martin Storsjo	c0d138d147	[X86] Run CFIInstrInserter on Windows if Dwarf is used This is necessary since SVN r330706, as tail merging can include CFI instructions since then. This fixes PR40322 and PR40012. Differential Revision: https://reviews.llvm.org/D61252 llvm-svn: 359496	2019-04-29 20:25:51 +00:00
Simon Pilgrim	028485d7b9	[X86][SSE] isHorizontalBinOp - add support for target shuffles Add target shuffle decoding to isHorizontalBinOp as well as ISD::VECTOR_SHUFFLE support. This does mean we can go through bitcasts so we need to bitcast the extracted args to ensure they are the correct type Fixes PR39936 and should help with PR39920/PR39921 Differential Revision: https://reviews.llvm.org/D61245 llvm-svn: 359491	2019-04-29 19:52:59 +00:00
Simon Pilgrim	0a5c2b2449	[X86] scaleShuffleMask - avoid potential signed overflow warning. Use size_t assignment to prevent a bad explicit type conversion warning. Given the typical size of shuffle masks this was never going to happen, but this at least stops the warning. Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359479	2019-04-29 18:32:06 +00:00
Simon Pilgrim	19cde62008	Avoid "checking a pointer after dereferencing" warning. NFCI. Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359473	2019-04-29 17:38:18 +00:00
Simon Pilgrim	6f349d8c39	Move if() to newline to stop ambiguity over whether it should be else if. NFCI. Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359472	2019-04-29 17:34:26 +00:00
Simon Pilgrim	2755b73ba0	Fix operator precedence warning. NFCI. Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359469	2019-04-29 17:04:14 +00:00
Simon Pilgrim	864cf8e274	Remove superfluous break from switch statement. NFCI. Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359467	2019-04-29 16:45:35 +00:00
Cullen Rhodes	2c0d5043a7	[AArch64][SVE] Asm: add aliases for unpredicated bitwise logical instructions This patch adds aliases for element sizes .B/.H/.S to the AND/ORR/EOR/BIC bitwise logical instructions. The assembler now accepts these instructions with all element sizes up to 64-bit (.D). The preferred disassembly is .D. llvm-svn: 359457	2019-04-29 15:27:27 +00:00
Diogo N. Sampaio	d95abb170b	[ARM] Add bitcast/extract_subvec. of fp16 vectors Summary: This patch adds some basic operations for fp16 vectors, such as bitcast from fp16 to i16, required to perform extract_subvector (also added here) and extract_element. Reviewers: SjoerdMeijer, DavidSpickett, t.p.northover, ostannard Reviewed By: ostannard Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60618 llvm-svn: 359433	2019-04-29 10:28:07 +00:00
Diogo N. Sampaio	2078eb745d	[ARM] Add v4f16 and v8f16 types to the CallingConv Summary: The Procedure Call Standard for the Arm Architecture states that float16x4_t and float16x8_t behave just as uint16x4_t and uint16x8_t for argument passing. This patch adds the fp16 vectors to the ARMCallingConv.td file. Reviewers: miyuki, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60720 llvm-svn: 359431	2019-04-29 10:10:37 +00:00
Craig Topper	9202d5f8f1	[X86] Remove some intel syntax aliases on (v)cvtpd2(u)dq, (v)cvtpd2ps, (v)cvt(u)qq2ps. Add 'x' and'y' suffix aliases to masked version of the same in att syntax. The 128/256 bit version of these instructions require an 'x' or 'y' suffix to disambiguate the memory form in att syntax. We were allowing the same suffix in intel syntax, but it appears gas does not do that. gas does allow the 'x' and 'y' suffix on register and broadcast forms even though its not needed. We were allowing it on unmasked register form, but not on masked versions or on masked or unmasked broadcast form. While there fix some test coverage holes so they can be extended with the 'x' and 'y' suffix tests. llvm-svn: 359418	2019-04-29 06:13:41 +00:00
Simon Pilgrim	d5cc753b6d	[X86][SSE] combineExtractVectorElt - add early-out to return zero/undef for out-of-range extraction indices. llvm-svn: 359406	2019-04-28 19:12:58 +00:00
Simon Pilgrim	22d1476bfa	[X86][AVX] Combine non-lane crossing binary shuffles using X86ISD::VPERMV3 Some of the combines might be further improved if we lower more shuffles with X86ISD::VPERMV3 directly, instead of waiting to combine the results. llvm-svn: 359400	2019-04-28 14:31:01 +00:00
Simon Pilgrim	93ad48210c	[X86][SSE] Optimize llvm.experimental.vector.reduce.xor.vXi1 parity reduction (PR38840) An xor reduction of a bool vector can be optimized to a parity check of the MOVMSK/BITCAST'd integer - if the population count is odd return 1, else return 0. Differential Revision: https://reviews.llvm.org/D61230 llvm-svn: 359396	2019-04-28 10:46:17 +00:00
Craig Topper	bd35a30940	[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead Summary: The register form of these instructions are CodeGenOnly instructions that cover GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for the opposite bitcast. Due to the patterns using bitcasts these instructions get marked as "bitcast" machine instructions as well. The peephole pass is able to look through these as well as other copies to try to avoid register bank copies. Because FR32/FR64/VR128 are all coalescable to each other we can end up in a situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to GR32->GR64 which the copyPhysReg code can't handle. To prevent this, this patch removes one set of the 'bitcast' instructions. So now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that converts from GR32/GR64->VR128 has no special significance to the peephole pass and won't be looked through. I guess the other option would be to add support to copyPhysReg to just promote the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined anyway. But removing the CodeGenOnly instruction in favor of one that won't be optimized seemed safer. I deleted the peephole test because it couldn't be made to work with the bitcast instructions removed. The load version of the instructions were unnecessary as the pattern that selects them contains a bitcasted load which should never happen. Fixes PR41619. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61223 llvm-svn: 359392	2019-04-28 06:25:33 +00:00
Simon Pilgrim	03c4e2663c	Revert rL359389: [X86][SSE] Add support for <64 x i1> bool reduction Minor generalization of the existing <32 x i1> pre-AVX2 split code. ........ Causing irregular buildbot failures. llvm-svn: 359391	2019-04-27 20:44:08 +00:00
Simon Pilgrim	4118be3af6	[X86][SSE] Add support for <64 x i1> bool reduction Minor generalization of the existing <32 x i1> pre-AVX2 split code. llvm-svn: 359389	2019-04-27 20:04:44 +00:00
Simon Pilgrim	2a2d422400	[X86][AVX512] Improve vector bool reductions As predicate masks are legal on AVX512 targets, we avoid MOVMSK in these cases, but we can just bitcast the bool vector to the integer equivalent directly - avoiding expansion of the reduction to a shuffle pattern. llvm-svn: 359386	2019-04-27 17:32:46 +00:00
Simon Pilgrim	acc1e6d1c6	[X86][AVX] Merge mask select with shuffles across extract_subvector (PR40332) Fixes PR40332 in the limited case where we're selecting between a target shuffle and a zero vector. We can extend this in the future to handle more opcodes and non-zero selections. llvm-svn: 359378	2019-04-27 13:35:32 +00:00
Craig Topper	063b471ff7	[X86] Use MOVQ for i64 atomic_stores when SSE2 is enabled Summary: If we have SSE2 we can use a MOVQ to store 64-bits and avoid falling back to a cmpxchg8b loop. If its a seq_cst store we need to insert an mfence after the store. Reviewers: spatel, RKSimon, reames, jfb, efriedma Reviewed By: RKSimon Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60546 llvm-svn: 359368	2019-04-27 03:38:15 +00:00
Mark Searles	76c5b62988	Revert "AMDGPU: Split block for si_end_cf" This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4. We discovered some internal test failures, so reverting for now. Differential Revision: https://reviews.llvm.org/D61213 llvm-svn: 359363	2019-04-27 00:51:18 +00:00
Stanislav Mekhanoshin	4f331cb1f3	[AMDGPU] gfx1010 VOPC implementation Differential Revision: https://reviews.llvm.org/D61208 llvm-svn: 359358	2019-04-26 23:16:16 +00:00
Jessica Paquette	76f64b665b	[GlobalISel][AArch64] Use getConstantVRegValWithLookThrough for extracts getConstantVRegValWithLookThrough does the same thing as the getConstantValueForReg function, and has more visibility across GISel. Plus, it supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code reuse and more functionality for free by using it. Add some test cases to select-extract-vector-elt.mir to show that we can now look through those instructions. llvm-svn: 359351	2019-04-26 21:53:13 +00:00
Nick Desaulniers	7ab164c4a4	[AsmPrinter] refactor to support %c w/ GlobalAddress' Summary: Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when printing the address of a MachineOperand::MO_GlobalAddress. Move that handling into a new overriden method in each base class. A virtual method was added to the base class for handling the generic case. Refactors a few subclasses to support the target independent %a, %c, and %n. The patch also contains small cleanups for AVRAsmPrinter and SystemZAsmPrinter. It seems that NVPTXTargetLowering is possibly missing some logic to transform GlobalAddressSDNodes for TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended inline assembly asm constraints. Fixes: - https://bugs.llvm.org/show_bug.cgi?id=41402 - https://github.com/ClangBuiltLinux/linux/issues/449 Reviewers: echristo, void Reviewed By: void Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60887 llvm-svn: 359337	2019-04-26 18:45:04 +00:00
Simon Pilgrim	27e01e675c	[X86][AVX] Fold extract_subvector(broadcast(x)) -> broadcast(x) iff x has one use llvm-svn: 359332	2019-04-26 18:02:14 +00:00
Jessica Paquette	67ab9eb193	[AArch64][GlobalISel] Select G_BSWAP for vectors of s32 and s64 There are instructions for these, so mark them as legal. Select the correct instruction in AArch64InstructionSelector.cpp. Update select-bswap.mir and arm64-rev.ll to reflect the changes. llvm-svn: 359331	2019-04-26 18:00:01 +00:00
Stanislav Mekhanoshin	61beff020e	[AMDGPU] gfx1010 VOP3 and VOP3P implementation Differential Revision: https://reviews.llvm.org/D61202 llvm-svn: 359328	2019-04-26 17:56:03 +00:00
Craig Topper	354247c08d	[X86] Sink NoRegister creation for unused Base/Index registers into getAddressOperands. NFCI llvm-svn: 359318	2019-04-26 16:39:38 +00:00
Craig Topper	ad662cf4c1	[X86] Segment registers should have i16 type not i32. Probably doesn't really matter, but was inconsistent with the rest of the code. llvm-svn: 359317	2019-04-26 16:39:35 +00:00
Stanislav Mekhanoshin	8f3da70eed	[AMDGPU] gfx1010 VOP2 changes Differential Revision: https://reviews.llvm.org/D61156 llvm-svn: 359316	2019-04-26 16:37:51 +00:00
Roland Froese	4b17772b9e	[PowerPC] Update P9 vector costs for insert/extract element The PPC vector cost model values for insert/extract element reflect older processors that lacked vector insert/extract and move-to/move-from VSR instructions. Update getVectorInstrCost to give appropriate values for when the newer instructions are present. Differential Revision: https://reviews.llvm.org/D60160 llvm-svn: 359313	2019-04-26 16:14:17 +00:00
Simon Pilgrim	c3a34c3e07	Fix Wparentheses warning. NFCI. llvm-svn: 359299	2019-04-26 12:23:42 +00:00
Simon Pilgrim	bb230c5e79	[X86][SSE] Pull out OR(EXTRACTELT(X,0),OR(EXTRACTELT(X,1),...)) matching code from LowerVectorAllZeroTest Create a matchBitOpReduction helper that checks for the pattern with any opcode. First step towards reusing this code to recognize other scalar reduction patterns. llvm-svn: 359296	2019-04-26 11:45:54 +00:00
Simon Pilgrim	5d6ef94c36	[X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758) As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs). Differential Revision: https://reviews.llvm.org/D61068 llvm-svn: 359293	2019-04-26 10:49:13 +00:00
Simon Pilgrim	5e161df9f8	[X86][AVX] Combine shuffles extracted from a common vector A small step towards combining shuffles across vector sizes - this recognizes when a shuffle's operands are all extracted from the same larger source and tries to combine to an unary shuffle of that source instead. Fixes one of the test cases from PR34380. Differential Revision: https://reviews.llvm.org/D60512 llvm-svn: 359292	2019-04-26 09:56:14 +00:00
Hans Wennborg	5d5ee4aff7	Fix alignment in AArch64InstructionSelector::emitConstantPoolEntry() The code was using the alignment of a pointer to the value, not the alignment of the constant itself. Maybe we got away with it so far because the pointer alignment is fairly high, but we did end up under-aligning <16 x i8> vectors, which was caught in the Chromium build after lld stopped over-aligning the .rodata.cst16 section in r356428. (See crbug.com/953815) Differential revision: https://reviews.llvm.org/D61124 llvm-svn: 359287	2019-04-26 08:31:00 +00:00
Artem Belevich	5fe85a003f	[CUDA] Implemented _[bi]mma* builtins. These builtins provide access to the new integer and sub-integer variants of MMA (matrix multiply-accumulate) instructions provided by CUDA-10.x on sm_75 (AKA Turing) GPUs. Also added a feature for PTX 6.4. While Clang/LLVM does not generate any PTX instructions that need it, we still need to pass it through to ptxas in order to be able to compile code that uses the new 'mma' instruction as inline assembly (e.g used by NVIDIA's CUTLASS library https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101) Differential Revision: https://reviews.llvm.org/D60279 llvm-svn: 359248	2019-04-25 22:28:09 +00:00
Artem Belevich	16737538f4	PTX 6.3 extends `wmma` instruction to support s8/u8/s4/u4/b1 -> s32. All of the new instructions are still handled mostly by tablegen. I've slightly refactored the code to drive intrinsic/instruction generation from a master list of supported variants, so all irregularities have to be implemented in one place only. The test generation script wmma.py has been refactored in a similar way. Differential Revision: https://reviews.llvm.org/D60015 llvm-svn: 359247	2019-04-25 22:27:57 +00:00
Artem Belevich	8d825b38ed	[NVPTX] generate correct MMA instruction mnemonics with PTX63+. PTX 6.3 requires using ".aligned" in the MMA instruction names. In order to generate correct name, now we pass current PTX version to each instruction as an extra constant operand and InstPrinter adjusts its output accordingly. Differential Revision: https://reviews.llvm.org/D59393 llvm-svn: 359246	2019-04-25 22:27:46 +00:00
Artem Belevich	7ecd82ce19	[NVPTX] Refactor generation of MMA intrinsics and instructions. NFC. Generalized constructions of 'fragments' of MMA operations to provide common primitives for construction of the ops. This will make it easier to add new variants of the instructions that operate on integer types. Use nested foreach loops which makes it possible to better control naming of the intrinsics. This patch does not affect LLVM's output, so there are no test changes. Differential Revision: https://reviews.llvm.org/D59389 llvm-svn: 359245	2019-04-25 22:27:35 +00:00
Stanislav Mekhanoshin	917c477a07	[AMDGPU] gfx1010 - fix ubsan failure Revert DecoderNamespace in one place for now. It will need more changes to properly work. llvm-svn: 359239	2019-04-25 20:39:06 +00:00
David Blaikie	0c4dbf9ecd	Assigning to a local object in a return statement prevents copy elision. NFC. I added a diagnostic along the lines of `-Wpessimizing-move` to detect `return x = y` suppressing copy elision, but I don't know if the diagnostic is really worth it. Anyway, here are the places where my diagnostic reported that copy elision would have been possible if not for the assignment. P1155R1 in the post-San-Diego WG21 (C++ committee) mailing discusses whether WG21 should fix this pitfall by just changing the core language to permit copy elision in cases like these. (Kona update: The bulk of P1155 is proceeding to CWG review, but specifically not the parts that explored the notion of permitting copy-elision in these specific cases.) Reviewed By: dblaikie Author: Arthur O'Dwyer Differential Revision: https://reviews.llvm.org/D54885 llvm-svn: 359236	2019-04-25 20:09:00 +00:00
Jessica Paquette	f54258c888	[GlobalISel][AArch64] Make G_EXTRACT_VECTOR_ELT legal for v8s16s This case was missing before, so we couldn't legalize it. Add it to AArch64LegalizerInfo.cpp and update select-extract-vector-elt.mir. llvm-svn: 359231	2019-04-25 20:00:57 +00:00
Stanislav Mekhanoshin	2c97ff07bf	[AMDGPU] gfx1010 VOP1 instructions Differential Revision: https://reviews.llvm.org/D61099 llvm-svn: 359225	2019-04-25 19:01:51 +00:00
Stanislav Mekhanoshin	956b0be72e	[AMDGPU] gfx1010 utility functions Differential Revision: https://reviews.llvm.org/D61094 llvm-svn: 359224	2019-04-25 18:53:41 +00:00
Jessica Paquette	8184b6e7f6	[GlobalISel][AArch64] Add generic legalization rule for extends This adds a legalization rule for G_ZEXT, G_ANYEXT, and G_SEXT which allows extends whenever the types will fit in registers (or the source is an s1). Update tests. Add GISel checks throughout all of arm64-vabs.ll, where we now select a good portion of the code. Add GISel checks to arm64-subvector-extend.ll, which has a good number of vector extends in it. Differential Revision: https://reviews.llvm.org/D60889 llvm-svn: 359222	2019-04-25 18:42:00 +00:00
Jessica Paquette	ba55767f51	[GlobalISel][AArch64] Legalize G_FNEARBYINT Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc. Since the importer allows us to automatically select this after legalization, also add tests for selection etc. Also update arm64-vfloatintrinsics.ll. llvm-svn: 359204	2019-04-25 16:44:40 +00:00
Simon Pilgrim	0a7d1b3ce1	[X86][SSE] combineBitcastvxi1 - add support for bitcasting to non-scalar integers Truncate the movmsk scalar integer result to the equivalent scalar integer width as before but then bitcast to the requested type. We still have the issue identified in PR41594 but D61114 should handle this. llvm-svn: 359176	2019-04-25 09:34:36 +00:00
Simon Atanasyan	a0291110da	[MIPS] Use custom bitcast lowering to avoid excessive instructions On Mips32r2 bitcast can be expanded to two sw instructions and an ldc1 when using bitcast i64 to double or an sdc1 and two lw instructions when using bitcast double to i64. By introducing custom lowering that uses mtc1/mthc1 we can avoid excessive instructions. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D61069 llvm-svn: 359171	2019-04-25 07:47:28 +00:00
Craig Topper	013503c78d	[X86] Remove part of an if condition that should always be true. The IndexReg will always be non-null at this point. Earlier in the function, if IndexReg was null we set it to CurDAG->getRegister(0, VT) which made it non-null. llvm-svn: 359170	2019-04-25 06:08:02 +00:00
Austin Kerbow	83e52142d1	Fix spelling error. NFC Summary: Test commit. Reviewers: msearles, jkorous Reviewed By: jkorous Subscribers: dexonsmith, arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61093 llvm-svn: 359154	2019-04-24 23:32:21 +00:00
Amy Huang	68c9199493	Recommitting r358783 and r358786 "[MS] Emit S_HEAPALLOCSITE debug info" with fixes for buildbot error (undefined assembler label). Summary: This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug info in codeview. Currently only changes FastISel, so emitting labels still needs to be implemented in SelectionDAG. Reviewers: rnk Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D61083 llvm-svn: 359149	2019-04-24 23:02:48 +00:00
Joerg Sonnenberger	8372b467f1	[PowerPC] Allow using initial-exec TLS with PIC Using initial-exec TLS variables is a reasonable performance optimisation for system libraries. Use the correct PIC mechanism to get hold of the GOT to avoid text relocations. Differential Revision: https://reviews.llvm.org/D61026 llvm-svn: 359146	2019-04-24 22:12:22 +00:00
Sean Fertile	526633deea	Add period at end of comment. llvm-svn: 359144	2019-04-24 21:51:30 +00:00
Craig Topper	6932abee2c	[X86] Attempt to fix use-after-poison from r359121. llvm-svn: 359143	2019-04-24 21:48:24 +00:00
Stanislav Mekhanoshin	9d287358a8	[AMDGPU] gfx1010 SOP instructions Differential Revision: https://reviews.llvm.org/D61080 llvm-svn: 359139	2019-04-24 20:44:34 +00:00
Craig Topper	af194e9380	[X86] Prevent folding a load into an AND if that AND is really a ZEXT_INREG that should use movzx. This can save a 32-bit immediate move. We would shrink the load and fold it if it was non-volatile, but that's trickier to check for. llvm-svn: 359129	2019-04-24 19:28:38 +00:00
Craig Topper	882ca6d484	[X86] Remove dead nodes left after ReplaceAllUsesWith calls during address matching ReplaceAllUsesWith doesn't remove the node that was replaced. So its left around in the graph messing up use counts on other nodes. One thing to note, is that this isn't valid if the node being deleted is the root node of an LEA match that gets rejected. In that case the node needs to stay alive because the isel table walking code would still have a reference to it that its going to try to match next. I don't think that's the case here though because the nodes being deleted here should be "and", "srl", and "zero_extend" none of which can be the root node of an LEA match. Differential Revision: https://reviews.llvm.org/D61048 llvm-svn: 359121	2019-04-24 18:02:07 +00:00
Stanislav Mekhanoshin	33d806a517	[AMDGPU] gfx1010 sgpr register changes Differential Revision: https://reviews.llvm.org/D61045 llvm-svn: 359117	2019-04-24 17:28:30 +00:00
Stanislav Mekhanoshin	cee607e414	[AMDGPU] Add gfx1010 target definitions Differential Revision: https://reviews.llvm.org/D61041 llvm-svn: 359113	2019-04-24 17:03:15 +00:00
Dmitry Preobrazhensky	47621d7c89	[AMDGPU][MC] Parser cleanup and refactoring Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60767 llvm-svn: 359096	2019-04-24 14:06:15 +00:00
Sanjay Patel	b1b3368907	[x86] make sure horizontal op and broadcast types match to simplify (PR41414) If the types don't match, we can't just remove the shuffle. There may be some other opportunity for optimization here, but this should prevent the crashing seen in: https://bugs.llvm.org/show_bug.cgi?id=41414 llvm-svn: 359095	2019-04-24 14:05:08 +00:00
Simon Pilgrim	d30745b2a0	[X86] Add shouldFoldConstantShiftPairToMask override placeholder. NFCI. Prep work toward fixing PR40758 llvm-svn: 359088	2019-04-24 12:34:08 +00:00
Bjorn Pettersson	71e8c6f20f	Add "const" in GetUnderlyingObjects. NFC Summary: Both the input Value pointer and the returned Value pointers in GetUnderlyingObjects are now declared as const. It turned out that all current (in-tree) uses of GetUnderlyingObjects were trivial to update, being satisfied with have those Value pointers declared as const. Actually, in the past several of the users had to use const_cast, just because of ValueTracking not providing a version of GetUnderlyingObjects with "const" Value pointers. With this patch we get rid of those const casts. Reviewers: hfinkel, materi, jkorous Reviewed By: jkorous Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61038 llvm-svn: 359072	2019-04-24 06:55:50 +00:00
Craig Topper	1e413ffa7b	[Mips][CodeGen] Remove MachineFunction::setSubtarget. Change Mips to just copy the subtarget from the MachineFunction instead of recalculating it. Summary: The MachineFunction should have been created with the correct subtarget. As long as there is no way to change it, MipsTargetMachine can just capture it directly from the MachineFunction without calling getSubtargetImpl again. While there, const correct the Subtarget pointer to avoid a const_cast. I believe the Mips16Subtarget and NoMips16Subtarget members are never used, but I'll leave there removal for a separate patch. Reviewers: echristo, atanasyan Reviewed By: atanasyan Subscribers: sdardis, arichardson, hiraditya, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60936 llvm-svn: 359071	2019-04-24 06:48:31 +00:00
Jessica Paquette	4fe7574d5d	[AArch64][GlobalISel] Select G_INTRINSIC_ROUND Add selection support for G_INTRINSIC_ROUND, add a selection test, and add check lines to arm64-vfloatintrinsics.ll and f16-instructions.ll. llvm-svn: 359046	2019-04-23 23:03:03 +00:00
Jessica Paquette	9766bf1854	[AArch64][GlobalISel] Mark G_INTRINSIC_ROUND as a pre-isel floating point opcode Add G_INTRINSIC_ROUND to isPreISelGenericFloatingPointOpcode to ensure that its input and output are assigned the correct register bank. Add a regbankselect test to verify that we get what we expect here. llvm-svn: 359044	2019-04-23 22:47:00 +00:00
Heejin Ahn	b9f282d384	[WebAssembly] Emit br_table for most switch instructions Summary: Always convert switches to br_tables unless there is only one case, which is equivalent to a simple branch. This reduces code size for wasm, and we defer possible jump table optimizations to the VM. Addresses PR41502. Reviewers: kripken, sunfish Subscribers: dschuff, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60966 llvm-svn: 359038	2019-04-23 21:30:30 +00:00
Jessica Paquette	3cc6d1f542	[AArch64][GlobalISel] Legalize G_INTRINSIC_ROUND Add it to the same rule as G_FCEIL etc. Add a legalizer test, and add a missing switch case to AArch64LegalizerInfo.cpp. llvm-svn: 359033	2019-04-23 21:11:57 +00:00
Jessica Paquette	991cb39242	[AArch64][GlobalISel] Actually select G_INTRINSIC_TRUNC Apparently FileCheck wasn't actually matching the fallback check lines in arm64-vfloatintrinsics.ll properly. So, there were selection fallbacks for G_INTRINSIC_TRUNC there. Actually hook it up into AArch64InstructionSelector.cpp and write a proper selection test. I guess I'll figure out the FileCheck magic to make the fallback checks work properly in arm64-vfloatintrinsics.ll. llvm-svn: 359030	2019-04-23 20:46:19 +00:00
Jessica Paquette	ede0b2e695	[AArch64][GlobalISel] Teach regbankselect about G_INTRINSIC_TRUNC Add it to isPreISelGenericFloatingPointOpcode, and add a regbankselect test. Update arm64-vfloatintrinsics.ll now that we can select it. llvm-svn: 359022	2019-04-23 18:20:47 +00:00
Jessica Paquette	56342642a0	[AArch64][GlobalISel] Legalize G_INTRINSIC_TRUNC Same patch as G_FCEIL etc. Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct rule in AArch64LegalizerInfo.cpp, and add a test. llvm-svn: 359021	2019-04-23 18:20:44 +00:00
Stanislav Mekhanoshin	c464dddccb	[AMDGPU] Fixed addReg() in SIOptimizeExecMaskingPreRA.cpp The second argument is flags, not subreg. Differential Revision: https://reviews.llvm.org/D61031 llvm-svn: 359017	2019-04-23 17:59:26 +00:00
Jessica Paquette	df5ce782ad	[AArch64][GlobalISel] Legalize G_FMA for more vector types Same as G_FCEIL, G_FABS, etc. Just move it into that rule. Add a legalizer test for G_FMA, which we didn't have before and update arm64-vfloatintrinsics.ll. llvm-svn: 359015	2019-04-23 17:37:56 +00:00
Jessica Paquette	e50e6d2563	[AArch64][GlobalISel] Add G_FMA to isPreISelGenericFloatingPointOpcode Noticed an unnecessary fallback in arm64-vmul caused by this. Also add a regbankselect test for G_FMA. llvm-svn: 359013	2019-04-23 17:17:06 +00:00
Sanjay Patel	12a561fa1b	[x86] use psubus for more vsetcc lowering (PR39859) Circling back to a leftover bit from PR39859: https://bugs.llvm.org/show_bug.cgi?id=39859#c1 ...we have this counter-intuitive (based on the test diffs) opportunity to use 'psubus'. This appears to be the better perf option for both Haswell and Jaguar based on llvm-mca. We already do this transform for the SETULT predicate, so this makes the code more symmetrical too. If we have pminub/pminuw, we prefer those, so this should not affect anything but pre-SSE4.1 subtargets. $ cat before.s movdqa -16(%rip), %xmm2 ## xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768] pxor %xmm0, %xmm2 pcmpgtw -32(%rip), %xmm2 ## xmm2 = [255,255,255,255,255,255,255,255] pand %xmm2, %xmm0 pandn %xmm1, %xmm2 por %xmm2, %xmm0 $ cat after.s movdqa -16(%rip), %xmm2 ## xmm2 = [256,256,256,256,256,256,256,256] psubusw %xmm0, %xmm2 pxor %xmm3, %xmm3 pcmpeqw %xmm2, %xmm3 pand %xmm3, %xmm0 pandn %xmm1, %xmm3 por %xmm3, %xmm0 $ llvm-mca before.s -mcpu=haswell Iterations: 100 Instructions: 600 Total Cycles: 909 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 0.77 IPC: 0.66 Block RThroughput: 1.8 $ llvm-mca after.s -mcpu=haswell Iterations: 100 Instructions: 700 Total Cycles: 409 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 1.71 IPC: 1.71 Block RThroughput: 1.8 Differential Revision: https://reviews.llvm.org/D60838 llvm-svn: 358999	2019-04-23 15:20:17 +00:00
Joerg Sonnenberger	6e7cc49d5c	[SPARC] Use the correct register set for the "r" asm constraint. 64bit mode must use 64bit registers, otherwise assumptions about the top half of the registers are made. Problem found by Takeshi Nakayama in NetBSD. llvm-svn: 358998	2019-04-23 15:15:33 +00:00
Fangrui Song	efd94c56ba	Use llvm::stable_sort While touching the code, simplify if feasible. llvm-svn: 358996	2019-04-23 14:51:27 +00:00
Lewis Revill	df3cb477a3	[RISCV] Support assembling %tls_{ie,gd}_pcrel_hi modifiers This patch adds support for parsing and assembling the %tls_ie_pcrel_hi and %tls_gd_pcrel_hi modifiers. Differential Revision: https://reviews.llvm.org/D55342 llvm-svn: 358994	2019-04-23 14:46:13 +00:00
Scott Linder	3eed961973	[AMDGPU] Fix hidden argument metadata duplication for V3 Essentially complete a proper rebase of the V3 metadata change over https://reviews.llvm.org/D49096. Minimize the diff between the V2 and V3 variants of the relevant lit tests, and clean up some trailing whitespace. llvm-svn: 358992	2019-04-23 14:31:17 +00:00
Simon Pilgrim	0e4992ce27	[X86] Pull out collectConcatOps helper. NFCI. Create collectConcatOps helper that returns all the subvector ops for CONCAT_VECTORS or a INSERT_SUBVECTOR series. llvm-svn: 358989	2019-04-23 14:07:49 +00:00
Tim Northover	6af366be8a	ARM: disallow add/sub to sp unless Rn is also sp. The manual says that Thumb2 add/sub instructions are only allowed to modify sp if the first source is also sp. This is slightly different from the usual rGPR restriction since it's context-sensitive, so implement it in C++. llvm-svn: 358987	2019-04-23 13:50:13 +00:00
Nicolai Haehnle	7edae4c403	AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies Summary: When an LCSSA phi survives through instruction selection, the pass ends up removing that phi entirely because it is dominated by the logic that does the lanemask merging. This then used to trigger an assertion when processing a dependent phi instruction. Change-Id: Id4949719f8298062fe476a25718acccc109113b6 Reviewers: llvm-commits Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm Tags: #llvm Differential Revision: https://reviews.llvm.org/D60999 llvm-svn: 358983	2019-04-23 13:12:52 +00:00
Fedor Sergeev	652168a99b	[CallSite removal] move InlineCost to CallBase usage Converting InlineCost interface and its internals into CallBase usage. Inliners themselves are still not converted. Reviewed By: reames Tags: #llvm Differential Revision: https://reviews.llvm.org/D60636 llvm-svn: 358982	2019-04-23 12:43:27 +00:00
David Green	c519d3c403	[ARM] Update check for CBZ in Ifcvt The check for creating CBZ in constant island pass recently obtained the ability to search backwards to find a Cmp instruction. The code in IfCvt should mirror this to allow more conversions to the smaller form. The common code has been pulled out into a separate function to be shared between the two places. Differential Revision: https://reviews.llvm.org/D60090 llvm-svn: 358977	2019-04-23 12:11:26 +00:00
David Green	2f9eed6265	[ARM] Don't replicate instructions in Ifcvt at minsize Ifcvt can replicate instructions as it converts them to be predicated. This stops that from happening on thumb2 targets at minsize where an extra IT instruction is likely needed. Differential Revision: https://reviews.llvm.org/D60089 llvm-svn: 358974	2019-04-23 11:46:58 +00:00
Simon Pilgrim	e7a68fd93e	Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFCI. llvm-svn: 358969	2019-04-23 11:11:34 +00:00
Javed Absar	1cdc3dbc58	[AArch64] Add support for MTE intrinsics This patch provides intrinsics support for Memory Tagging Extension (MTE), which was introduced with the Armv8.5-a architecture. The intrinsics are described in detail in the latest ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest Reviewed by: David Spickett Differential Revision: https://reviews.llvm.org/D60486 llvm-svn: 358963	2019-04-23 09:39:58 +00:00
Diogo N. Sampaio	2619f399f9	[ARM][FIX] Add missing f16.lane.vldN/vstN lowering Summary: Add missing D and Q lane VLDSTLane lowering for fp16 elements. Reviewers: efriedma, kosarev, SjoerdMeijer, ostannard Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60874 llvm-svn: 358962	2019-04-23 09:36:39 +00:00
Sam Clegg	9da81421b8	[WebAssembly] Bail out of fastisel earlier when computing PIC addresses This change partially reverts https://reviews.llvm.org/D54647 in favor of bailing out during computeAddress instead. This catches the condition earlier and handles more cases. Differential Revision: https://reviews.llvm.org/D60986 llvm-svn: 358948	2019-04-23 03:43:26 +00:00
Sanjay Patel	bf8aacb715	[SelectionDAG] move splat util functions up from x86 lowering This was supposed to be NFC, but the change in SDLoc definitions causes instruction scheduling changes. There's nothing x86-specific in this code, and it can likely be used from DAGCombiner's simplifyVBinOp(). llvm-svn: 358930	2019-04-22 22:43:36 +00:00
Michael Liao	389d5a3474	[AMDGPU] Fix an issue in `op_sel_hi` skipping. Summary: - Only apply packed literal `op_sel_hi` skipping on operands requiring packed literals. Even an instruction is `packed`, it may have operand requiring non-packed literal, such as `v_dot2_f32_f16`. Reviewers: rampitec, arsenm, kzhuravl Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60978 llvm-svn: 358922	2019-04-22 22:05:49 +00:00
Matt Arsenault	f84ce75cd1	AMDGPU: Skip debug instructions in assert These are inserted after branch relaxation, and for some reason it's decided to put them in the long branch expansion block. It's probably not great to rely on the source block address, so this should probably be switched to being PC relative instead of relying on the block address llvm-svn: 358909	2019-04-22 19:14:26 +00:00
Matt Arsenault	2b6f76f05f	AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sources llvm-svn: 358894	2019-04-22 15:22:46 +00:00
Matt Arsenault	70346d127b	AMDGPU: Fix not checking for copy when looking at copy src Effectively reverts r356956. The check for isFullCopy was excessive, but there still needs to be a check that this is a copy. llvm-svn: 358890	2019-04-22 14:54:39 +00:00
Dmitry Preobrazhensky	e2707f5aac	[AMDGPU][MC] Corrected parsing of SP3 'neg' modifier See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60624 llvm-svn: 358888	2019-04-22 14:35:47 +00:00
Simon Pilgrim	6276ce0142	[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. Differential Revision: https://reviews.llvm.org/D60462 llvm-svn: 358887	2019-04-22 14:04:35 +00:00
Craig Topper	5c43ab337f	[X86] Reject 512-bit types in getRegForInlineAsmConstraint when AVX512 is not enabled. Same for 256 bit and AVX. llvm-svn: 358872	2019-04-22 06:12:02 +00:00
David Green	0d741507f7	[ARM] Rewrite isLegalT2AddressImmediate This does two main things, firstly adding some at least basic addressing modes for i64 types, and secondly treats floats and doubles sensibly when there is no fpu. The floating point change can help codesize in some cases, especially with D60294. Most backends seems to not consider the exact VT in isLegalAddressingMode, instead switching on type size. That is now what this does when the target does not have an fpu (as the float data will be loaded using LDR's). i64's currently use the address range of an LDRD (even though they may be legalised and loaded with an LDR). This is at least better than marking them all as illegal addressing modes. I have not attempted to do much with vectors yet. That will need changing once MVE is added. Differential Revision: https://reviews.llvm.org/D60677 llvm-svn: 358845	2019-04-21 09:54:29 +00:00
Craig Topper	df02beb416	[X86] Add the rounding control operand to the printing for some scalar FMA instructions. llvm-svn: 358844	2019-04-21 07:12:56 +00:00
Craig Topper	63db7e347b	[X86] Don't form masked vfpclass instruction from and+vfpclass unless the fpclass only has a single use. llvm-svn: 358841	2019-04-21 05:18:04 +00:00
Amara Emerson	4286652556	Revert r358800. Breaks Obsequi from the test suite. The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with a different issue. llvm-svn: 358829	2019-04-20 21:25:00 +00:00
Craig Topper	3980d1ca6b	[X86] Disable argument copy elision for arguments passed via pointers Summary: If you pass two 1024 bit vectors in IR with AVX2 on Windows 64. Both vectors will be split in four 256 bit pieces. The four pieces of the first argument will be passed indirectly using 4 gprs. The second argument will get passed via pointers in memory. The PartOffsets stored for the second argument are all in terms of its original 1024 bit size. So the PartOffsets for each piece are 32 bytes apart. So if we consider it for copy elision we'll only load an 8 byte pointer, but we'll move the address 32 bytes. The stack object size we create for the first part is probably wrong too. This issue was encountered by ISPC. I'm working on getting a reduce test case, but wanted to go ahead and get feedback on the fix. Reviewers: rnk Reviewed By: rnk Subscribers: dbabokin, llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D60801 llvm-svn: 358817	2019-04-20 15:26:44 +00:00
Nikita Popov	b75c8fc6fb	[X86] Fix stack probing on x32 (PR41477) Fix for https://bugs.llvm.org/show_bug.cgi?id=41477. On the x32 ABI with stack probing a dynamic alloca will result in a WIN_ALLOCA_32 with a 32-bit size. The current implementation tries to copy it into RAX, resulting in a physreg copy error. Fix this by copying to EAX instead. Also fix incorrect opcodes or registers used in subs. llvm-svn: 358807	2019-04-20 07:25:46 +00:00
Craig Topper	4d4b5d952e	[X86] Don't turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if the original AND can represented by MOVZX. The MOVZX doesn't require an immediate to be encoded at all. Though it does use a 2 byte opcode so its the same size as a 1 byte immediate. But it has a separate source and dest register so can help avoid copies. llvm-svn: 358805	2019-04-20 04:38:53 +00:00
Craig Topper	8b8264828c	[X86] Turn (and (anyextend (shl X, C1), C2)) into (shl (and (anyextend X), (C1 >> C2), C2) if the AND could match a movzx. There's one slight regression in here because we don't check that the immediate already allowed movzx before the shift. I'll fix that next. llvm-svn: 358804	2019-04-20 04:38:49 +00:00
Amara Emerson	eac69e9377	Revert "Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores"" We were shifting the wrong component of a split load when trying to combine them back into a single value. llvm-svn: 358800	2019-04-19 23:54:44 +00:00
Jessica Paquette	d5c69e0836	[GlobalISel][AArch64] Legalize + select G_FRINT Exactly the same as G_FCEIL, G_FABS, etc. Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc. Differential Revision: https://reviews.llvm.org/D60895 llvm-svn: 358799	2019-04-19 23:41:52 +00:00
Sam Clegg	a27252794e	[WebAssembly] FastISel: Don't fallback to SelectionDAG after BuildMI in selectCall My understanding is that once BuildMI has been called we can't fallback to SelectionDAG. This change moves the fallback for when getRegForValue() fails for that target of an indirect call. This was failing in -fPIC mode when the callee is GlobalValue. Add a test case that tickles this. Differential Revision: https://reviews.llvm.org/D60908 llvm-svn: 358793	2019-04-19 22:43:32 +00:00
Eli Friedman	1810339bc3	[AArch64] Fix checks for AArch64MCExpr::VK_SABS flag. VK_SABS is part of the SymLoc bitfield in the variant kind which should be compared for equality, not by checking the VK_SABS bit. As far as I know, the existing code happened to produce the correct results in all cases, so this is just a cleanup. Patch by Stephen Crane. Differential Revision: https://reviews.llvm.org/D60596 llvm-svn: 358788	2019-04-19 21:58:10 +00:00
Amara Emerson	36c5baef49	Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores" This introduces some runtime failures which I'll need to investigate further. llvm-svn: 358771	2019-04-19 17:42:13 +00:00
Jessica Paquette	dfd87f6fa1	[GlobalISel][AArch64] Legalize vector G_FPOW This instruction is legalized in the same way as G_FSIN, G_FCOS, G_FLOG10, etc. Update legalize-pow.mir and arm64-vfloatintrinsics.ll to reflect the change. Differential Revision: https://reviews.llvm.org/D60218 llvm-svn: 358764	2019-04-19 16:28:08 +00:00
Bjorn Pettersson	238c9d6308	[CodeGen] Add "const" to MachineInstr::mayAlias Summary: The basic idea here is to make it possible to use MachineInstr::mayAlias also when the MachineInstr is const (or the "Other" MachineInstr is const). The addition of const in MachineInstr::mayAlias then rippled down to the need for adding const in several other places, such as TargetTransformInfo::getMemOperandWithOffset. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60856 llvm-svn: 358744	2019-04-19 09:08:38 +00:00
Piotr Sobczak	72e2960e52	[AMDGPU] Ignore non-SUnits edges Summary: Ignore edges to non-SUnits (e.g. ExitSU) when checking for low latency instructions. When calling the function isLowLatencyInstruction(), an ExitSU could be on the list of successors, not necessarily a regular SU. In other places in the code there is a check "Succ->NodeNum >= DAGSize" to prevent further processing of ExitSU as "Succ->getInstr()" is NULL in such a case. Also, 8 out of 9 cases of "SUnit *Succ = SuccDep.getSUnit())" has the guard, so it is clearly an omission here. Change-Id: Ica86f0327c7b2e6bcb56958e804ea6c71084663b Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60864 llvm-svn: 358740	2019-04-19 06:19:14 +00:00
Craig Topper	bb769a2946	[X86] Turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if the AND could match a movzx. Could get further improvements by recognizing (i64 and (anyext (i32 shl))). llvm-svn: 358737	2019-04-19 05:48:13 +00:00
Craig Topper	f73caae956	[X86] Make sure we copy the HandleSDNode back to N before executing the default code after the switch in matchAddressRecursively Summary: There are two places where we create a HandleSDNode in address matching in order to handle the case where N is changed by CSE. But if we end up not matching, we fall back to code at the bottom of the switch that really would like N to point to something that wasn't CSEd away. So we should make sure we copy the handle back to N on any paths that can reach that code. This appears to be the true reason we needed to check DELETED_NODE in the negation matching. In pr32329.ll we had two subtracts back to back. We recursed through the first subtract, and onto the second subtract. The second subtract called matchAddressRecursively on its LHS which caused that subtract to CSE. We ultimately failed the match and ended up in the default code. But N was pointing at the old node that had been deleted, but the default code didn't know that and took it as the base register. Then we unwound back to the first subtract and tried to access this bogus base reg requiring the check for deleted node. With this patch we now use the CSE result as the base reg instead. matchAdd has been broken since sometime in 2015 when it was pulled out of the switch into a helper function. The assignment to N at the end was still there, but N was passed by value and not by reference so the update didn't go anywhere. Reviewers: niravd, spatel, RKSimon, bkramer Reviewed By: niravd Subscribers: llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D60843 llvm-svn: 358735	2019-04-19 04:52:21 +00:00
Jessica Paquette	0aa9b453c4	[GlobalISel][AArch64] Legalize/select G_(S/Z/ANY)_EXT for v8s8s This adds legalization for G_SEXT, G_ZEXT, and G_ANYEXT for v8s8s. We were falling back on G_ZEXT in arm64-vabs.ll before, preventing us from selecting the @llvm.aarch64.neon.sabd.v8i8 intrinsic. This adds legalizer support for those 3, which gives us selection via the importer. Update the relevant tests (legalize-ext.mir, select-int-ext.mir) and add a GISel line to arm64-vabs.ll. Differential Revision: https://reviews.llvm.org/D60881 llvm-svn: 358715	2019-04-18 21:15:48 +00:00
Jessica Paquette	3b5119c684	[GlobalISel][AArch64] Legalize v8s8 loads Add legalizer support for loads of v8s8 and update legalize-load-store.mir. Differential Revision: https://reviews.llvm.org/D60877 llvm-svn: 358714	2019-04-18 21:13:58 +00:00
Simon Pilgrim	4171a91e92	[X86] combineVectorTruncationWithPACKUS - remove split/concatenation of mask combineVectorTruncationWithPACKUS is currently splitting the upper bit bit masking into 128-bit subregs and then concatenating them back together. This was originally done to avoid regressions that caused existing subregs to be concatenated to the larger type just for the AND masking before being extracted again. This was fixed by @spatel (notably rL303997 and rL347356). This also lets SimplifyDemandedBits do some further improvements before it hits the recursive depth limit. My only annoyance with this is that we were broadcasting some xmm masks but we seem to have lost them by moving to ymm - but that's a known issue as the logic in lowerBuildVectorAsBroadcast isn't great. Differential Revision: https://reviews.llvm.org/D60375#inline-539623 llvm-svn: 358692	2019-04-18 17:23:09 +00:00
Simon Pilgrim	8f87e53462	[X86][SSE] Lower ICMP EQ(AND(X,C),C) -> SRA(SHL(X,LOG2(C)),BW-1) iff C is power-of-2. This replaces the MOVMSK combine introduced at D52121/rL342326 (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C)) with the more general icmp lowering so it can pick up more cases through bitcasts - notably vXi8 cases which use vXi16 shifts+masks, this patch can remove the mask and use pcmpgtb(0,x) for the sra. Differential Revision: https://reviews.llvm.org/D60625 llvm-svn: 358651	2019-04-18 09:58:59 +00:00
Kang Zhang	009a21d2fd	[PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS() Summary: This issue from the bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41177 When the two operands for BUILD_VECTOR are same, we will get assert error. llvm::SDValue combineBVOfConsecutiveLoads(llvm::SDNode*, llvm::SelectionDAG&): Assertion `!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) && "The loads cannot be both consecutive and reverse consecutive."' failed. This error caused by the wrong ElemSIze when calling isConsecutiveLS(). We should use `getScalarType().getStoreSize();` to get the ElemSize instread of `getScalarSizeInBits() / 8`. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D60811 llvm-svn: 358644	2019-04-18 07:24:15 +00:00
Tim Renouf	7c55c8d8c3	[AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0)) fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but creating the new fadd folds to just fneg(A). When A has multiple uses, this confuses it and you get an assert. Fixed. Differential Revision: https://reviews.llvm.org/D60633 Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c llvm-svn: 358640	2019-04-18 05:27:01 +00:00
Sanjay Patel	fb363a778f	[x86] try to widen 'shl' as part of LEA formation The test file has pairs of tests that are logically equivalent: https://rise4fun.com/Alive/2zQ %t4 = and i8 %t1, 8 %t5 = zext i8 %t4 to i16 %sh = shl i16 %t5, 2 %t6 = add i16 %sh, %t0 => %t4 = and i8 %t1, 8 %sh2 = shl i8 %t4, 2 %z5 = zext i8 %sh2 to i16 %t6 = add i16 %z5, %t0 ...so if we can fold the shift op into LEA in the 1st pattern, then we should be able to do the same in the 2nd pattern (unnecessary 'movzbl' is a separate bug I think). We don't want to do this any sooner though because that would conflict with generic transforms that try to narrow the width of the shift. Differential Revision: https://reviews.llvm.org/D60789 llvm-svn: 358622	2019-04-17 22:38:51 +00:00
Nick Desaulniers	9609ce2f33	[AsmPrinter] hoist %a output template to base class for ARM+Aarch64 Summary: X86 is quite complicated; so I intend to leave it as is. ARM+Aarch64 do basically the same thing (Aarch64 did not correctly handle immediates, ARM has a test llvm/test/CodeGen/ARM/2009-04-06-AsmModifier.ll that uses %a with an immediate) for a flag that should be target independent anyways. Reviewers: echristo, peter.smith Reviewed By: echristo Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60841 llvm-svn: 358618	2019-04-17 22:21:10 +00:00
Amara Emerson	daf6e66ac5	[GlobalISel] Add legalization support for non-power-2 loads and stores Legalize things like i24 load/store by splitting them into smaller power of 2 operations. This matches how SelectionDAG handles these operations. Differential Revision: https://reviews.llvm.org/D59971 llvm-svn: 358613	2019-04-17 21:30:07 +00:00
Nick Desaulniers	a2077bab40	[AsmPrinter] defer %c to base class for ARM, PPC, and Hexagon. NFC Summary: None of these derived classes do anything that the base class cannot. If we remove these case statements, then the base class can handle them just fine. Reviewers: peter.smith, echristo Reviewed By: echristo Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60803 llvm-svn: 358603	2019-04-17 18:22:48 +00:00
Dmitry Preobrazhensky	394d0a1637	[AMDGPU][MC] Corrected handling of "-" before expressions See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60622 llvm-svn: 358596	2019-04-17 16:56:34 +00:00
Rhys Perry	c2814e12e7	AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches. Reviewers: arsen, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60824 llvm-svn: 358592	2019-04-17 16:31:52 +00:00
Dmitry Preobrazhensky	20d52e3aa2	[AMDGPU][MC] Corrected parsing of registers See bug 41280: https://bugs.llvm.org/show_bug.cgi?id=41280 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60621 llvm-svn: 358581	2019-04-17 14:44:01 +00:00
Tim Renouf	59e8bd3093	[AMDGPU] Flag new raw/struct atomic ops as source of divergence Differential Revision: https://reviews.llvm.org/D60731 Change-Id: I821d93dec8b9cdd247b8172d92fb5e15340a9e7d llvm-svn: 358579	2019-04-17 14:04:31 +00:00
Simon Pilgrim	9daacec816	[CostModel][X86] Add bool anyof/allof reduction costs On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized, annoyingly AVX512 isn't and produces code that is almost as bad as the (unchanged) costs suggest...... Differential Revision: https://reviews.llvm.org/D60403 llvm-svn: 358574	2019-04-17 10:58:19 +00:00
Craig Topper	6bf0802738	[X86] In CopyToFromAsymmetricReg, use VR128 instead of FR32 instructions for GR32<->XMM register copies. We have two versions of some instructions, VR128 versions and FR32 versions that are marked as CodeGenOnly. This change switches to using the VR128 versions for these copies. It's after register allocation so the class size no longer matters. This matches how GR64 works. llvm-svn: 358555	2019-04-17 06:09:11 +00:00
Nick Desaulniers	3271ca01fe	[NVPTXAsmPrinter] clean up dead code. NFC Summary: The printOperand function takes a default parameter, for which there are zero call sites that explicitly pass such a parameter. As such, there is no case to support. This means that the method printVecModifiedImmediate is purly dead code, and can be removed. The eventual goal for some of these AsmPrinter refactoring is to have printOperand be a virtual method; making it easier to print operands from the base class for more generic Asm printing. It will help if all printOperand methods have the same function signature (ie. no Modifier argument when not needed). Reviewers: echristo, tra Reviewed By: echristo Subscribers: jholewinski, hiraditya, llvm-commits, craig.topper, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60727 llvm-svn: 358527	2019-04-16 21:04:34 +00:00
Simon Pilgrim	e5573f4f4e	[TargetLowering] Rename preferShiftsToClearExtremeBits and shouldFoldShiftPairToMask (PR41359) As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious. shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair llvm-svn: 358526	2019-04-16 20:57:28 +00:00
Simon Pilgrim	d769bb1e58	[X86][AVX] X86ISD::PERMV/PERMV3 node types can never fold index ops Improves codegen demonstrated by D60512 - instructions represented by X86ISD::PERMV/PERMV3 can never memory fold the operand used for their index register. This patch updates the 'isUseOfShuffle' helper into the more capable 'isFoldableUseOfShuffle' that recognises that the op is used for a X86ISD::PERMV/PERMV3 index mask and can't be folded - allowing us to use broadcast/subvector-broadcast ops to reduce the size of the mask constant pool data. Differential Revision: https://reviews.llvm.org/D60562 llvm-svn: 358516	2019-04-16 19:18:53 +00:00
Krzysztof Parzyszek	ef6823ec8d	[Hexagon] Remove indeterministic traversal order Patch by Sergei Larin. llvm-svn: 358505	2019-04-16 16:05:07 +00:00
Luis Marques	20d2424016	[RISCV] Custom lower SHL_PARTS, SRA_PARTS, SRL_PARTS When not optimizing for minimum size (-Oz) we custom lower wide shifts (SHL_PARTS, SRA_PARTS, SRL_PARTS) instead of expanding to a libcall. Differential Revision: https://reviews.llvm.org/D59477 llvm-svn: 358498	2019-04-16 14:38:32 +00:00
Craig Topper	0495f29e42	[X86] Limit the 'x' inline assembly constraint to zmm0-15 when used for a 512 type. The 'v' constraint is used to select zmm0-31. This makes 512 bit consistent with 128/256-bit.a llvm-svn: 358450	2019-04-15 21:06:32 +00:00
Matt Arsenault	101abd219b	AMDGPU: Fix unreachable when counting register usage of SGPR96 llvm-svn: 358447	2019-04-15 20:51:12 +00:00
Matt Arsenault	fbdd2a1887	AMDGPU: Fix printed format of SReg_96 These are artificial, so I think this should only come up with inline asm comments. llvm-svn: 358446	2019-04-15 20:42:18 +00:00
Craig Topper	3d9b47c770	[X86] Block i32/i64 for 'k' and 'Yk' in getRegForInlineAsmConstraint without avx512bw. 32 and 64 bit k-registers require avx512bw. If we don't block this properly, it leads to a crash. llvm-svn: 358436	2019-04-15 18:39:45 +00:00
Pete Couperus	3929c432e6	Add explicit dependency to MCDwarf.h in ARC backend. llvm-svn: 358430	2019-04-15 17:36:19 +00:00
Craig Topper	8e364c680f	[X86] Restore the pavg intrinsics. The pattern we replaced these with may be too hard to match as demonstrated by PR41496 and PR41316. This patch restores the intrinsics and then we can start focusing on the optimizing the intrinsics. I've mostly reverted the original patch that removed them. Though I modified the avx512 intrinsics to not have masking built in. Differential Revision: https://reviews.llvm.org/D60674 llvm-svn: 358427	2019-04-15 17:17:35 +00:00
Sean Fertile	8d856488a8	Add slbfee instruction. llvm-svn: 358425	2019-04-15 17:08:43 +00:00
Tim Renouf	842be38162	[AMDGPU] Fixed incorrect test in vcnd/vcmp optimization This fixes a test I introduced in change D59191 (that added src0 and src1 modifiers to the v_cndmask instruction for disassembly purposes). Spotted by David Binderman in bug 41488. Differential Revision: https://reviews.llvm.org/D60652 Change-Id: I6ac95e66cd84e812ed3359ad57bcd0e13198ba0c llvm-svn: 358392	2019-04-15 10:36:24 +00:00
Jim Lin	489f8255fc	[Sparc] Fix typo. NFC. llvm-svn: 358370	2019-04-15 05:16:46 +00:00
Amara Emerson	946b1246d6	[GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only. Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't be degraded. This change also improves the IRTranslator so that in most places, but not all, it creates constants using the MIRBuilder directly instead of first creating a new destination vreg and then creating a constant. By doing this, the buildConstant() method can just return the vreg of an existing G_CONSTANT instead of having to create a COPY from it. I measured a 0.2% improvement in compile time and a 0.9% improvement in code size at -O0 ARM64. Compile time: Program base cse diff test-suite...ark/tramp3d-v4/tramp3d-v4.test 9.04 9.12 0.8% test-suite...Mark/mafft/pairlocalalign.test 2.68 2.66 -0.7% test-suite...-typeset/consumer-typeset.test 5.53 5.51 -0.4% test-suite :: CTMark/lencod/lencod.test 5.30 5.28 -0.3% test-suite :: CTMark/Bullet/bullet.test 25.82 25.76 -0.2% test-suite...:: CTMark/ClamAV/clamscan.test 6.92 6.90 -0.2% test-suite...TMark/7zip/7zip-benchmark.test 34.24 34.17 -0.2% test-suite :: CTMark/SPASS/SPASS.test 6.25 6.24 -0.1% test-suite...:: CTMark/sqlite3/sqlite3.test 1.66 1.66 -0.1% test-suite :: CTMark/kimwitu++/kc.test 13.61 13.60 -0.0% Geomean difference -0.2% Code size: Program base cse diff test-suite...-typeset/consumer-typeset.test 1315632 1266480 -3.7% test-suite...:: CTMark/ClamAV/clamscan.test 1313892 1297508 -1.2% test-suite :: CTMark/lencod/lencod.test 1439504 1423112 -1.1% test-suite...TMark/7zip/7zip-benchmark.test 2936980 2904172 -1.1% test-suite :: CTMark/Bullet/bullet.test 3478276 3445460 -0.9% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8082868 `8033492` -0.6% test-suite :: CTMark/kimwitu++/kc.test `3870380` 3853972 -0.4% test-suite :: CTMark/SPASS/SPASS.test 1434904 1434896 -0.0% test-suite...Mark/mafft/pairlocalalign.test 764528 764528 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 782092 782092 0.0% Geomean difference -0.9% Differential Revision: https://reviews.llvm.org/D60580 llvm-svn: 358369	2019-04-15 05:04:20 +00:00
Amara Emerson	d189680baa	[GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs. Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE configs that can be passed between TargetPassConfig and the targets' custom pass configs. This CSEConfigBase allows targets to create custom CSE configs which is then used by the GISel passes for the CSEMIRBuilder. This support will be used in a follow up commit to allow constant-only CSE for -O0 compiles in D60580. llvm-svn: 358368	2019-04-15 04:53:46 +00:00
Craig Topper	5b92eb007b	[X86] Redefine KUNPCK instructions to take a narrower source register class than destination register class. Remove copies from the isel output pattern. There's no reason for the inputs to be the destination register class. This just forces an unnecessary copy in the output patterns. llvm-svn: 358362	2019-04-14 20:52:42 +00:00
Craig Topper	96950f1fa9	[X86] Put the locked mi8 instrutions above the locked mi/mi32 so they will be prefered. We want 64mi8 to be prefered over 64mi32. The order for 16mi/32mi doesn't really matter. llvm-svn: 358361	2019-04-14 19:00:00 +00:00
Craig Topper	72b976e5d7	[X86] Change IMUL with immediate instruction order to ri8 instructions come before ri/ri32 instructions. This will ensure IMUL64ri8 is tried before IMUL64ri32. For IMUL32 and IMUL16 the order doesn't really matter because only the ri8 versions use a predicate. That automatically gives them priority. llvm-svn: 358360	2019-04-14 18:59:57 +00:00
Craig Topper	3c57976447	[X86] Move VPTESTM matching from the isel table to custom code in X86ISelDAGToDAG. We had many tablegen patterns for these instructions. And due to the commutability of the patterns, tablegen expands them to even more patterns. All together VPTESTMD patterns accounted for more the 50K of the 610K isel table. This had gotten bad when we stopped canonicalizing AND to vXi64. This required a pattern for every combination of bitcast input type. This change moves the matching to custom code where it is easier to look through the bitcasts without being concerned with the specific types. The test changes are because we are now stricter with one use checks as its required to make load folding legal. We now require the AND and any BITCAST to only have a single use. This prevents forming VPTESTM and a VPAND with the same inputs. We now support broadcast loads for 128/256 patterns without VLX. We'll widen to 512-bit like and still fold the broadcast since the amount of memory read doesn't change. There are a few tests that got slightly longer because are now prefering load + VPTESTM over XOR+VPCMPEQ for (seteq (load), allzeros). Previously we were able to share the XOR with multiple VPTESTM instructions. llvm-svn: 358359	2019-04-14 18:26:11 +00:00
Craig Topper	b17e5ec61b	[X86] Don't form masked vpcmp/vcmp/vptestm operations if the setcc node has more than one use. We're better of emitting a single compare + kand rather than a compare for the other use and a masked compare. I'm looking into using custom instruction selection for VPTESTM to reduce the ridiculous number of permutations of patterns in the isel table. Putting a one use check on all masked compare folding makes load fold matching in the custom code easier. llvm-svn: 358358	2019-04-14 18:26:06 +00:00
Craig Topper	fdcdf74b0e	[X86] Remove some unused tablegen multiclasses. NFC llvm-svn: 358345	2019-04-14 04:20:38 +00:00
Bill Wendling	191f1487b6	[X86] Use PC-relative mode for the kernel code model Summary: The Linux kernel uses PC-relative mode, so allow that when the code model is "kernel". Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits, kees, nickdesaulniers Tags: #llvm Differential Revision: https://reviews.llvm.org/D60643 llvm-svn: 358343	2019-04-13 21:39:28 +00:00
Craig Topper	55b0d987fd	[X86] Use int64_t and isInt<N> instead of APInt operations in foldLoadStoreIntoMemOperand. NFC We know all our values are limited to 64 bits here so we don't need an APInt. This should save some generated code checking between large and small size. llvm-svn: 358338	2019-04-13 18:57:41 +00:00
Heejin Ahn	5f3a04510a	[WebAssembly] Use Function::hasOptSize() (NFC) Summary: Use member function. Reviewers: aheejin Subscribers: sunfish, hiraditya, sbc100, jgravelle-google, dschuff, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60651 Patch by Hideto Ueno (uenoku) llvm-svn: 358336	2019-04-13 16:54:39 +00:00
Amara Emerson	93e58d2396	[AArch64][GlobalISel] Enable copy elision in the pre-legalizer combine and fix a crash. This enables the simple copy combine that already exists in the CombinerHelper. However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a set of MIs to process, and so would end up causing a crash when deleted MIs were being added to the combiner worklist again. Differential Revision: https://reviews.llvm.org/D60579 llvm-svn: 358318	2019-04-13 00:33:25 +00:00
Amara Emerson	2806fd01a1	[AArch64][GlobalISel] Fix a crash when selecting shufflevectors with an undef mask element. If a shufflevector's mask vector has an element with "undef" then the generic instruction defining that element register is a G_IMPLICT_DEF instead of G_CONSTANT. This fixes the selector to handle this case, and for now assumes that undef just means zero. In future we'll optimize this case properly. llvm-svn: 358312	2019-04-12 21:31:21 +00:00
Thomas Lively	9e27514996	[WebAssembly] Add mutable-globals to bleeding-edge CPU Summary: This brings the backend in line with Clang. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60594 llvm-svn: 358310	2019-04-12 20:39:53 +00:00
Brendon Cahoon	4df216cd62	[Hexagon] Fix reuse bug in Vector Loop Carried Reuse pass The Hexagon Vector Loop Carried Reuse pass was allowing reuse between two shufflevectors with different masks. The reason is that the masks are not instruction objects, so the code that checks each operand just skipped over the operands. This patch fixes the bug by checking if the operands are the same when they are not instruction objects. If the objects are not the same, then the code assumes that reuse cannot occur. Differential Revision: https://reviews.llvm.org/D60019 llvm-svn: 358292	2019-04-12 16:37:12 +00:00
Simon Pilgrim	6c8f4ada36	[X86][SSE] Recognise vXi1 boolean anyof/allof reduction patterns Currently combineHorizontalPredicateResult only handles anyof/allof reduction patterns of legal types, which can be tricky to match as type legalization of bools can introduce bitcasts/truncs/extensions. This patch extends combineHorizontalPredicateResult to recognise vXi1 bool reductions as well and uses the existing combineBitcastvxi1 helper to create the MOVMSK necessary to then compare the signmask result. This ensures the accuracy of the reduction costs added in D60403 which assume the MOVMSK generation. Differential Revision: https://reviews.llvm.org/D60610 llvm-svn: 358286	2019-04-12 14:22:57 +00:00
Kang Zhang	2446f843ae	[PowerPC] Add initialization for some ppc passes Summary: Some llc debug options need pass-name as the parameters. But if we use the pass-name ppc-early-ret, we will get below error: llc test.ll -stop-after ppc-early-ret LLVM ERROR: "ppc-early-ret" pass is not registered. Below pass-names have the pass is not registered error: ppc-ctr-loops ppc-ctr-loops-verify ppc-loop-preinc-prep ppc-toc-reg-deps ppc-vsx-copy ppc-early-ret ppc-vsx-fma-mutate ppc-vsx-swaps ppc-reduce-cr-ops ppc-qpx-load-splat ppc-branch-coalescing ppc-branch-select Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D60248 llvm-svn: 358271	2019-04-12 09:59:40 +00:00
Eric Christopher	6b06c6a5ef	Add explicit dependencies on MCSection.h and MCDwarf.h to the .cpp files rather than rely on transitive includes from MCStreamer.h. llvm-svn: 358263	2019-04-12 07:40:01 +00:00
Eric Christopher	b6926bdcff	Revert "[PowerPC] Add initialization for some ppc passes" This reverts commit `6f8f98ce8d` as it is breaking nearly every bot. llvm-svn: 358260	2019-04-12 07:16:58 +00:00
Kang Zhang	6f8f98ce8d	[PowerPC] Add initialization for some ppc passes Summary: Some llc debug options need pass-name as the parameters. But if we use the pass-name ppc-early-ret, we will get below error: llc test.ll -stop-after ppc-early-ret LLVM ERROR: "ppc-early-ret" pass is not registered. Below pass-names have the pass is not registered error: ppc-ctr-loops ppc-ctr-loops-verify ppc-loop-preinc-prep ppc-toc-reg-deps ppc-vsx-copy ppc-early-ret ppc-vsx-fma-mutate ppc-vsx-swaps ppc-reduce-cr-ops ppc-qpx-load-splat ppc-branch-coalescing ppc-branch-select Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D60248 llvm-svn: 358256	2019-04-12 06:35:15 +00:00
Eric Christopher	b6c190da23	Include what's used in a few cpp files - these were getting transitive includes from MCDwarf.h. llvm-svn: 358254	2019-04-12 06:16:33 +00:00
Zi Xuan Wu	ac79ef8f0e	[PowerPC] More precise exploitation of P9 maddld instruction when operands are constant There are 3 operands of maddld, (add (mul %1, %2), %3) and sometimes they are constant. If there is constant operand, it takes extra li to materialize the operand, and one more extra register too. So it's not profitable to use maddld to optimize mul-add pattern. Differential Revision: https://reviews.llvm.org/D60181 llvm-svn: 358253	2019-04-12 05:21:31 +00:00
Nick Desaulniers	8ec304c9fd	[X86AsmPrinter] refactor static functions into private methods. NFC Summary: A lot of the code for printing special cases of operands in this translation unit are static functions. While I too have suffered many years of abuse at the hands of C, we should prefer private methods, particularly when you start passing around *this as your first argument, which is a code smell. This will help make generic vs arch specific asm printing easier, as it brings X86AsmPrinter more in line with other arch's derived AsmPrinters. We will then be able to more easily move architecture generic code to the base class, and architecture specific code to the derived classes. Some other small refactorings while we're here: - the parameter Op is now consistently OpNo - add spaces around binary expressions. I know we're not millionaires but c'mon. Reviewers: echristo Reviewed By: echristo Subscribers: smeenai, hiraditya, llvm-commits, srhines, craig.topper Tags: #llvm Differential Revision: https://reviews.llvm.org/D60577 llvm-svn: 358236	2019-04-11 22:47:13 +00:00
Amara Emerson	7e9355f870	[AArch64][GlobalISel] Flesh out vector load/store support for more types. Some of these were legalizing into smaller vector types unnecessarily, others were simply not supported yet. llvm-svn: 358223	2019-04-11 20:40:01 +00:00
Amara Emerson	b956051415	[AArch64][GlobalISel] Legalization and ISel support for load/stores of vectors of pointers. Loads and store of values with type like <2 x p0> currently don't get imported because SelectionDAG has no knowledge of pointer types. To leverage the existing support for vector load/stores, we can bitcast the value to have s64 element types instead. We do this as a custom legalization. This patch also adds support for general loads of <2 x s64>, and relaxes some type conditions on selecting G_BITCAST. Differential Revision: https://reviews.llvm.org/D60534 llvm-svn: 358221	2019-04-11 20:32:24 +00:00
Craig Topper	68a5d619a4	[X86] Restrict vselect handling in scalarizeExtEltFP to only case to pre type legalization where the setcc result type is vXi1. If the vector setcc has been legalized then we will need to convert a vector boolean of 0 or -1 to a scalar boolean of 0 or 1. The added test case previously crashed in 32-bit mode by creating a setcc with an i64 condition that type legalization couldn't expand. llvm-svn: 358218	2019-04-11 19:57:44 +00:00
Craig Topper	586fad50ac	[X86] Add patterns for using movss/movsd for atomic load/store of f32/64. Remove atomic fadd pseudos use isel patterns instead. This patch adds patterns for turning bitcasted atomic load/store into movss/sd. It also removes the pseudo instructions for atomic RMW fadd. Instead just adding isel patterns for folding an atomic load into addss/sd. And relying on the new movss/sd store pattern to handle the write part. This also makes the fadd patterns use VEX and EVEX instructions when AVX or AVX512F are enabled. Differential Revision: https://reviews.llvm.org/D60394 llvm-svn: 358215	2019-04-11 19:19:52 +00:00
Craig Topper	f7e548c076	Recommit r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2" With correct test checks this time. If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integ This matches what gcc and icc do for this case and removes an existing FIXME. llvm-svn: 358214	2019-04-11 19:19:42 +00:00
Craig Topper	8200880c9a	Revert r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2" I seem to have messed up the test checks. llvm-svn: 358212	2019-04-11 19:04:38 +00:00
Craig Topper	1c2dfc3100	[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2 If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer and store it to a stack temporary. From there we can do two 32-bit loads to get the value into integer registers without worrying about atomicness. This matches what gcc and icc do for this case and removes an existing FIXME. Differential Revision: https://reviews.llvm.org/D60156 llvm-svn: 358211	2019-04-11 18:40:21 +00:00
Simon Pilgrim	40b647ae8e	[X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV3 mask support Completes SimplifyDemandedVectorElts's basic variable shuffle mask support which should help D60512 + D60562 llvm-svn: 358186	2019-04-11 15:29:15 +00:00
Roger Ferrer Ibanez	b621f04135	[RISCV] Diagnose invalid second input register operand when using %tprel_add RISCVMCCodeEmitter::expandAddTPRel asserts that the second operand must be x4/tp. As we are not currently checking this in the RISCVAsmParser, the assert is easy to trigger due to wrong assembly input. This patch does a late check of this constraint. An alternative could be using a singleton register class for x4/tp similar to the current one for sp. Unfortunately it does not result in a good diagnostic. Because add is an overloaded mnemonic, if no matching is possible, the diagnostic of the first failing alternative seems to be used as the diagnostic itself. This means that this case the %tprel_add is diagnosed as an invalid operand (because the real add instruction only has 3 operands). Differential Revision: https://reviews.llvm.org/D60528 llvm-svn: 358183	2019-04-11 15:13:12 +00:00
Luo, Yuanke	a2b4d3fab6	[X86] Add MM register mapping from CodeView to MC register id Differential Revision: https://reviews.llvm.org/D60437 Change-Id: I2183a6d825d0284b22705d423b88882992b236c5 llvm-svn: 358179	2019-04-11 15:01:03 +00:00
Simon Pilgrim	8a25154fa7	[X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV mask support llvm-svn: 358174	2019-04-11 14:35:45 +00:00
Diogo N. Sampaio	8ddfd46c61	[AArch64] Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64 Summary: Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64 Reviewers: pbarrio, DavidSpickett, LukeGeeson Reviewed By: LukeGeeson Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60259 llvm-svn: 358171	2019-04-11 14:19:43 +00:00
Simon Pilgrim	6f3866c6fb	[X86] SimplifyDemandedVectorElts - add X86ISD::VPERMILPV mask support llvm-svn: 358170	2019-04-11 14:15:01 +00:00
Simon Pilgrim	cb5218ad48	[X86] SimplifyDemandedVectorElts - add X86ISD::VPERMIL2 mask support llvm-svn: 358167	2019-04-11 14:04:19 +00:00
Simon Pilgrim	e468cc7f14	[X86] SimplifyDemandedVectorElts - add VPPERM support We need to add support for all variable shuffle mask ops, but VPPERM is the only one that already has test coverage. llvm-svn: 358165	2019-04-11 13:30:38 +00:00
Oliver Stannard	6fa145e429	Test commit access llvm-svn: 358162	2019-04-11 12:53:33 +00:00
Shiva Chen	7cc03bd064	[RISCV] Put data smaller than eight bytes to small data section Because of gp = sdata_start_address + 0x800, gp with signed twelve-bit offset could covert most of the small data section. Linker relaxation could transfer the multiple data accessing instructions to a gp base with signed twelve-bit offset instruction. Differential Revision: https://reviews.llvm.org/D57493 llvm-svn: 358150	2019-04-11 04:59:13 +00:00
Amara Emerson	213e0bde04	[AArch64][GlobalISel] Make <2 x p0> = G_BUILD_VECTOR legal. The existing isel support already works for p0 once the legalizer accepts it. llvm-svn: 358144	2019-04-10 23:06:14 +00:00
Amara Emerson	a7ff111b04	[AArch64][GlobalISel] Add legalizer support for <8 x s16> and <16 x s8> G_ADD. llvm-svn: 358143	2019-04-10 23:06:11 +00:00
Amara Emerson	ae878dab03	[AArch64][GlobalISel] Scalarize vector SDIV. llvm-svn: 358142	2019-04-10 23:06:08 +00:00
Craig Topper	61f31cbcb2	[X86] Teach foldMaskedShiftToScaledMask to look through an any_extend from i32 to i64 between the and & shl foldMaskedShiftToScaledMask tries to reorder and & shl to enable the shl to fold into an LEA. But if there is an any_extend between them it doesn't work. This patch modifies the code to look through any_extend from i32 to i64 when the and mask only uses bits that weren't from the extended part. This will prevent a regression from D60358 caused by 64-bit SHL being narrowed to 32-bits when their upper bits aren't demanded. Differential Revision: https://reviews.llvm.org/D60532 llvm-svn: 358139	2019-04-10 21:42:08 +00:00
Craig Topper	4a32ce39b7	[X86] Make _Int instructions the preferred instructon for the assembly parser and disassembly parser to remove inconsistencies between VEX and EVEX. Many of our instructions have both a _Int form used by intrinsics and a form used by other IR constructs. In the EVEX space the _Int versions usually cover all the capabilities include broadcasting and rounding. While the other version only covers simple register/register or register/load forms. For this reason in EVEX, the non intrinsic form is usually marked isCodeGenOnly=1. In the VEX encoding space we were less consistent, but usually the _Int version was the isCodeGenOnly version. This commit makes the VEX instructions match the EVEX instructions. This was done by manually studying the AsmMatcher table so its possible I missed some cases, but we should be closer now. I'm thinking about using the isCodeGenOnly bit to simplify the EVEX2VEX tablegen code that disambiguates the _Int and non _Int versions. Currently it checks register class sizes and Record the memory operands come from. I have some other changes I was looking into for D59266 that may break the memory check. I had to make a few scheduler hacks to keep the _Int versions from being treated differently than the non _Int version. Differential Revision: https://reviews.llvm.org/D60441 llvm-svn: 358138	2019-04-10 21:29:41 +00:00
Craig Topper	87a8f9761e	[X86] Replace some if statements in isel address matching that should never be true with asserts. And move them earlier before we looked through operands that don't change size. NFC These ifs were ensuring we don't have to handle types larger than 64 bits probably because we use getZExtValue in several places below them. None of the callers of this code pass types larger than 64-bits so we can just assert instead of branching in release code. I've also moved them earlier since we're just looking through operations that don't effect bit width. This is prep work for some refactoring I plan to do to the (and (shl)) handling code. llvm-svn: 358123	2019-04-10 19:08:59 +00:00
Nick Desaulniers	ad8f3a1440	[X86AsmPrinter] refactor to limit use of Modifier. NFC Summary: The Modifier memory operands is used in 2 cases of memory references (H & P ExtraCodes). Rather than pass around the likely nullptr Modifier, refactor the handling of the Modifier out from printOperand(). The refactorings in this patch: - Don't forward declare printOperand, move its definition up. - The diff makes it look like there's a change to printPCRelImm (narrator: there's not). - Create printModifiedOperand() - Move logic for Modifier to there from printOperand - Use printModifiedOperand in 3 call sites that actually create Modifiers. - Remove now unused Modifier parameter from printOperand - Remove default parameter from printLeaMemReference as it only has 1 call site that explicitly passes a parameter. - Remove default parameter from printMemReference, make call lone call site explicitly pass nullptr. - Drop Modifier parameter from printIntelMemReference, as Intel style memory references don't support the Modifiers in question. This will allow future changes to printOperand() to make it a pure virtual method on the base AsmPrinter class, allowing for more generic handling of some architecture generic constraints. X86AsmPrinter was the only derived class of AsmPrinter to have additional parameters on its printOperand function. Reviewers: craig.topper, echristo Reviewed By: echristo Subscribers: hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60526 llvm-svn: 358122	2019-04-10 19:01:44 +00:00
Roman Lebedev	dc67659ba5	[X86] X86ScheduleBdVer2: use !listsplat operator to cleanup loadres calculation The problem is that one can't concatenate an empty list (implied all-ones) with non-empty list here. The result will be the non-empty list, and it won't match the length of the ExePorts list. The problems begin when LoadRes != 1 here, which is the case in PdWriteResYMMPair, and more importantly i think it will be the case for PdWriteResExPair. llvm-svn: 358118	2019-04-10 18:26:42 +00:00
David Green	0861c87b06	Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not seeing through to the constant in other blocks. Revert this patch while we come up with a better way to handle that. I will try to follow this up with some better tests. llvm-svn: 358113	2019-04-10 18:00:41 +00:00
Craig Topper	35fe07916a	[AArch64] Teach getTestBitOperand to look through ANY_EXTENDS This patch teach getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression. Differential Revision: https://reviews.llvm.org/D60482 llvm-svn: 358108	2019-04-10 17:27:29 +00:00
Nick Desaulniers	5277b3ff25	[AsmPrinter] refactor to remove remove AsmVariant. NFC Summary: The InlineAsm::AsmDialect is only required for X86; no architecture makes use of it and as such it gets passed around between arch-specific and general code while being unused for all architectures but X86. Since the AsmDialect is queried from a MachineInstr, which we also pass around, remove the additional AsmDialect parameter and query for it deep in the X86AsmPrinter only when needed/as late as possible. This refactor should help later planned refactors to AsmPrinter, as this difference in the X86AsmPrinter makes it harder to make AsmPrinter more generic. Reviewers: craig.topper Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60488 llvm-svn: 358101	2019-04-10 16:38:43 +00:00
Simon Pilgrim	37d8d55823	[X86][AVX] getTargetConstantBitsFromNode - extract bits from X86ISD::SUBV_BROADCAST llvm-svn: 358096	2019-04-10 16:24:47 +00:00
Diogo N. Sampaio	aae424a2d2	[AArch64] Add lowering pattern for scalar fp16 facge and facgt Summary: The fp16 scalar version of facge and facgt requires a custom patter matching, as the result type is not the same width of the operands. Reviewers: olista01, javed.absar, pbarrio Reviewed By: javed.absar Subscribers: kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60212 llvm-svn: 358083	2019-04-10 13:34:18 +00:00
Diogo N. Sampaio	651463e4a8	[ARM] [FIX] Add missing f16 vector operations lowering Summary: Add missing <8xhalf> shufflevectors pattern, when using concat_vector dag node. As well, allows <8xhalf> and <4xhalf> vldup1 operations. These instructions are required for v8.2a fp16 lowering of vmul_n_f16, vmulq_n_f16 and vmulq_lane_f16 intrinsics. Reviewers: olista01, pbarrio, LukeGeeson, efriedma Reviewed By: efriedma Subscribers: efriedma, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60319 llvm-svn: 358081	2019-04-10 13:28:06 +00:00
Clement Courbet	48e2eb0b27	[NFC] Fix unused variable warning. llvm-svn: 358080	2019-04-10 13:18:05 +00:00
Diana Picus	6bdade85de	Fixup r358063 Fix warning/error about mixed signedness. llvm-svn: 358065	2019-04-10 09:31:28 +00:00
Diana Picus	4a7f8d8d6b	[ARM GlobalISel] Add some asserts. NFC. Make sure some arm opcodes don't unintentionally sneak into thumb mode. llvm-svn: 358064	2019-04-10 09:14:37 +00:00
Diana Picus	b6e83b98f9	[ARM GlobalISel] Select G_FCONSTANT for VFP3 Make it possible to TableGen code for FCONSTS and FCONSTD. We need to make two changes to the TableGen descriptions of vfp_f32imm and vfp_f64imm respectively: * add GISelPredicateCode to check that the immediate fits in 8 bits; * extract the SDNodeXForms into separate definitions and create a GISDNodeXFormEquiv and a custom renderer function for each of them. There's a lot of boilerplate to get the actual value of the immediate, but it basically just boils down to calling ARM_AM::getFP32Imm or ARM_AM::getFP64Imm. llvm-svn: 358063	2019-04-10 09:14:32 +00:00
Diana Picus	3533ad6801	[ARM GlobalISel] Select G_FCONSTANT into pools Put all floating point constants into constant pools and load their values from there. llvm-svn: 358062	2019-04-10 09:14:24 +00:00
Diana Picus	165846b031	[ARM GlobalISel] Map G_FCONSTANT llvm-svn: 358061	2019-04-10 09:14:16 +00:00
Craig Topper	391d5caa10	[X86] Move the 2 byte VEX optimization for MOV instructions back to the X86AsmParser::processInstruction where it used to be. Block when {vex3} prefix is present. Years ago I moved this to an InstAlias using VR128H/VR128L. But now that we support {vex3} pseudo prefix, we need to block the optimization when it is set to match gas behavior. llvm-svn: 358046	2019-04-10 05:43:20 +00:00
Jim Lin	a49c95e02a	[Sparc] Fix incorrect MI insertion position for spilling f128. Summary: Obviously, new built MI (sethi+add or sethi+xor+add) for constructing large offset should be inserted before new created MI for storing even register into memory. So the insertion position should be *StMI instead of II. before fixed: std %f0, [%g1+80] sethi 4, %g1 <<< add %g1, %sp, %g1 <<< this two instructions should be put before "std %f0, [%g1+80]". sethi 4, %g1 add %g1, %sp, %g1 std %f2, [%g1+88] after fixed: sethi 4, %g1 add %g1, %sp, %g1 std %f0, [%g1+80] sethi 4, %g1 add %g1, %sp, %g1 std %f2, [%g1+88] Reviewers: venkatra, jyknight Reviewed By: jyknight Subscribers: jyknight, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60397 llvm-svn: 358042	2019-04-10 01:56:32 +00:00
Craig Topper	9ca3a95f79	[X86] Support the EVEX versions vcvt(t)ss2si and vcvt(t)sd2si with the {evex} pseudo prefix in the assembler. The EVEX versions are ambiguous with the VEX versions based on operands alone so we had explicitly dropped them from the AsmMatcher table. Unfortunately, when we add them they incorrectly show in the table before their VEX counterparts. This is different how the prioritization normally works. To fix this we have to explicitly reject the instructions unless the {evex} prefix has been seen. llvm-svn: 358041	2019-04-10 01:29:59 +00:00
Craig Topper	7143224272	[X86] Add VEX_LIG to scalar VEX/EVEX instructions that were missing it. Scalar VEX/EVEX instructions don't use the L bit and don't look at it for decoding either. So we should ignore it in our disassembler. The missing instructions here were found by grepping the raw tablegen class definitions in the tablegen debug output. llvm-svn: 358040	2019-04-09 23:30:36 +00:00
Craig Topper	60f83544bb	[X86] Fix a dangling StringRef issue introduced in r358029. I was attempting to convert mnemonics to lower case after processing a pseudo prefix. But the ParseOperands just hold a StringRef for tokens so there is no where to allocate the memory. Add FIXMEs for the lower case issue which also exists in the prefix parsing code. llvm-svn: 358036	2019-04-09 21:37:21 +00:00
Amara Emerson	9bf092d719	[AArch64][GlobalISel] Add isel support for vector G_ICMP and G_ASHR & G_SHL The selection for G_ICMP is unfortunately not currently importable from SDAG due to the use of custom SDNodes. To support this, this selection method has an opcode table which has been generated by a script, indexed by various instruction properties. Ideally in future we will have a GISel native selection patterns that we can write in tablegen to improve on this. For selection of some types we also need support for G_ASHR and G_SHL which are generated as a result of legalization. This patch also adds support for them, generating the same code as SelectionDAG currently does. Differential Revision: https://reviews.llvm.org/D60436 llvm-svn: 358035	2019-04-09 21:22:43 +00:00
Amara Emerson	888dd5d198	[AArch64][GlobalISel] Legalize vector G_ICMP. Selection support will be coming in a later patch. Differential Revision: https://reviews.llvm.org/D60435 llvm-svn: 358034	2019-04-09 21:22:40 +00:00
Amara Emerson	92d74f19cf	[AArch64][GlobalISel] Add legalization for some vector G_SHL and G_ASHR. This is needed for some future support for vector ICMP. Differential Revision: https://reviews.llvm.org/D60433 llvm-svn: 358033	2019-04-09 21:22:37 +00:00
Amara Emerson	2b523f8162	[GlobalISel][AArch64] Allow CallLowering to handle types which are normally required to be passed as different register types. E.g. <2 x i16> may need to be passed as a larger <2 x i32> type, so formal arg lowering needs to be able truncate it back. Likewise, when dealing with returns of these types, they need to be widened in the appropriate way back. Differential Revision: https://reviews.llvm.org/D60425 llvm-svn: 358032	2019-04-09 21:22:33 +00:00
Craig Topper	8e2871cd2c	[X86] Add support for {vex2}, {vex3}, and {evex} to the assembler to match gas. Use {evex} to improve the one our 32-bit AVX512 tests. These can be used to force the encoding used for instructions. {vex2} will fail if the instruction is not VEX encoded, but otherwise won't do anything since we prefer vex2 when possible. Might need to skip use of the _REV MOV instructions for this too, but I haven't done that yet. {vex3} will force the instruction to use the 3 byte VEX encoding or fail if there is no VEX form. {evex} will force the instruction to use the EVEX version or fail if there is no EVEX version. Differential Revision: https://reviews.llvm.org/D59266 llvm-svn: 358029	2019-04-09 18:45:15 +00:00
Hiroshi Inoue	30d3c58b81	[PowerPC] fix trivial typos in comment, NFC llvm-svn: 357981	2019-04-09 08:40:02 +00:00
Craig Topper	53ee783c6e	[X86] Have EVEX2VEX tablegenerator use HasVEX_L and HasEVEX_L2 fields instead of the composite EVEX_LL field. Remove the EVEX_LL field. NFCI The composite existed to simplify some other tablegen code and not really in an important way. Remove the combined field and just calculate the vector size using two ifs. llvm-svn: 357972	2019-04-09 07:40:14 +00:00
Craig Topper	f19f991b7f	[X86] Use VEX_WIG for VPINSRB/W and VPEXTRB/W to match what is done for EVEX. The instruction's document this as W0 for the VEX encoding. But there's a footnote mentioning that VEX.W is ignored in 64-bit mode. And the main VEX encoding description says the VEX.W bit is ignored for instructions that are equivalent to a legacy SSE instruction that uses REX.W to select a GPR which would apply here. By making this match EVEX we can remove a special case of allowing EVEX2VEX to turn an EVEX.WIG instruction into VEX.W0. llvm-svn: 357971	2019-04-09 07:40:10 +00:00
Craig Topper	2f9c1732b8	[X86] Split the VEX_WPrefix in X86Inst tablegen class into 3 separate fields with clear meanings. llvm-svn: 357970	2019-04-09 07:40:06 +00:00
Tom Stellard	206b9927f8	AMDGPU/GlobalISel: Implement call lowering for shaders returning values Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, llvm-commits Differential Revision: https://reviews.llvm.org/D57166 llvm-svn: 357964	2019-04-09 02:26:03 +00:00
Chen Zheng	19ce6719bc	[PowerPC] initialize SchedModel according to platform. Differential Revision: https://reviews.llvm.org/D60177 llvm-svn: 357962	2019-04-09 01:25:25 +00:00
Craig Topper	6c11a31bce	[X86] Derive ssmem and sdmem from X86MemOperand. NFCI This changes the operand type from v4f32/v2f64 to iPTR which seems more correct. But that doesn't seem to do anything other than change the comments in X86GenDAGISel.inc. Probably because we use a ComplexPattern to do the matching so there's no autogenerated code to change. llvm-svn: 357959	2019-04-09 00:24:17 +00:00
Craig Topper	3a4c2192a4	[X86] Fix a couple lowering functions that called ReplaceAllUsesOfValueWith for the newly created code and then return SDValue(). Use MERGE_VALUES instead. Returning SDValue() makes the caller think custom lowering was unsuccessful and then it will fall back to trying to expand the original node. This expanded code will end up with no users and end up being pruned later. But it was useless unnecessary work to create it. Instead return a MERGE_VALUES with all the results so the caller knows something changed. The caller can handle the replacements. For one of the cases I had to use UNDEF has a dummy value for a result we know is unused. This should get pruned later. llvm-svn: 357935	2019-04-08 19:44:07 +00:00
Sanjay Patel	50c3b290ed	[x86] make 8-bit shl undesirable I was looking at a potential DAGCombiner fix for 1 of the regressions in D60278, and it caused severe regression test pain because x86 TLI lies about the desirability of 8-bit shift ops. We've hinted at making all 8-bit ops undesirable for the reason in the code comment: // TODO: Almost no 8-bit ops are desirable because they have no actual // size/speed advantages vs. 32-bit ops, but they do have a major // potential disadvantage by causing partial register stalls. ...but that leads to massive diffs and exposes all kinds of optimization holes itself. Differential Revision: https://reviews.llvm.org/D60286 llvm-svn: 357912	2019-04-08 13:58:50 +00:00
Craig Topper	6a6da233b9	[X86] Make LowerOperationWrapper more robust. Remove now unnecessary ReplaceAllUsesWith from LowerMSCATTER. Previously LowerOperationWrapper took the number of results from the original node and counted that many results from the new node. This was intended to drop chain operands from FP_TO_SINT lowering that uses X87 with memory operations to stack temporaries. The final load had an extra chain output that needs to be ignored. Unfortunately, it didn't work with scatter which has 2 result operands, the mask output which is discarded and a chain output. The chain output is the one that is needed but it comes second and it would be dropped by the previous logic here. To workaround this we were doing a ReplaceAllUses in the lowering code so that the generic legalization code wouldn't see any uses to replace since it had been given the wrong result/type. After this change we take the LowerOperation result directly if the original node has one result. This allows us to directly return the chain from scatter or the load data from the FP_TO_SINT case. When the original node has multiple results we'll ensure the returned node has the same number and copy them over. For cases where the original node has multiple results and the new code for some reason has even more results, MERGE_VALUES can be used to pass only the needed results. llvm-svn: 357887	2019-04-08 07:39:17 +00:00
Craig Topper	424417da79	[X86] Use (SUBREG_TO_REG (MOV32rm)) for extloadi64i8/extloadi64i16 when the load is 4 byte aligned or better and not volatile. Summary: Previously we would use MOVZXrm8/MOVZXrm16, but those are longer encodings. This is similar to what we do in the loadi32 predicate. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60341 llvm-svn: 357875	2019-04-07 19:19:44 +00:00
Nikita Popov	3db93ac5d6	Reapply [ValueTracking] Support min/max selects in computeConstantRange() Add support for min/max flavor selects in computeConstantRange(), which allows us to fold comparisons of a min/max against a constant in InstSimplify. This fixes an infinite InstCombine loop, with the test case taken from D59378. Relative to the previous iteration, this contains some adjustments for AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen, which ends up constant folding some existing med3 tests after this change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option is added, which allows disabling scalar IR passes (that use InstSimplify) for testing purposes. Differential Revision: https://reviews.llvm.org/D59506 llvm-svn: 357870	2019-04-07 17:22:16 +00:00
Simon Pilgrim	6d7fdd9ab7	[CostModel][X86] Masked load legalization requires an binary-shuffle not a select (PR39812) Expansion/truncation is better described by SK_PermuteTwoSrc than SK_Select llvm-svn: 357864	2019-04-07 13:26:09 +00:00
Simon Pilgrim	07adb6abda	[X86][SSE] SimplifyDemandedBitsForTargetNode - Add initial PACKSS support In the case where we only want the sign bit (e.g. when using PACKSS truncation of comparison results for MOVMSK) then we can just demand the sign bit of the source operands. This makes use of the fact that PACKSS saturates out of range values to the min/max int values - so the sign bit is always preserved. Differential Revision: https://reviews.llvm.org/D60333 llvm-svn: 357859	2019-04-07 10:40:01 +00:00
Craig Topper	399102b464	[X86] When converting (x << C1) AND C2 to (x AND (C2>>C1)) << C1 during isel, try using andl over andq by favoring 32-bit unsigned immediates. llvm-svn: 357848	2019-04-06 19:00:11 +00:00
Simon Pilgrim	d0a53d4914	[X86] combineBitcastvxi1 - provide dst VT and src SDValue directly. NFCI. Prep work to make it easier to reuse the BITCAST->MOVSMK combine in other cases. llvm-svn: 357847	2019-04-06 18:54:17 +00:00
Craig Topper	f9b9f8d2e4	[X86] Use a signed mask in foldMaskedShiftToScaledMask to enable a shorter immediate encoding. This function reorders AND and SHL to enable the SHL to fold into an LEA. The upper bits of the AND will be shifted out by the SHL so it doesn't matter what mask value we use for these bits. By using sign bits from the original mask in these upper bits we might enable a shorter immediate encoding to be used. llvm-svn: 357846	2019-04-06 18:00:50 +00:00
Simon Pilgrim	af1cbdd3ba	Fix spelling mistake. NFCI. llvm-svn: 357843	2019-04-06 15:38:34 +00:00
Stanislav Mekhanoshin	5182302a37	[AMDGPU] Sort out and rename multiple CI/VI predicates Differential Revision: https://reviews.llvm.org/D60346 llvm-svn: 357835	2019-04-06 09:20:48 +00:00
Francis Visoiu Mistrih	9d9d1b6b2b	[X86] Enable tail calls for CallingConv::Swift It's currently only enabled on AArch64 (enabled in r281376). llvm-svn: 357809	2019-04-05 20:18:25 +00:00
Francis Visoiu Mistrih	ab051a378c	[X86] Preserve operand flag when expanding TCRETURNri The expansion of TCRETURNri(64) would not keep operand flags like undef/renamable/etc. which can result in machine verifier issues. Also add plumbing to be able to use `-run-pass=x86-pseudo`. llvm-svn: 357808	2019-04-05 20:18:21 +00:00
Stanislav Mekhanoshin	c8f78f8dd3	[AMDGPU] Add MachineDCE pass after RenameIndependentSubregs Detect dead lanes can create some dead defs. Then RenameIndependentSubregs will break a REG_SEQUENCE which may use these dead defs. At this point a dead instruction can be removed but we do not run a DCE anymore. MachineDCE was only running before live variable analysis. The patch adds a mean to preserve LiveIntervals and SlotIndexes in case it works past this. Differential Revision: https://reviews.llvm.org/D59626 llvm-svn: 357805	2019-04-05 20:11:32 +00:00
Craig Topper	80aa2290fb	[X86] Merge the different Jcc instructions for each condition code into single instructions that store the condition code as an operand. Summary: This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon Reviewed By: RKSimon Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60228 llvm-svn: 357802	2019-04-05 19:28:09 +00:00
Craig Topper	7323c2bf85	[X86] Merge the different SETcc instructions for each condition code into single instructions that store the condition code as an operand. Summary: This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri Reviewed By: andreadb Subscribers: hiraditya, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60138 llvm-svn: 357801	2019-04-05 19:27:49 +00:00
Craig Topper	e0bfeb5f24	[X86] Merge the different CMOV instructions for each condition code into single instructions that store the condition code as an immediate. Summary: Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models. This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between CMOV instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. This does complicate the scheduler models a little since we can't assign the A and BE instructions to a separate class now. I plan to make similar changes for SETcc and Jcc. Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet Reviewed By: RKSimon Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60041 llvm-svn: 357800	2019-04-05 19:27:41 +00:00
Stanislav Mekhanoshin	7895c03232	[AMDGPU] predicate and feature refactoring We have done some predicate and feature refactoring lately but did not upstream it. This is to sync. Differential revision: https://reviews.llvm.org/D60292 llvm-svn: 357791	2019-04-05 18:24:34 +00:00
Fangrui Song	2c5c12c041	Change some dyn_cast to more apropriate isa. NFC llvm-svn: 357773	2019-04-05 16:16:23 +00:00
Matt Arsenault	4ed6ccab9b	AMDGPU/GlobalISel: Fix non-power-of-2 select llvm-svn: 357762	2019-04-05 14:03:04 +00:00
Sanjay Patel	50a8652785	[DAGCombiner][x86] scalarize splatted vector FP ops There are a variety of vector patterns that may be profitably reduced to a scalar op when scalar ops are performed using a subset (typically, the first lane) of the vector register file. For x86, this is true for float/double ops and element 0 because insert/extract is just a sub-register rename. Other targets should likely enable the hook in a similar way. Differential Revision: https://reviews.llvm.org/D60150 llvm-svn: 357760	2019-04-05 13:32:17 +00:00
Piotr Sobczak	0376ac1d94	[SelectionDAG] Compute known bits of CopyFromReg Summary: Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if the virtual reg used has one def only. This can be particularly useful when calling isBaseWithConstantOffset() with the ISD::CopyFromReg argument, as more optimizations may get enabled in the result. Also add a missing truncation on X86, found by testing of this patch. Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa Reviewers: bogner, craig.topper, RKSimon Reviewed By: RKSimon Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59535 llvm-svn: 357745	2019-04-05 07:44:09 +00:00
Craig Topper	94f1772b1e	[X86] Promote i16 SRA instructions to i32 We already promote SRL and SHL to i32. This will introduce sign extends sometimes which might be harder to deal with than the zero we use for promoting SRL. I ran this through some of our internal benchmark lists and didn't see any major regressions. I think there might be some DAG combine improvement opportunities in the test changes here. Differential Revision: https://reviews.llvm.org/D60278 llvm-svn: 357743	2019-04-05 06:32:50 +00:00
Evandro Menezes	85bd3978ae	[IR] Refactor attribute methods in Function class (NFC) Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731	2019-04-04 22:40:06 +00:00
James Y Knight	a040174418	Revert [X86] When using Win64 ABI, exit with error if SSE is disabled for varargs It unnecessarily breaks previously-working code which used varargs, but didn't pass any float/double arguments (such as EDK2). Also revert the fixup on top of that: Revert [X86] Fix a test from r357317 This reverts r357317 (git commit `d413f41de6`) This reverts r357380 (git commit `7af32444b9`) llvm-svn: 357718	2019-04-04 19:05:48 +00:00
Sam Clegg	2a7cac932b	[WebAssembly] Add new explicit relocation types for PIC relocations See https://github.com/WebAssembly/tool-conventions/pull/106 Differential Revision: https://reviews.llvm.org/D59907 llvm-svn: 357710	2019-04-04 17:43:50 +00:00
Sanjay Patel	17648b848e	[x86] eliminate unnecessary broadcast of horizontal op This is another pattern that comes up if we more aggressively scalarize FP ops. llvm-svn: 357703	2019-04-04 14:46:13 +00:00
Lewis Revill	aa79a3fe8e	[RISCV] Support assembling TLS add and associated modifiers This patch adds support in the MC layer for parsing and assembling the 4-operand add instruction needed for TLS addressing. This also involves parsing the %tprel_hi, %tprel_lo and %tprel_add operand modifiers. Differential Revision: https://reviews.llvm.org/D55341 llvm-svn: 357698	2019-04-04 14:13:37 +00:00
Jonas Paulsson	c56ffed304	[SystemZ] Bugfix in isFusableLoadOpStorePattern() This function is responsible for checking the legality of fusing an instance of load -> op -> store into a single operation. In the SystemZ backend the check was incomplete and a test case emerged with a cycle in the instruction selection DAG as a result. Instead of using the NodeIds to determine node relationships, hasPredecessorHelper() now is used just like in the X86 backend. This handled the failing tests and as well gave a few additional transformations on benchmarks. The SystemZ isFusableLoadOpStorePattern() is now a very near copy of the X86 function, and it seems this could be made a utility function in common code instead. Review: Ulrich Weigand https://reviews.llvm.org/D60255 llvm-svn: 357688	2019-04-04 12:12:35 +00:00
Diana Picus	153c3887e4	[ARM GlobalISel] Support DBG_VALUE Make sure we can map and select DBG_VALUE. llvm-svn: 357681	2019-04-04 10:24:51 +00:00
Sander de Smalen	772e4734d9	[AArch64][AsmParser] Fix .arch_extension directive parsing This patch fixes .arch_extension directive parsing to handle a wider range of architecture extension options. The existing parser was parsing extensions as an identifier which breaks for extensions containing a "-", such as the "tlb-rmi" extension. The extension is now parsed as a string. This is consistent with the extension parsing in the .arch and .cpu directive parsing. Patch by Cullen Rhodes (c-rhodes) Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D60118 llvm-svn: 357677	2019-04-04 09:11:17 +00:00
Craig Topper	3649c20884	[X86] Use INSERT_SUBREG rather than SUBREG_TO_REG when creating LEA64_32 during isel. SUBREG_TO_REG is supposed to be used to assert that we know the upper bits are zero. But that isn't the case here. We've done no analysis of the inputs. llvm-svn: 357673	2019-04-04 05:00:18 +00:00
Sam Clegg	12011fae17	[WebAssembly] EmscriptenEHSjLj: Don't abort if __THREW__ is defined This allows __THREW__ to be defined in the current module, although it is still required to be a GlobalVariable. In emscripten we want to be able to compile the source code that defines this symbols. Previously we were avoid this by not running this pass when building that compiler-rt library, but I have change out to build it using the normal compiler path: https://github.com/emscripten-core/emscripten/pull/8391 Differential Revision: https://reviews.llvm.org/D60232 llvm-svn: 357665	2019-04-04 01:43:21 +00:00
Craig Topper	051bd16faf	[X86] Remove CustomInserters for RDPKRU/WRPKRU. Use some custom lowering and new ISD opcodes instead. These inserters inserted some instructions to zero some registers and copied from virtual registers to physical registers. This change instead inserts the zeros directly into the DAG at lowering time using new ISD opcodes that take the extra zeroes as inputs. The zeros will then go through isel on their own to select the MOV32r0 pseudo. Then we just need to mention the physical registers directly in the isel patterns and the isel table and InstrEmitter will take care of inserting the necessary copies to/from physical registers. llvm-svn: 357659	2019-04-04 00:28:49 +00:00
Craig Topper	52cac4b79f	[X86] Remove CustomInserter pseudos for MONITOR/MONITORX/CLZERO. Use custom instruction selection instead. This custom inserter existed so we could do a weird thing where we pretended that the instructions support a full address mode instead of taking a pointer in EAX/RAX. I think was largely so we could be pointer size agnostic in the isel pattern. To make this work we would then put the address into an LEA into EAX/RAX in front of the instruction after isel. But the LEA is overkill when we just have a base pointer. So we end up using the LEA as a slower MOV instruction. With this change we now just do custom selection during isel instead and just assign the incoming address of the intrinsic into EAX/RAX based on its size. After the intrinsic is selected, we can let isel take care of selecting an LEA or other operation to do any address computation needed in this basic block. I've also split the instruction into a 32-bit mode version and a 64-bit mode version so the implicit use is properly sized based on the pointer. Without this we get comments in the assembly output about killing eax and defing rax or vice versa depending on whether we define the instruction to use EAX/RAX. llvm-svn: 357652	2019-04-03 23:28:30 +00:00
Sanjay Patel	c9a012e4ea	[x86] fold shuffles of h-ops that have an undef operand If an operand is undef, we can assume it's the same as the other operand. llvm-svn: 357644	2019-04-03 22:40:35 +00:00
Sanjay Patel	61b5e3c6a9	[x86] eliminate movddup of horizontal op This pattern would show up as a regression if we more aggressively convert vector FP ops to scalar ops. There's still a missed optimization for the v4f64 legal case (AVX) because we create that h-op with an undef operand. We should probably just duplicate the operands for that pattern to avoid trouble. llvm-svn: 357642	2019-04-03 22:15:29 +00:00
Evandro Menezes	7c711ccf36	[IR] Create new method in `Function` class (NFC) Create method `optForNone()` testing for the function level equivalent of `-O0` and refactor appropriately. Differential revision: https://reviews.llvm.org/D59852 llvm-svn: 357638	2019-04-03 21:27:03 +00:00
Matt Arsenault	396653f8a1	AMDGPU: Split block for si_end_cf Relying on no spill or other code being inserted before this was precarious. It relied on code diligently checking isBasicBlockPrologue which is likely to be forgotten. Ideally this could be done earlier, but this doesn't work because of phis. Any other instruction can't be placed before them, so we have to accept the position being incorrect during SSA. This avoids regressions in the fast register allocator rewrite from inverting the direction. llvm-svn: 357634	2019-04-03 20:53:20 +00:00
Krzysztof Parzyszek	4841643a1d	[X86] Extend boolean arguments to inline-asm according to getBooleanType Differential Revision: https://reviews.llvm.org/D60208 llvm-svn: 357615	2019-04-03 17:43:14 +00:00
Simon Pilgrim	15919ad306	[X86][AVX] combineHorizontalPredicateResult - split any/allof v16i16/v32i8 reduction on AVX1 Perform the 2 x 128-bit lo/hi OR/AND on the vectors before calling PMOVMSKB on the 128-bit result. llvm-svn: 357611	2019-04-03 17:28:34 +00:00
Simon Pilgrim	9e28dddf55	[X86][AVX] combineHorizontalPredicateResult - support v16i16/v32i8 reduction on AVX1 Use getPMOVMSKB helper which splits v32i8 MOVMSK calls on pre-AVX2 targets. llvm-svn: 357608	2019-04-03 17:17:13 +00:00
Jessica Paquette	e794121cd0	[AArch64][GlobalISel] Legalize G_FEXP2 Same as G_EXP. Add a test, and update legalizer-info-validation.mir and f16-instructions.ll. Differential Revision: https://reviews.llvm.org/D60165 llvm-svn: 357605	2019-04-03 16:58:32 +00:00
Lewis Revill	24a74096a4	Test commit: Remove double variable assignment llvm-svn: 357601	2019-04-03 15:54:30 +00:00
Ulrich Weigand	35dfd1b7df	[SystemZ] Improve codegen for certain SADDO-immediate cases When performing an add-with-overflow with an immediate in the range -2G ... -4G, code currently loads the immediate into a register, which generally takes two instructions. In this particular case, it is preferable to load the negated immediate into a register instead, which always only requires one instruction, and then perform a subtract. llvm-svn: 357597	2019-04-03 15:09:19 +00:00
Petar Avramovic	afa3afa384	[MIPS GlobalISel] Select floating point arithmetic operations Select 32 and 64 bit floating point add, sub, mul and div for MIPS32. Differential Revision: https://reviews.llvm.org/D60191 llvm-svn: 357584	2019-04-03 14:12:59 +00:00
Javed Absar	5820db93c9	[AArch64] Update v8.5a MTE LDG/STG instructions The latest MTE specification adds register Xt to the STG instruction family: STG [Xn, #offset] -> STG Xt, [Xn, #offset] The tag written to memory is taken from Xt rather than Xn. Also, the LDG instruction also was changed to read return address from Xt: LDG Xt, [Xn, #offset]. This patch includes those changes and tests. Specification is at: https://developer.arm.com/docs/ddi0596/c Differential Revision: https://reviews.llvm.org/D60188 llvm-svn: 357583	2019-04-03 14:12:13 +00:00
Simon Atanasyan	eb9ae56157	[mips] Remove unused FGRH32 register class. NFC If we need this class in the future we will easily restore it. Differential Revision: http://reviews.llvm.org/D60132 llvm-svn: 357570	2019-04-03 10:08:27 +00:00
Clement Courbet	26a8ed3ac9	[X86] Make the post machine scheduler macrofusion-aware. Summary: Given that X86 does not use this currently, this is an NFC. I'll experiment with enabling and will report numbers. Reviewers: andreadb, lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60185 llvm-svn: 357568	2019-04-03 09:37:30 +00:00
Matt Arsenault	f426ddbfc7	AMDGPU: Assume ECC is enabled by default if supported The test should really be checking for the property directly in the code object headers, but there are problems with this. I don't see this directly represented in the text form, and for the binary emission this is depending on a function level subtarget feature to emit a global flag. llvm-svn: 357558	2019-04-03 01:58:57 +00:00
Sam Clegg	ef4c66c1c8	[WebAssembly] Remove unneeded target operand flags This change is in preparation for the addition of new target operand flags for new relocation types. Have a symbol type as part of the flag set makes it harder to use and AFAICT these are serving no purpose. Differential Revision: https://reviews.llvm.org/D60014 llvm-svn: 357548	2019-04-03 00:17:29 +00:00
Matt Arsenault	807bedab2e	AMDGPU: Remove unnecessary subtarget get llvm-svn: 357542	2019-04-03 00:01:05 +00:00
Matt Arsenault	45c165b917	AMDGPU: Fix names for generation features We should overall stop using these, but the uppercase name didn't work. Any feature string is converted to lowercase, so these could never be found in the table. llvm-svn: 357541	2019-04-03 00:01:03 +00:00
Craig Topper	9224c114a9	[X86] Mark the default case of the X86InstrInfo::convertToThreeAddress switch as unreachable. This function should only be called with instructions that are really convertible. And all convertible instructions need to be handled by the switch. So nothing should use the default. llvm-svn: 357529	2019-04-02 20:52:16 +00:00
Craig Topper	ffd8662558	[X86] Check MI.isConvertibleTo3Addr() before calling convertToThreeAddress in X86FixupLEAs. X86FixupLEAs just assumes convertToThreeAddress will return nullptr for any instruction that isn't convertible. But the code in convertToThreeAddress for X86 assumes that any instruction coming in has at least 2 operands and that the second one is a register. But those properties aren't guaranteed of all instructions. We should check the instruction property first. llvm-svn: 357528	2019-04-02 20:52:10 +00:00
Jessica Paquette	22c6215c7e	[AArch64][GlobalISel] Select llvm.aarch64.stlxr(i64, i64*) This adds partial instruction selection support for llvm.aarch64.stlxr. It also factors out selection for G_INTRINSIC_W_SIDE_EFFECTS into its own function. The new function removes the restriction that the intrinsic ID on the G_INTRINSIC_W_SIDE_EFFECTS be on operand 0. Also add a test, and add a GISel line to arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D60100 llvm-svn: 357518	2019-04-02 19:57:26 +00:00
Craig Topper	0d3a533270	[X86] Allow FixupLEAs to form INC/DEC under OptSize not just MinSize This matches our usual INC/DEC heuristic used during isel. llvm-svn: 357497	2019-04-02 17:13:03 +00:00
Stefan Pintilie	fa6cd5ceb9	[PowerPC] Fix reversed bit issue in DCMX mask for "xvtstdcdp" and "xvtstdcsp" P9 implementation Did experiments on power 9 machine, checked the outputs for NaN & Infinity+ cases with corresponding DCMX bit set. Confirmed the DCMX mask bit for NaN and infinity+ are reversed. This patch fixes the issue. Patch by Victor Huang. Differential Revision: https://reviews.llvm.org/D59384 llvm-svn: 357494	2019-04-02 16:56:01 +00:00
Fangrui Song	83db88717b	[BPF] Replace fstream and sstream with line_iterator Summary: This makes libLLVMBPFCodeGen.so 1128 bytes smaller for my build. Reviewers: yonghong-song Reviewed By: yonghong-song Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60117 llvm-svn: 357489	2019-04-02 16:15:46 +00:00
Jonas Paulsson	f76fe45426	[SystemZ] Improve instruction selection of 64 bit shifts and rotates. For shift and rotate instructions that only use the last 6 bits of the shift amount, a shift amount of (x*64-s) can be substituted with (-s). This saves one instruction and a register: lhi %r1, 64 sr %r1, %r3 sllg %r2, %r2, 0(%r1) => lcr %r1, %r3 sllg %r2, %r2, 0(%r1) Review: Ulrich Weigand llvm-svn: 357481	2019-04-02 15:36:30 +00:00
Simon Atanasyan	4cefa15a14	[mips] Remove the override of the `isMachineVerifierClean()` All issues found by machine verifier in MIPS target have been fixed. llvm-svn: 357473	2019-04-02 13:57:38 +00:00
Simon Atanasyan	2634a141fd	[mips] Use AltOrders to prevent using odd FP-registers To disable using of odd floating-point registers (O32 ABI and -mno-odd-spreg command line option) such registers and their super-registers added to the set of reserved registers. In general, it works. But there is at least one problem - in case of enabled machine verifier pass some floating-point tests failed because live ranges of register units that are reserved is not empty and verification pass failed with "Live segment doesn't end at a valid instruction" error message. There is D35985 patch which tries to solve the problem by explicit removing of register units. This solution did not get approval. I would like to use another approach for prevent using odd floating point registers - define `AltOrders` and `AltOrderSelect` for MIPS floating point register classes. Such `AltOrders` contains reduced set of registers. At first glance, such solution does not break any test cases and allows enabling machine instruction verification for all MIPS test cases. Differential Revision: http://reviews.llvm.org/D59799 llvm-svn: 357472	2019-04-02 13:57:32 +00:00
Alex Bradbury	f8078f6b1d	[RISCV] Support assembling @plt symbol operands This patch allows symbols appended with @plt to parse and assemble with the R_RISCV_CALL_PLT relocation. Differential Revision: https://reviews.llvm.org/D55335 Patch by Lewis Revill. llvm-svn: 357470	2019-04-02 12:47:20 +00:00
Sander de Smalen	7f23e0a62f	Enforce StackID definition in PEI There are various places in LLVM where the definition of StackID is not properly honoured, for example in PEI where objects with a StackID > 0 are allocated on the default stack (StackID0). This patch enforces that PEI only considers allocating objects to StackID 0. Reviewers: arsenm, thegameg, MatzeB Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60062 llvm-svn: 357460	2019-04-02 09:46:52 +00:00
Craig Topper	c5903c935c	[X86] Use unsigned type for opcodes throughout X86FixupLEAs. All of the interfaces related to opcode in MachineInstr and MCInstrInfo refer to opcodes as unsigned. llvm-svn: 357444	2019-04-02 00:50:58 +00:00
Eli Friedman	3813fe0bda	[ARM] Optimize expressions like "return x != 0;" for Thumb1. There's an existing optimization for x != C, but somehow it was missing a special case for 0. While I'm here, also cleaned up the code/comments a bit: the second value produced by the MERGE_VALUES was actually dead, since a CMOV only produces one result. Differential Revision: https://reviews.llvm.org/D59616 llvm-svn: 357437	2019-04-02 00:01:23 +00:00
Eli Friedman	73af6ef2e7	[ARM] Don't try to create "push {r12, lr}" in Thumb1 at -Oz. It's a little tricky to make this issue show up because prologue/epilogue emission normally likes to push at least two registers... but it doesn't when lr is force-spilled due to function length. Not sure if that really makes sense, but I decided not to touch it for now. Differential Revision: https://reviews.llvm.org/D59385 llvm-svn: 357436	2019-04-01 23:55:57 +00:00
Jessica Paquette	e44c20a68d	[AArch64][GlobalISe] Select STRQui for stores into v264s instead of scalarizing This improves selection for vector stores into v2s64s. Before we just scalarized them, but we can just use a STRQui instead. Differential Revision: https://reviews.llvm.org/D60083 llvm-svn: 357432	2019-04-01 22:19:13 +00:00
Craig Topper	4307172b84	[X86] Classify the AVX512 rounding control operand as X86::OPERAND_ROUNDING_CONTROL instead of MCOI::OPERAND_IMMEDIATE. Add an assert on legal values of rounding control in the encoder and remove an explicit mask. This should allow llvm-exegesis to intelligently constrain the rounding mode. The mask in the encoder shouldn't be necessary any more. We used to allow codegen to use 8-11 for rounding mode and the assembler would use 0-3 to mean the same thing so we masked here and in the printer. Codegen now matches the assembler and the printer was updated, but I forgot to update the encoder. llvm-svn: 357419	2019-04-01 19:08:15 +00:00
Bixia Zheng	6c21ccd245	[NVPTX] Fix the codegen for llvm.round. Summary: Previously, we translate llvm.round to PTX cvt.rni, which rounds to the even interger when the source is equidistant between two integers. This is not correct as llvm.round should round away from zero. This change replaces llvm.round with a round away from zero implementation through target specific custom lowering. Modify a few affected tests to not check for cvt.rni. Instead, we check for the use of a few constants used in implementing round. We are also adding CUDA runnable tests to check for the values produced by llvm.round to test-suites/External/CUDA. Reviewers: tra Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59947 llvm-svn: 357407	2019-04-01 16:10:26 +00:00
Neil Henning	0a30f33ce2	[AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure. This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes. Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would cause sometimes severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it). This fix instead runs through the WWM sections and pre allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!). Differential Revision: https://reviews.llvm.org/D59295 llvm-svn: 357400	2019-04-01 15:19:52 +00:00
David Spickett	3d233d5d4d	[AArch64] Add v8.5-a Memory Tagging STZGM instruction This instruction writes a block of allocation tags and stores zero to the associated data locations. It differs from STGM by 1 bit and has the same arguments. The specification can be found here: https://developer.arm.com/docs/ddi0596/c Differential Revision: https://reviews.llvm.org/D60065 llvm-svn: 357397	2019-04-01 14:56:37 +00:00
Alex Bradbury	44668ae7c7	[RISCV] Attach VK_RISCV_CALL to symbols upon creation This patch replaces the addition of VK_RISCV_CALL in RISCVMCCodeEmitter by creating the RISCVMCExpr when tail/call are parsed, or in the codegen case when the callee symbols are created. This required adding a new CallSymbol operand to allow only adding VK_RISCV_CALL to tail/call instructions. This patch will allow further expansion of parsing and codegen to easily include PLT symbols which must generate the R_RISCV_CALL_PLT relocation. Differential Revision: https://reviews.llvm.org/D55560 Patch by Lewis Revill. llvm-svn: 357396	2019-04-01 14:53:17 +00:00
David Spickett	9142b8ef1b	[AArch64] Add v8.5-a Memory Tagging STGM/LDGM instructions The STGV/LDGV instructions were replaced with STGM/LDGM. The encodings remain the same but there is no longer writeback so there are no unpredictable encodings to check for. The specfication can be found here: https://developer.arm.com/docs/ddi0596/c Differential Revision: https://reviews.llvm.org/D60064 llvm-svn: 357395	2019-04-01 14:52:18 +00:00
Alex Bradbury	da20f5ca74	[RISCV] Generate address sequences suitable for mcmodel=medium This patch adds an implementation of a PC-relative addressing sequence to be used when -mcmodel=medium is specified. With absolute addressing, a 'medium' codemodel may cause addresses to be out of range. This is because while 'medium' implies a 2 GiB addressing range, this 2 GiB can be at any offset as opposed to 'small', which implies the first 2 GiB only. Note that LLVM/Clang currently specifies code models differently to GCC, where small and medium imply the same functionality as GCC's medlow and medany respectively. Differential Revision: https://reviews.llvm.org/D54143 Patch by Lewis Revill. llvm-svn: 357393	2019-04-01 14:42:56 +00:00
David Spickett	efe376add6	[AArch64] Add v8.5-a Memory Tagging GMID_EL1 register The latest version of the MTE spec added a system register 'GMID_EL1'. It contains the block size used by the LDGM and STGM instructions and is read only. The specification can be found here: https://developer.arm.com/docs/ddi0596/c llvm-svn: 357392	2019-04-01 14:41:14 +00:00
Matt Arsenault	ebf90db084	X86: Fix override warning llvm-svn: 357388	2019-04-01 14:08:26 +00:00
Clement Courbet	7e062c9b1f	[X86] Make post-ra scheduling macrofusion-aware. Subscribers: MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59688 llvm-svn: 357384	2019-04-01 13:48:50 +00:00
Luis Marques	3091884e25	[RISCV] Add seto pattern expansion Adds a `seto` pattern expansion. Without it the lowerings of `fcmp one` and `fcmp ord` would be inefficient due to an unoptimized double negation. Differential Revision: https://reviews.llvm.org/D59699 llvm-svn: 357378	2019-04-01 09:54:14 +00:00
Craig Topper	2e1bf89e3a	[X86] Use ISD::INTRINSIC_VOID in getTgtMemIntrinsic for truncating stores and scatter intrinsics. This is the appropriate opcode for only having a chain output. Though I'm not sure it matters much. llvm-svn: 357375	2019-04-01 05:26:12 +00:00
Alex Bradbury	ca81a56f65	[RISCV] Don't evaluatePCRelLo if a relocation will be forced (e.g. due to linker relaxation) A pcrel_lo will point to the associated pcrel_hi fixup which in turn points to the real target. RISCVMCExpr::evaluatePCRelLo will work around this indirection in order to allow the fixup to be evaluate properly. However, if relocations are forced (e.g. due to linker relaxation is enabled) then its evaluation is undesired and will result in a relocation with the wrong target. This patch modifies evaluatePCRelLo so it will not try to evaluate if the fixup will be forced as a relocation. A new helper method is added to RISCVAsmBackend to query this. Differential Revision: https://reviews.llvm.org/D59686 llvm-svn: 357374	2019-04-01 02:38:27 +00:00
Sanjay Patel	e1bc360fc6	[x86] allow movmsk with 2-element reductions One motivation for making this change is that the lack of using movmsk is likely a main source of perf difference between clang and gcc on the C-Ray benchmark as shown here: https://www.phoronix.com/scan.php?page=article&item=gcc-clang-2019&num=5 ...but this change alone isn't enough to solve that problem. The 'all-of' examples show what is likely the worst case trade-off: we end up with an extra instruction (or 2 if we count the 'xor' register clearing). The 'any-of' examples look clearly better using movmsk because we've traded 2 vector instructions for 2 scalar instructions, and movmsk may have better timing than the generic 'movq'. If we examine the llvm-mca output for these cases, it appears that even though the 'all-of' movmsk variant looks worse on paper, it would perform better on both Haswell and Jaguar. $ llvm-mca -mcpu=haswell no_movmsk.s -timeline Iterations: 100 Instructions: 400 Total Cycles: 504 Total uOps: 400 Dispatch Width: 4 uOps Per Cycle: 0.79 IPC: 0.79 Block RThroughput: 1.0 $ llvm-mca -mcpu=haswell movmsk.s -timeline Iterations: 100 Instructions: 600 Total Cycles: 358 Total uOps: 600 Dispatch Width: 4 uOps Per Cycle: 1.68 IPC: 1.68 Block RThroughput: 1.5 $ llvm-mca -mcpu=btver2 no_movmsk.s -timeline Iterations: 100 Instructions: 400 Total Cycles: 407 Total uOps: 400 Dispatch Width: 2 uOps Per Cycle: 0.98 IPC: 0.98 Block RThroughput: 2.0 $ llvm-mca -mcpu=btver2 movmsk.s -timeline Iterations: 100 Instructions: 600 Total Cycles: 311 Total uOps: 600 Dispatch Width: 2 uOps Per Cycle: 1.93 IPC: 1.93 Block RThroughput: 3.0 Finally, there may be CPUs where movmsk is horribly slow (old AMD small cores?), but if that's true, then we're also almost certainly making the wrong transform already for reductions with >2 elements, so that should be fixed independently. Differential Revision: https://reviews.llvm.org/D59997 llvm-svn: 357367	2019-03-31 15:11:34 +00:00
Liang Zou	9f4a4d3974	fix typo: "\t" => " " Reviewers: llvm.org, Jim Reviewed By: Jim Subscribers: arsenm, jvesely, nhaehnle, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59983 llvm-svn: 357365	2019-03-31 14:49:00 +00:00
Craig Topper	e4a0fc7d75	[X86] Teach isel for RMW binops to handle negate Negate updates flags like a subtract. We should be able to use the flags from the RMW form of negate when we have (store (X86ISD::SUB 0, load A), A) Differential Revision: https://reviews.llvm.org/D60007 llvm-svn: 357353	2019-03-30 18:59:17 +00:00
Alex Bradbury	0b2803ee65	[RISCV] Add codegen support for ilp32f, ilp32d, lp64f, and lp64d ("hard float") ABIs This patch adds support for the RISC-V hard float ABIs, building on top of rL355771, which added basic target-abi parsing and MC layer support. It also builds on some re-organisations and expansion of the upstream ABI and calling convention tests which were recently committed directly upstream. A number of aspects of the RISC-V float hard float ABIs require frontend support (e.g. flattening of structs and passing int+fp for fp+fp structs in a pair of registers), and will be addressed in a Clang patch. As can be seen from the tests, it would be worthwhile extending RISCVMergeBaseOffsets to handle constant pool as well as global accesses. Differential Revision: https://reviews.llvm.org/D59357 llvm-svn: 357352	2019-03-30 17:59:30 +00:00
Simon Pilgrim	10c9032c02	[X86][SSE] detectAVGPattern - Match zext(or(x,y)) 'add like' patterns (PR41316) Fixes PR41316 where the expanded PAVG intrinsic had had one of its ADDs turned into an OR due to its operands having no conflicting bits. llvm-svn: 357351	2019-03-30 17:12:29 +00:00
Simon Pilgrim	3293455595	[X86][SSE] detectAVGPattern - begin generalizing ADD matches Move the ADD matching into a helper - first NFC stage towards supporting 'ADD like' cases such as in PR41316 llvm-svn: 357349	2019-03-30 15:31:53 +00:00
Heejin Ahn	c4ac74fb49	[WebAssembly] Fix unwind destination mismatches in CFG stackify Summary: Linearing the control flow by placing `try`/`end_try` markers can create mismatches in unwind destinations. This patch resolves these mismatches by wrapping those instructions with an incorrect unwind destination with a nested `try`/`catch`/`end_try` and branching to the right destination within the new catch block. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, chrib, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D48345 llvm-svn: 357343	2019-03-30 11:04:48 +00:00
Heejin Ahn	e9fd9073e4	[WebAssembly] Run ExplicitLocals pass after CFGStackify Summary: While this does not change any final output, this will greatly simplify ixing unwind destination mismatches in CFGStackify (D48345), because we have to create some new registers there. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59652 llvm-svn: 357342	2019-03-30 09:29:57 +00:00
Alex Bradbury	9681b01c21	[RISCV] Add DAGCombine for (SplitF64 (ConstantFP x)) The SplitF64 node is used on RV32D to convert an f64 directly to a pair of i32 (necessary as bitcasting to i64 isn't legal). When performed on a ConstantFP, this will result in a FP load from the constant pool followed by a store to the stack and two integer loads from the stack (necessary as there is no way to directly move between f64 FPRs and i32 GPRs on RV32D). It's always cheaper to just materialise integers for the lo and hi parts of the FP constant, so do that instead. llvm-svn: 357341	2019-03-30 09:15:47 +00:00
Heejin Ahn	7e7aad1510	[WebAssembly] Optimize the number of routing blocks in FixIrreducibleCFG Summary: Currently we create a routing block to the dispatch block for every predecessor of every entry. So the total number of routing blocks created will be (# of preds) * (# of entries). But we don't need to do this: we need at most 2 routing blocks per loop entry, one for when the predecessor is inside the loop and one for it is outside the loop. (We can't merge these into one because this will creates another loop cycle between blocks inside and blocks outside) This patch fixes this and creates at most 2 routing blocks per entry. This also renames variable `Split` to `Routing`, which I think is a bit clearer. Reviewers: kripken Subscribers: sunfish, dschuff, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59462 llvm-svn: 357337	2019-03-30 01:31:11 +00:00
Thomas Lively	5f0c4c67bb	[WebAssembly] Add mutable globals feature Summary: This feature is not actually used for anything in the WebAssembly backend, but adding it allows users to get it into the target features sections of their objects, which makes these objects future-compatible. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jdoerfert, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60013 llvm-svn: 357321	2019-03-29 22:00:18 +00:00
Jessica Paquette	d3ffd47df9	[GlobalISel][AArch64] Add isel support for G_INSERT_VECTOR_ELT on v2s32s This adds support for v2s32 vector inserts, and updates the selection + regbankselect tests for G_INSERT_VECTOR_ELT. Differential Revision: https://reviews.llvm.org/D59910 llvm-svn: 357318	2019-03-29 21:39:36 +00:00
Amara Emerson	d413f41de6	[X86] When using Win64 ABI, exit with error if SSE is disabled for varargs We need XMM registers to handle varargs with the Win64 ABI. Before we would silently generate bad code resulting in an assertion failure elsewhere in the backend. llvm-svn: 357317	2019-03-29 21:30:51 +00:00
Heejin Ahn	67f74aceab	[WebAssembly] Handle END_LOOP in unreachable BB in CFGStackify Summary: This fixes crashes when a BB in which an END_LOOP is to be placed is unreachable and does not have any predecessors. Fixes PR41307. Reviewers: dschuff Subscribers: yurydelendik, sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60004 llvm-svn: 357303	2019-03-29 19:36:51 +00:00
Matt Arsenault	055e4dce45	AMDGPU: Remove dx10-clamp from subtarget features Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdgpu-dx10-clamp attribute to override this if desired. Also introduce a new amdgpu-ieee attribute to match. The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without definining the attribute in the generic Attributes.td. Eventually the calling convention lowering will need to insert a mode switch somewhere for these. llvm-svn: 357302	2019-03-29 19:14:54 +00:00
Craig Topper	4ccb3b96b6	[X86] Use cached OptForSize in X86ISelDAGToDAG.cpp instead of pulling it from the function attribute. NFCI llvm-svn: 357297	2019-03-29 18:36:40 +00:00
Evandro Menezes	0f797b8732	[CodeGen] Refactor the option for the maximum jump table size Refactor the option `max-jump-table-size` to default to the maximum representable number. Essentially, NFC. llvm-svn: 357280	2019-03-29 17:28:11 +00:00
Simon Atanasyan	f26f56d6d3	[mips] Fix lowering a signed immediate for *.d MSA instructions The `lowerMSASplatImm` function zero-extends `i32` immediates while building constant. If target type is `i64`, negative immediate loses the sign. As a result, for example `__builtin_msa_ldi_d(-1)` lowered to series of instruction loads incorrect value 0xffffffff to the `$w0` register instead of single `ldi.d $w0, -1` instruction. The fix zero-extends unsigned immediates and signed-extend signed immediates. Differential Revision: http://reviews.llvm.org/D59884 llvm-svn: 357264	2019-03-29 15:15:22 +00:00
Dmitry Preobrazhensky	d6827ce3a3	[AMDGPU][MC] Corrected conversion rules for inlinable constants to match rules for literals See bug 40806: https://bugs.llvm.org/show_bug.cgi?id=40806 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59786 llvm-svn: 357262	2019-03-29 14:50:20 +00:00
Dmitry Preobrazhensky	7f33574be3	[AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59878 llvm-svn: 357249	2019-03-29 12:16:04 +00:00
Konstantin Zhuravlyov	2b766ed774	AMDGPU: Make sram-ecc off by default for Vega20 Differential Revision: https://reviews.llvm.org/D59718 llvm-svn: 357247	2019-03-29 12:04:18 +00:00
Simon Pilgrim	aeaf7fcdde	[X86] Add X86TargetLowering::isCommutativeBinOp override. We currently just have test coverage for PMULUDQ - will add more in the future. llvm-svn: 357244	2019-03-29 11:25:58 +00:00
Kang Zhang	05f78b35ae	[PowerPC] Add the support for __builtin_setrnd() Summary: PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function will use the least significant two bits of integer argument to set the floating point rounding mode. double __builtin_setrnd(int mode); The effective values for mode are: 0 - round to nearest 1 - round to zero 2 - round to +infinity 3 - round to -infinity Note that the mode argument will modulo 4, so if the int argument is greater than 3, it will only use the least significant two bits of the mode. Namely, builtin_setrnd(102)) is equal to builtin_setrnd(2). Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D59405 llvm-svn: 357241	2019-03-29 08:45:24 +00:00
Clement Courbet	b70355f0b4	[ScheduleDAG] Move `Topo` and `addEdge` to base class. Some DAG mutations can only be applied to `ScheduleDAGMI`, and have to internally cast a `ScheduleDAGInstrs` to `ScheduleDAGMI`. There is nothing actually specific to `ScheduleDAGMI` in `Topo`. llvm-svn: 357239	2019-03-29 08:33:05 +00:00
Matt Arsenault	5fddf09187	AMDGPU/GlobalISel: Insert waterfall loop for vector indexing The register index can only really be an SGPR. Lie that a VGPR index is legal, and then rewrite the instruction in a waterfall loop to handle the index. llvm-svn: 357235	2019-03-29 03:54:56 +00:00
Zi Xuan Wu	1445b77e8c	[PowerPC] Strength reduction of multiply by a constant by shift and add/sub in place A shift and add/sub sequence combination is faster in place of a multiply by constant. Because the cycle or latency of multiply is not huge, we only consider such following worthy patterns. ``` (mul x, 2^N + 1) => (add (shl x, N), x) (mul x, -(2^N + 1)) => -(add (shl x, N), x) (mul x, 2^N - 1) => (sub (shl x, N), x) (mul x, -(2^N - 1)) => (sub x, (shl x, N)) ``` And the cycles or latency is subtarget-dependent so that we need consider the subtarget to determine to do or not do such transformation. Also data type is considered for different cycles or latency to do multiply. Differential Revision: https://reviews.llvm.org/D58950 llvm-svn: 357233	2019-03-29 03:08:39 +00:00
Thomas Lively	3f34e1b883	[WebAssembly] Merge used feature sets, update atomics linkage policy Summary: It does not currently make sense to use WebAssembly features in some functions but not others, so this CL adds an IR pass that takes the union of all used feature sets and applies it to each function in the module. This allows us to prevent atomics from being lowered away if some function has opted in to using them. When atomics is not enabled anywhere, we detect whether there exists any atomic operations or thread local storage that would be stripped and disallow linking with objects that contain atomics if and only if atomics or tls are stripped. When atomics is enabled, mark it as used but do not require it of other objects in the link. These changes allow libraries that do not use atomics to be built once and linked into both single-threaded and multithreaded binaries. Reviewers: aheejin, sbc100, dschuff Subscribers: jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59625 llvm-svn: 357226	2019-03-29 00:14:01 +00:00
Yonghong Song	360a4e2ca6	[BPF] add proper multi-dimensional array support For multi-dimensional array like below int a[2][3]; the previous implementation generates BTF_KIND_ARRAY type like below: . element_type: int . index_type: unsigned int . number of elements: 6 This is not the best way to represent arrays, esp., when converting BTF back to headers and users will see int a[6]; instead. This patch generates proper support for multi-dimensional arrays. For "int a[2][3]", the two BTF_KIND_ARRAY types will be generated: Type #n: . element_type: int . index_type: unsigned int . number of elements: 3 Type #(n+1): . element_type: #n . index_type: unsigned int . number of elements: 2 The linux kernel already supports such a multi-dimensional array representation properly. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D59943 llvm-svn: 357215	2019-03-28 21:59:49 +00:00
Craig Topper	c25c9b4d16	[X86] Teach the isel optimization for (x << C1) op C2 to (x op (C2>>C1)) << C1 to consider cases where C2>>C1 can fit an unsigned 32-bit immediate For 64-bit operations we should consider if the immediate can be made to fit in an unsigned 32-bits immedate. For OR/XOR this allows us to load the immediate with MOV32ri instead of movabsq. For AND this allows us to fold the immediate. Differential Revision: https://reviews.llvm.org/D59867 llvm-svn: 357196	2019-03-28 18:05:37 +00:00
Reid Kleckner	85e2cdac73	Delay initialization of three static global maps, NFC This avoids allocating a few KB of heap memory on startup, and instead allocates these maps lazily. I noticed this while profiling LLD. llvm-svn: 357192	2019-03-28 17:33:41 +00:00
Petar Avramovic	1af05df3de	[MIPS GlobalISel] Select float constants Select 32 and 64 bit float constants for MIPS32. Differential Revision: https://reviews.llvm.org/D59933 llvm-svn: 357183	2019-03-28 16:58:12 +00:00
Sanjay Patel	5bbf6f0bd8	[x86] avoid cmov in movmsk reduction This is probably the least important of our movmsk problems, but I'm starting at the bottom to reduce distractions. We were creating a select_cc which bypasses the select and bitmask codegen optimizations that we have now. If we produce a compare+negate instead, we allow things like neg/sbb carry bit hacks, and in all cases we avoid a cmov. There's no partial register update danger in these sequences because we always produce the zero-register xor ahead of the 'set' if needed. There seems to be a missing fold for sext of a bool bit here: negl %ecx movslq %ecx, %rax ...but that's an independent transform. Differential Revision: https://reviews.llvm.org/D59818 llvm-svn: 357172	2019-03-28 14:16:13 +00:00
Clement Courbet	699dc025a6	[X86MacroFusion] Handle branch fusion (AMD CPUs). Summary: This adds a BranchFusion feature to replace the usage of the MacroFusion for AMD CPUs. See D59688 for context. Reviewers: andreadb, lebedev.ri Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59872 llvm-svn: 357171	2019-03-28 14:12:46 +00:00
Matt Arsenault	a353fd572a	AMDGPU: Make exec mask optimzations more resistant to block splits Also improve the check for SALU instructions to also ignore implicit_def and other fake instructions. llvm-svn: 357170	2019-03-28 14:01:39 +00:00
Roman Lebedev	c325be6cef	[X86] AMD Piledriver (BdVer2): fine-tune some latencies Based on llvm-exegesis measurements. Now that llvm-exegesis is ~2 magnitudes faster, and is a bit smarter, it is now possible to continue cleanup of the scheduler model. With this, there are no more latency inconsistencies for the opcodes that produce stable measurements, and only a few inconsistencies for unstable measurements (MMX_* opcodes, opcodes that llvm-exegesis measures by chaining - CMP, TEST, BT, SETcc, CVT, MOV, etc.) llvm-svn: 357169	2019-03-28 13:40:34 +00:00
Clement Courbet	54c95e5172	[NFC] Format InlineFeatureIgnoreList. To avoid more spurious clang-format changes when adding features (D59872). llvm-svn: 357168	2019-03-28 13:38:58 +00:00
Simon Pilgrim	22be913ac0	[X85][AVX] Add missing vXi16 broadcast fold patterns Now that D59484 has landed its easier to add these. Added missing AVX512BW v32i16 equivalents while I was at it. llvm-svn: 357155	2019-03-28 10:25:13 +00:00
Diana Picus	52495c472f	[ARM GlobalISel] Fix G_STORE with s1 G_STORE for 1-bit values uses a STRBi12, which stores the whole byte. Zero out the undefined bits before writing. llvm-svn: 357154	2019-03-28 09:09:36 +00:00
Diana Picus	4d512df300	[ARM GlobalISel] Fix selection of G_SELECT G_SELECT uses a 1-bit scalar for the condition, and is currently implemented with a plain CMPri against 0. This means that values such as 0x1110 are interpreted as true, when instead the higher bits should be treated as undefined and therefore ignored. Replace the CMPri with a TSTri against 0x1, which performs an implicit AND, yielding the expected result. llvm-svn: 357153	2019-03-28 09:09:27 +00:00
Sam Clegg	a5e175c60c	[WebAssembly] Rename wasm fixup kinds These fixup kinds are not explicitly related to the code section. They are there to signal how to apply the fixup. Also, a couple of other minor wasm cleanups. Differential Revision: https://reviews.llvm.org/D59908 llvm-svn: 357145	2019-03-28 02:07:28 +00:00
Sam Clegg	b2978c0203	[ARM] Remove dead function ARMMCCodeEmitter::getSOImmOpValue The last reference to this function was removed from the ARM td files in 2015 in rL225266. Differential Revision: https://reviews.llvm.org/D59868 llvm-svn: 357130	2019-03-27 23:00:12 +00:00
Sanjay Patel	1df0bb6264	[x86] improve AVX lowering of vector zext If we know the 2 halves of an oversized zext-in-reg are the same, don't create those halves independently. I tried several different approaches to fold this, but it's difficult to get right during legalization. In the default path, we are creating a generic shuffle that looks like an unpack high, but it can get transformed into a different mask (a blend), so it's not straightforward to match that. If we try to fold after it actually becomes an X86ISD::UNPCKH node, we can't be sure what the operand node is - it might be a generic shuffle, or it could be some x86-specific op. From the test output, we should be doing something like this for SSE4.1 as well, but I'd rather leave that as a follow-up since it involves changing lowering actions. Differential Revision: https://reviews.llvm.org/D59777 llvm-svn: 357129	2019-03-27 22:42:11 +00:00
Sanjay Patel	704817912a	[x86] look through bitcast operand of MOVMSK This is not exactly NFC because it should make further combines of MOVMSK easier to match, but there should be no outward differences because we have isel patterns in place specifically to allow this. See: // Also support integer VTs to avoid a int->fp bitcast in the DAG. llvm-svn: 357128	2019-03-27 22:24:03 +00:00
Craig Topper	4bc38cfe29	[X86ISelDAGToDAG] Move initialization of OptForSize and OptForMinSize from PreprocessISelDAG to runOnMachineFunction. NFCI This makes more sense as a place to initialize these. I don't think runOnMachineFunction was overriden when these cached values were originally created. llvm-svn: 357123	2019-03-27 21:05:07 +00:00
Justin Bogner	b1650f0da9	[LegalizeVectorTypes] Allow single loads and stores for more short vectors When lowering a load or store for TypeWidenVector, the type legalizer would use a single load or store if the associated integer type was legal or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable. (See https://reviews.llvm.org/rL236528 for reference.) This applies that behaviour to vector types. If the vector type is TypePromoteInteger, the element type is going to be TypePromoteInteger as well, which will lead to have a single promoting load rather than N individual promoting loads. For instance, if we have a v3i1, we would now have a load of v4i1 instead of 3 loads of i1. Patch by Guillaume Marques. Thanks! Differential Revision: https://reviews.llvm.org/D56201 llvm-svn: 357120	2019-03-27 20:35:56 +00:00
Alon Zakai	e9e01cc73a	[WebAssembly] Add some whitespace to WebAssemblyFixIrreducibleControlFlow Differential Revision: https://reviews.llvm.org/D59855 modified: llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp llvm-svn: 357117	2019-03-27 20:12:42 +00:00
Eli Friedman	c388bfa230	[ARM] Don't confuse the scheduler for very large VLDMDIA etc. ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the memory operands of load and store-multiple operations. This doesn't really fix the problem properly, but it's enough to prevent crashing, at least. Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 . Differential Revision: https://reviews.llvm.org/D59834 llvm-svn: 357109	2019-03-27 18:33:30 +00:00
Amara Emerson	8a02aea6fc	[AArch64][GlobalISel] Make G_PHI of v2s64, v4s32, v2s32 legal. llvm-svn: 357108	2019-03-27 18:31:46 +00:00
Matt Arsenault	7b14b2425d	Reapply "AMDGPU: Scavenge register instead of findUnusedReg" This reapplies r356149, using the correct overload of findUnusedReg which passes the current iterator. This worked most of the time, because the scavenger iterator was moved at the end of the frame index loop in PEI. This would fail if the spill was the first instruction. This was further hidden by the fact that the scavenger wasn't passed in for normal frame index elimination. llvm-svn: 357098	2019-03-27 17:31:29 +00:00
Craig Topper	7c9afc35bc	[X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD. When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization. This patch adds psuedo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work. Fixes PR41055 Differential Revision: https://reviews.llvm.org/D59391 llvm-svn: 357096	2019-03-27 17:29:34 +00:00
Sander de Smalen	e1eab42f65	[AArch64][SVE] Asm: error on unexpected SVE vector register type suffix This patch fixes an assembler bug that allowed SVE vector registers to contain a type suffix when not expected. The SVE unpredicated movprfx instruction is the only instruction affected. The following are examples of what was previously valid: movprfx z0.b, z0.b movprfx z0.b, z0.s movprfx z0, z0.s These instructions are now erroneous. Patch by Cullen Rhodes (c-rhodes) Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D59636 llvm-svn: 357094	2019-03-27 17:23:38 +00:00
Matt Arsenault	17e39100a2	AMDGPU: Enable the scavenger for large frames Another test is needed for the case where the scavenge fail, but there's another issue with that which needs an additional fix. llvm-svn: 357093	2019-03-27 17:14:32 +00:00
Matt Arsenault	4d47ac3b30	AMDGPU: Add additional MIR tests for exec mask optimizations Also includes one example of how this transform is unsound. This isn't verifying the copies are used in the control flow intrinisic patterns. Also add option to disable exec mask opt pass. Since this pass is unsound, it may be useful to turn it off until it is fixed. llvm-svn: 357091	2019-03-27 16:58:30 +00:00
Matt Arsenault	4ab28b64b4	AMDGPU: Skip debug_instr when collapsing end_cf Based on how these are inserted, I doubt this was causing a problem in practice. llvm-svn: 357090	2019-03-27 16:58:27 +00:00
Matt Arsenault	a42b7247d3	AMDGPU: Fix missing scc implicit def on s_andn2_b64_term Introduce new helper class to copy properties directly from the base instruction. llvm-svn: 357089	2019-03-27 16:58:22 +00:00
Matt Arsenault	28f97f1dbc	AMDGPU: Don't hardcode num defs for MUBUF instructions This shouldn't change anything since the no-ret atomics are selected later. llvm-svn: 357084	2019-03-27 16:12:29 +00:00
Matt Arsenault	e9ad7e9a71	AMDGPU: wave_barrier is not isBarrier This is not a control flow instruction, so should not be marked as isBarrier. This fixes a verifier error if followed by unreachable. llvm-svn: 357081	2019-03-27 15:54:45 +00:00
Yonghong Song	6c56edfe42	[BPF] use std::map to ensure consistent output The .BTF.ext FuncInfoTable and LineInfoTable contain information organized per ELF section. Current definition of FuncInfoTable/LineInfoTable is: std::unordered_map<uint32_t, std::vector<BTFFuncInfo>> FuncInfoTable std::unordered_map<uint32_t, std::vector<BTFLineInfo>> LineInfoTable where the key is the section name off in the string table. The unordered_map may cause the order of section output different for different platforms. The same for unordered map definition of std::unordered_map<std::string, std::unique_ptr<BTFKindDataSec>> DataSecEntries where BTF_KIND_DATASEC entries may have different ordering for different platforms. This patch fixed the issue by using std::map. Test static-var-derived-type.ll is modified to generate two DataSec's which will ensure the ordering is the same for all supported platforms. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 357077	2019-03-27 15:45:27 +00:00
Matt Arsenault	bbc59d8d0d	AMDGPU: Fix areLoadsFromSameBasePtr for DS atomics The offset operand index is different for atomics. llvm-svn: 357073	2019-03-27 15:41:00 +00:00
Dmitry Preobrazhensky	40f0162a9a	Revert of 357063 [AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes Reason: the change was mistakenly committed before review llvm-svn: 357066	2019-03-27 13:49:52 +00:00
Sander de Smalen	90d1b551e1	[AArch64] NFC: Cleanup isAArch64FrameOffsetLegal Cleanup isAArch64FrameOffsetLegal by: - Merging the large switch statement to reuse AArch64InstrInfo::getMemOpInfo(). - Using AArch64InstrInfo::getUnscaledLdSt() to determine whether an instruction has an unscaled variant. - Simplifying the logic that calculates the offset to fit the immediate. Reviewers: paquette, evandro, eli.friedman, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D59636 llvm-svn: 357064	2019-03-27 13:16:19 +00:00
Dmitry Preobrazhensky	bcc4d53835	[AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59305 llvm-svn: 357063	2019-03-27 13:07:41 +00:00
Sander de Smalen	46edefe3c4	[AArch64] Adds cases for LDRSHWui and LDRSHXui to getMemOpInfo This patch also adds cases PRFUMi and PRFMui. This change was discussed in https://reviews.llvm.org/D59635. llvm-svn: 357059	2019-03-27 10:39:03 +00:00
Simon Pilgrim	ccb71b2985	Revert rL356864 : [X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure. This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain. ........ Causes PR41249 llvm-svn: 357057	2019-03-27 10:25:02 +00:00
Craig Topper	7da7b97487	[X86] When iselling (x << C1) and/or/xor C2 as (x and/or/xor (C2>>C1)) << C1, go through the isel table instead of manually selecting. Previously we manually selected the AND/OR/XOR with immediate and the SHL(or ADD if the shift is 1). But this was missing out on the opportunity to use a 64 bit AND with a 32-bit immediate and possibly other isel tricks we have built into the tables. Instead, insert the new nodes into the DAG using insertDAGNode and allow them each to be selected through the normal table. llvm-svn: 357049	2019-03-27 04:45:58 +00:00
QingShan Zhang	5321dcd608	[NFC][PowerPC] Custom PowerPC specific machine-scheduler This patch lays the groundwork for extending the generic machine scheduler by providing a PPC-specific implementation. There are no functional changes as this is an incremental patch that simply provides the necessary overrides which just encapsulate the behavior of the generic scheduler. Subsequent patches will add specific behavior. Differential Revision: https://reviews.llvm.org/D59284 llvm-svn: 357047	2019-03-27 03:50:16 +00:00
Craig Topper	22387a56fe	[X86] Simplify some code in matchBitExtract by using ANY_EXTEND. We were manually outputting the code we would get from selecting ANY_EXTEND. We can save some code by just letting an ANY_EXTEND go through isel on its own. llvm-svn: 357045	2019-03-27 02:08:03 +00:00
Guozhi Wei	330dcd9dab	[PPC] Refactor PPCBranchSelector.cpp This patch splits the huge function PPCBranchSelector.cpp:runOnMachineFunction into several smaller functions. No functional change. Differential Revision: https://reviews.llvm.org/D59623 llvm-svn: 357033	2019-03-26 21:27:38 +00:00
Stefan Pintilie	e1d79a87c6	[PowerPC] Remove UseVSXReg The UseVSXReg flag can be safely removed and the code cleaned up. Patch By: Yi-Hong Liu Differential Revision: https://reviews.llvm.org/D58685 llvm-svn: 357028	2019-03-26 20:28:21 +00:00
Sam Clegg	492f752969	[WebAssembly] Initial implementation of PIC code generation This change implements lowering of references global symbols in PIC mode. This change implements lowering of global references in PIC mode using a new @GOT reference type. @GOT references can be used with function or data symbol names combined with the get_global instruction. In this case the linker will insert the wasm global that stores the address of the symbol (either in memory for data symbols or in the wasm table for function symbols). For now I'm continuing to use the R_WASM_GLOBAL_INDEX_LEB relocation type for this type of reference which means that this relocation type can refer to either a global or a function or data symbol. We could choose to introduce specific relocation types for GOT entries in the future. See the current dynamic linking proposal: https://github.com/WebAssembly/tool-conventions/blob/master/DynamicLinking.md Differential Revision: https://reviews.llvm.org/D54647 llvm-svn: 357022	2019-03-26 19:46:15 +00:00
Heejin Ahn	54551c1df7	[WebAssembly] Don't analyze branches after CFGStackify Summary: `WebAssembly::analyzeBranch` now does not analyze anything if the function is CFG stackified. We were previously doing similar things by checking if a branch's operand is whether an integer or an MBB, but this failed to bail out when a BB did not have any terminators. Consider this case: ``` bb0: try $label0 call @foo // unwinds to %ehpad bb1: ... br $label0 // jumps to %cont. can be deleted ehpad: catch ... cont: end_try ``` Here `br $label0` will be deleted in CFGStackify's `removeUnnecessaryInstrs` function, because we jump to the %cont block even without the branch. But in this case, MachineVerifier fails to verify this, because `ehpad` is not a successor of `bb1` even if `bb1` does not have any terminators. MachineVerifier incorrectly thinks `bb1` falls through to the next block. This pass now consistently rejects all analysis after CFGStackify whether a BB has terminators or not, also making the MachineVerifier work. (MachineVerifier does not try to verify relationships between BBs if `analyzeBranch` fails, the behavior we want after CFGStackify.) This also adds a new option `-wasm-disable-ehpad-sort` for testing. This option helps create the sorted order we want to test, and without the fix in this patch, the tests in cfg-stackify-eh.ll fail at MachineVerifier with `-wasm-disable-ehpad-sort`. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59740 llvm-svn: 357015	2019-03-26 18:21:20 +00:00
Heejin Ahn	1aaa481fc1	[WebAssembly] Add CFGStacikfied field to WebAssemblyFunctionInfo Summary: This adds `CFGStackified` field and its serialization to WebAssemblyFunctionInfo. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59747 llvm-svn: 357011	2019-03-26 17:46:14 +00:00
Heejin Ahn	52221d56bc	[WebAssembly] Support WebAssemblyFunctionInfo serialization Summary: The framework for supporting target-specific MachineFunctionInfo was added in r356215. This adds serialization support for WebAssemblyFunctionInfo on top of that. This patch only adds the framework and does not actually serialize anything at this point; we have to add YAML mapping later for the fields in WebAssemblyFunctionInfo we want to serialize if necessary. Reviewers: dschuff, arsenm Subscribers: sunfish, wdng, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59737 llvm-svn: 357009	2019-03-26 17:35:35 +00:00
Heejin Ahn	222718fdd2	[WebAssembly] Fix a bug when mixing TRY/LOOP markers Summary: When TRY and LOOP markers are in the same BB and END_TRY and END_LOOP markers are in the same BB, END_TRY should be _before_ END_LOOP, because LOOP is always before TRY if they are in the same BB. (TRY is placed in the latest possible position, whereas LOOP is in the earliest possible position.) Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59751 llvm-svn: 357008	2019-03-26 17:29:55 +00:00
Heejin Ahn	44a5a4b107	[WebAssembly] Fix bugs in BLOCK/TRY placement Summary: Before we placed all TRY/END_TRY markers before placing BLOCK/END_BLOCK markers. This couldn't handle this case: ``` bb0: br bb2 bb1: // nearest common dominator of bb3 and bb4 br_if ... bb3 br bb4 bb2: ... bb3: call @foo // unwinds to ehpad bb4: call @bar // unwinds to ehpad ehpad: catch ... ``` When we placed TRY markers, we placed it in bb1 because it is the nearest common dominator of bb3 and bb4. But because bb0 jumps to bb2, when we placed block markers, we ended up with interleaved scopes like ``` block try end_block catch end_try ``` which was not correct. This patch fixes the bug by placing BLOCK and TRY markers in one pass while iterating BBs in a function. This also adds some more routines to `placeTryMarkers`, because we now have to assume that there can be previously placed BLOCK and END_BLOCK. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59739 llvm-svn: 357007	2019-03-26 17:15:55 +00:00
Jonas Paulsson	8f8c38174e	[SystemZ] Remove LRMux pseudo instruction. This instruction is unused and not needed. Review: Ulrich Weigand. llvm-svn: 356997	2019-03-26 15:13:48 +00:00
Luis Marques	614fd9d830	[RISCV] Improve codegen for icmp {ne,eq} with a constant Adds two patterns to improve the codegen of GPR value comparisons with small constants. Instead of first loading the constant into another register and then doing an XOR of those registers, these patterns directly use the constant as an XORI immediate. llvm-svn: 356990	2019-03-26 12:55:00 +00:00
Oliver Stannard	5c90238479	[ARM][Asm] Accept upper case coprocessor number and registers Differential revision: https://reviews.llvm.org/D59760 llvm-svn: 356984	2019-03-26 10:24:03 +00:00
Craig Topper	4dcabf8ddf	[X86] In matchBitExtract, place all of the new nodes before Node's position in the DAG for the topological sort. We were using OrigNBits, but that put all the nodes before the node we used to start the control computation. This caused some node earlier than the sequence we inserted to be selected before the sequence we created. We want our new sequence to be selected first since it depends on OrigNBits. I don't have a test case. Found by reviewing the code. llvm-svn: 356979	2019-03-26 05:31:32 +00:00
Craig Topper	10576fea82	[X86] In matchBitExtract, if we need to truncate the BEXTR make sure we put the BEXTR at Node's position in the DAG for the topological sort. We were using OrigNBits, but that doesn't guarantee that it will be selected before the nodes that make up X. llvm-svn: 356978	2019-03-26 05:12:23 +00:00
Craig Topper	795ebe3bff	[X86] Remove unneeded FIXME. NFC We do fold loads right below this. llvm-svn: 356977	2019-03-26 05:12:21 +00:00
Craig Topper	fd880d30b1	X86Parser: Fix potential reference to deleted object Within the MatchFPUWaitAlias function, Operands[0] is potentially overwritten leading to &Op referencing a deleted object. To fix this, assign the reference after the function. Differential Revision: https://reviews.llvm.org/D57376 llvm-svn: 356973	2019-03-26 03:12:43 +00:00
Craig Topper	3dce29b8e9	X86AsmParser: Do not process a non-existent token This error can only happen if an unfinished operation is at Eof. Patch by Brandon Jones Differential Revision: https://reviews.llvm.org/D57379 llvm-svn: 356972	2019-03-26 03:12:41 +00:00
Eli Friedman	1e5d569c8c	[ARM] Add missing memory operands to a bunch of instructions. This should hopefully lead to minor improvements in code generation, and more accurate spill/reload comments in assembly. Also fix isLoadFromStackSlotPostFE/isStoreToStackSlotPostFE so they don't lead to misleading assembly comments for merged memory operands; this is technically orthogonal, but in practice the relevant memory operand lists don't show up without this change. Differential Revision: https://reviews.llvm.org/D59713 llvm-svn: 356963	2019-03-25 22:42:30 +00:00
Matt Arsenault	8bbc159786	Revert "AMDGPU: Scavenge register instead of findUnusedReg" This reverts r356149. This is crashing on rocBLAS. llvm-svn: 356958	2019-03-25 21:41:40 +00:00
Matt Arsenault	77bf2e3704	AMDGPU: Remove unnecessary check for isFullCopy Subregister indexes are not used for physical register operands, so isFullCopy is implied by the physical register check. llvm-svn: 356956	2019-03-25 21:28:53 +00:00
Eli Friedman	92d0d13366	[AArch64] Prefer "mov" over "orr" to materialize constants. This is generally more readable due to the way the assembler aliases work. (This causes a lot of test changes, but it's not really as scary as it looks at first glance; it's just mechanically changing a bunch of checks for orr to check for mov instead.) Differential Revision: https://reviews.llvm.org/D59720 llvm-svn: 356954	2019-03-25 21:25:28 +00:00
Matt Arsenault	bc978872de	AMDGPU: Set hasSideEffects 0 on _term instructions These were defaulting to true, but they are just wrappers around bit operations. This avoids regressions in the exec mask optimization passes in a future commit. llvm-svn: 356952	2019-03-25 21:10:12 +00:00
Konstantin Zhuravlyov	51809cbc98	AMDGPU: Add support for cross address space synchronization scopes Differential Revision: https://reviews.llvm.org/D59517 llvm-svn: 356946	2019-03-25 20:50:21 +00:00
Matt Arsenault	fa28455116	AMDGPU: Preserve LiveIntervals in WQM This seems to already be done, but wasn't marked. llvm-svn: 356922	2019-03-25 16:47:42 +00:00
Petar Avramovic	a034a64f84	[MIPS GlobalISel] Select copy for arguments from FPRBRegBank Move selectCopy into MipsInstructionSelector class. Select copy for arguments from FPRBRegBank for MIPS32. Differential Revision: https://reviews.llvm.org/D59644 llvm-svn: 356886	2019-03-25 11:38:06 +00:00
Petar Avramovic	3dfa368d5d	[MIPS GlobalISel] Add floating point register bank Add floating point register bank for MIPS32. Implement getRegBankFromRegClass for float register classes. Differential Revision: https://reviews.llvm.org/D59643 llvm-svn: 356883	2019-03-25 11:30:46 +00:00
Petar Avramovic	5a457e08f6	[MIPS GlobalISel] Lower float and double arguments in registers Lower float and double arguments in registers for MIPS32. When float/double argument is passed through gpr registers select appropriate move instruction. Differential Revision: https://reviews.llvm.org/D59642 llvm-svn: 356882	2019-03-25 11:23:41 +00:00
Diana Picus	254b11a0fd	[ARM GlobalISel] 64-bit memops should be aligned We currently use only VLDR/VSTR for all 64-bit loads/stores, so the memory operands must be word-aligned. Mark aligned operations as legal and narrow non-aligned ones to 32 bits. While we're here, also mark non-power-of-2 loads/stores as unsupported. llvm-svn: 356872	2019-03-25 08:54:29 +00:00
Craig Topper	a17287f084	[X86] Update some of the getMachineNode calls from X86ISelDAGToDAG to also include a VT for a EFLAGS result. This makes the nodes consistent with how they would be emitted from the isel table. llvm-svn: 356870	2019-03-25 07:22:18 +00:00
Craig Topper	1cc01c3228	[X86] When selecting (x << C1) op C2 as (x op (C2>>C1)) << C1, use the operation VT for the target constant. Normally when the nodes we use here(AND32ri8 for example) are selected their immediates are just converted from ConstantSDNode to TargetConstantSDNode without changing VT from the original operation VT. So we should still be emitting them with the operation VT. Theoretically this could expose more accurate opportunities for CSE. llvm-svn: 356869	2019-03-25 06:53:45 +00:00
Craig Topper	3810e35d3f	[X86] Remove GetLo8XForm and use GetLo32XForm instead. NFCI We were using this to create an AND32ri8 node from a 64-bit and, but that node normally still uses a 32-bit immediate. So we should just truncate the existing immediate to i32. We already verified it has the same value in bits 31:7. llvm-svn: 356868	2019-03-25 06:53:44 +00:00
Craig Topper	5b43446831	[X86] Remove a couple unused SDNodeXForms. NFC llvm-svn: 356867	2019-03-25 06:53:43 +00:00
Craig Topper	7c2554dd92	Revert r356688 "[X86] Don't avoid folding multiple use sign extended 8-bit immediate into instructions under optsize." Looking back over how the one use optimization works, I don't think this is the right way to fix this. llvm-svn: 356866	2019-03-25 01:25:32 +00:00
Simon Pilgrim	87d4ab8b92	[X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure. This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain. llvm-svn: 356864	2019-03-24 19:06:35 +00:00
Heejin Ahn	803c7782d5	[WebAssembly] Rename a variable in CFGSort (NFC) Class `RegionInfo` was `SortUnitInfo` before, so the variables were named `SUI`. Now the class name is `RegionInfo`, so this renames `SUI` to `RI`, matching the class name. llvm-svn: 356861	2019-03-24 17:34:40 +00:00
Simon Pilgrim	a71c0ed471	[X86][AVX] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) Just enable this for AVX for now as SSE41 introduces extra register moves for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern (but otherwise helps reduce port5 usage on Intel targets). Only AVX support is required for PR40685 as the issue is due to 8i8->8i32 zext shuffle leftovers. llvm-svn: 356858	2019-03-24 16:30:35 +00:00
Sanjay Patel	7d676dfd86	[x86] improve the default expansion of uaddsat/usubsat This is yet another step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 uaddsat X, Y --> (X >u (X + Y)) ? -1 : X + Y usubsat X, Y --> (X >u Y) ? X - Y : 0 We can't count on a sane vector ISA, so override the default (umin/umax) expansion of unsigned add/sub saturate in cases where we do not have umin/umax. Differential Revision: https://reviews.llvm.org/D59006 llvm-svn: 356855	2019-03-24 13:55:54 +00:00
Sanjay Patel	2e92846d36	[x86] reduce code duplication; NFC llvm-svn: 356836	2019-03-23 15:00:52 +00:00
Eli Friedman	b906bba576	[ARM] Don't form "ands" when it isn't scheduled correctly. In r322972/r323136, the iteration here was changed to catch cases at the beginning of a basic block... but we accidentally deleted an important safety check. Restore that check to the way it was. Fixes https://bugs.llvm.org/show_bug.cgi?id=41116 Differential Revision: https://reviews.llvm.org/D59680 llvm-svn: 356809	2019-03-22 20:49:15 +00:00
Craig Topper	ce1ed55a4a	[X86] Use xmm registers to implement 64-bit popcnt on 32-bit targets if possible if popcnt instruction is not available On 32-bit targets without popcnt, we currently expand 64-bit popcnt to sequences of arithmetic and logic ops for each 32-bit half and then add the 32 bit halves together. If we have xmm registers we can use use those to implement the operation instead. This results in less instructions then doing two separate 32-bit popcnt sequences. This mitigates some of PR41151 for the i64 on i686 case when we have SSE2. Differential Revision: https://reviews.llvm.org/D59662 llvm-svn: 356808	2019-03-22 20:47:02 +00:00
Craig Topper	1ffd8e8114	[X86] Use movq for i64 atomic load on 32-bit targets when sse2 is enable We used a lock cmpxchg8b to do i64 atomic loads. But if we have SSE2 we can do better and use a plain movq to do the load instead. I tried to just use an f64 atomic load and add isel patterns to MOVSD(which the domain fixing pass can turn to MOVQ), but the atomic_load SDNode in TargetSelectionDAG.td requires the type to be integer. So I've emitted VZEXT_LOAD instead which should be selected by isel to a MOVQ. Hopefully we don't need a specific atomic flavor of this. I kept the memory operand from the original AtomicSDNode. I wasn't sure if I might need to set the MOVolatile flag? I've left some FIXMEs for improvements we can do without SSE2. Differential Revision: https://reviews.llvm.org/D59679 llvm-svn: 356807	2019-03-22 20:46:56 +00:00
Evandro Menezes	4a7739b681	[AArch64, ARM] Add support for Exynos M5 Add Exynos M5 support and test cases. llvm-svn: 356793	2019-03-22 18:42:14 +00:00
Eli Friedman	c7870cce80	[ARM] [NFC] Use tGPR in patterns where appropriate. This doesn't have any practical effect at the moment, as far as I know, because high registers aren't allocatable in Thumb1 mode. But it might matter in the future. Differential Revision: https://reviews.llvm.org/D59675 llvm-svn: 356791	2019-03-22 18:37:26 +00:00
Simon Pilgrim	564392d752	[X86] lowerShuffleAsBitMask - ensure float bit masks are the correct width (PR41203) llvm-svn: 356784	2019-03-22 17:23:55 +00:00
Alina Sbirlea	bfc779e491	[AliasAnalysis] Second prototype to cache BasicAA / anyAA state. Summary: Adding contained caching to AliasAnalysis. BasicAA is currently the only one using it. AA changes: - This patch is pulling the caches from BasicAAResults to AAResults, meaning the getModRefInfo call benefits from the IsCapturedCache as well when in "batch mode". - All AAResultBase implementations add the QueryInfo member to all APIs. AAResults APIs maintain wrapper APIs such that all alias()/getModRefInfo call sites are unchanged. - AA now provides a BatchAAResults type as a wrapper to AAResults. It keeps the AAResults instance and a QueryInfo instantiated to batch mode. It delegates all work to the AAResults instance with the batched QueryInfo. More API wrappers may be needed in BatchAAResults; only the minimum needed is currently added. MemorySSA changes: - All walkers are now templated on the AA used (AliasAnalysis=AAResults or BatchAAResults). - At build time, we optimize uses; now we create a local walker (lives only as long as OptimizeUses does) using BatchAAResults. - All Walkers have an internal AA and only use that now, never the AA in MemorySSA. The Walkers receive the AA they will use when built. - The walker we use for queries after the build is instantiated on AliasAnalysis and is built after building MemorySSA and setting AA. - All static methods doing walking are now templated on AliasAnalysisType if they are used both during build and after. If used only during build, the method now only takes a BatchAAResults. If used only after build, the method now takes an AliasAnalysis. Subscribers: sanjoy, arsenm, jvesely, nhaehnle, jlebar, george.burgess.iv, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59315 llvm-svn: 356783	2019-03-22 17:22:19 +00:00
Tim Renouf	6f0191a55a	[AMDGPU] Use three- and five-dword result type in image ops Some image ops return three or five dwords. Previously, we modeled that with a 4 or 8 dword register class. The register allocator could cleverly spot that some subregs were dead and allocate something else there, but that caused the de-optimization that waitcnt insertion would think that the result was used immediately. This commit allows such an image op to have a result with a three or five dword result, avoiding the above de-optimization. Differential Revision: https://reviews.llvm.org/D58905 Change-Id: I3651211bbd7ed22721ee7b9fefd7bcc60a809d8b llvm-svn: 356757	2019-03-22 15:21:11 +00:00
Tim Renouf	677387d8dc	[AMDGPU] Implemented dwordx3 variants of buffer/tbuffer load/store intrinsics Now we have vec3 MVTs, this commit implements dwordx3 variants of the buffer intrinsics. On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4 instruction, and a dwordx3 buffer store intrinsic is not supported. We need to support the dwordx3 load intrinsic because it is generated by subtarget-unaware code in InstCombine. Differential Revision: https://reviews.llvm.org/D58904 Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e llvm-svn: 356755	2019-03-22 14:58:02 +00:00
Alex Bradbury	dab1f6fc4e	[RISCV] Add basic RV32E definitions and MC layer support The RISC-V ISA defines RV32E as an alternative "base" instruction set encoding, that differs from RV32I by having only 16 rather than 32 registers. This patch adds basic definitions for RV32E as well as MC layer support (assembling, disassembling) and tests. The only supported ABI on RV32E is ILP32E. Add a new RISCVFeatures::validate() helper to RISCVUtils which can be called from codegen or MC layer libraries to validate the combination of TargetTriple and FeatureBitSet. Other targets have similar checks (e.g. erroring if SPE is enabled on PPC64 or oddspreg + o32 ABI on Mips), but they either duplicate the checks (Mips), or fail to check for both codegen and MC codepaths (PPC). Codegen for the ILP32E ABI support and RV32E codegen are left for a future patch/patches. Differential Revision: https://reviews.llvm.org/D59470 llvm-svn: 356744	2019-03-22 11:21:40 +00:00
Alex Bradbury	b9e78c3994	[RISCV] Optimize emission of SELECT sequences This patch optimizes the emission of a sequence of SELECTs with the same condition, avoiding the insertion of unnecessary control flow. Such a sequence often occurs when a SELECT of values wider than XLEN is legalized into two SELECTs with legal types. We have identified several use cases where the SELECTs could be interleaved with other instructions. Therefore, we extend the sequence to include non-SELECT instructions if we are able to detect that the non-SELECT instructions do not impact the optimization. This patch supersedes https://reviews.llvm.org/D59096, which attempted to address this issue by introducing a new SelectionDAG node. Hat tip to Eli Friedman for his feedback on how to best handle this issue. Differential Revision: https://reviews.llvm.org/D59355 Patch by Luís Marques. llvm-svn: 356741	2019-03-22 10:45:03 +00:00
Alex Bradbury	3369101158	[RISCV] Allow conversion of CC logic to bitwise logic Indicates in the TargetLowering interface that conversions from CC logic to bitwise logic are allowed. Adds tests that show the benefit when optimization opportunities are detected. Also adds tests that show that when the optimization is not applied correct code is generated (but opportunities for other optimizations remain). Differential Revision: https://reviews.llvm.org/D59596 Patch by Luís Marques. llvm-svn: 356740	2019-03-22 10:39:22 +00:00
Tim Renouf	033f99a2e5	[AMDGPU] Added v5i32 and v5f32 register classes They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords. Differential Revision: https://reviews.llvm.org/D58903 Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271 llvm-svn: 356735	2019-03-22 10:11:21 +00:00
Yonghong Song	ded9a440d0	[BPF] handle derived type properly for computing type id Currently, the type id for a derived type is computed incorrectly. For example, type #1: int type #2: ptr to #1 For a global variable "int *a", type #1 will be attributed to variable "a". This is due to a bug which assigns the type id of the basetype of that derived type as the derived type's type id. This happens to "const", "volatile", "restrict", "typedef" and "pointer" types. This patch fixed this bug, fixed existing test cases and added a new one focusing on pointers plus other derived types. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 356727	2019-03-22 01:30:50 +00:00
Amara Emerson	c10b24691a	[AArch64] Split the neon.addp intrinsic into integer and fp variants. This is the result of discussions on the list about how to deal with intrinsics which require codegen to disambiguate them via only the integer/fp overloads. It causes problems for GlobalISel as some of that information is lost during translation, while with other operations like IR instructions the information is encoded into the instruction opcode. This patch changes clang to emit the new faddp intrinsic if the vector operands to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to upgrade existing calls to aarch64.neon.addp with fp vector arguments, and we remove the workarounds introduced for GlobalISel in r355865. This is a more permanent solution to PR40968. Differential Revision: https://reviews.llvm.org/D59655 llvm-svn: 356722	2019-03-21 22:31:37 +00:00
Craig Topper	b3bad3dce3	[X86] Use LoadInst->getType() instead of LoadInst->getPointerOperandType()->getElementType(). NFCI For the future day when the pointer's don't have element types, we shoudl just use the type of the load result instead. llvm-svn: 356721	2019-03-21 21:37:18 +00:00
Matt Arsenault	e811333b2d	Mips: Fix typo in assert message llvm-svn: 356717	2019-03-21 20:56:06 +00:00
Matt Arsenault	9a1a1f7bb2	Mips: Don't create copy of nothing This was creating a copy of the register the pseudo itself was def'ing, leaving a copy of an undefined register. I'm not sure how the verifier is not catching this, but this avoids asserting in a future change to RegAllocFast llvm-svn: 356716	2019-03-21 20:56:05 +00:00
Evandro Menezes	e5e77815b4	[AArch64] Update for Exynos Fix the feature set for Exynos M4 by removing support for `+fp16fml` and fix test case. llvm-svn: 356698	2019-03-21 18:54:58 +00:00
Simon Pilgrim	c2e4405475	[X86] canonicalizeBitSelect - don't attempt to canonicalize mask registers We don't use X86ISD::ANDNP for mask registers. Test case from @craig.topper (Craig Topper) llvm-svn: 356696	2019-03-21 18:32:38 +00:00
Craig Topper	c14f3e4222	[X86] Don't avoid folding multiple use sign extended 8-bit immediate into instructions under optsize. Under optsize we try to avoid folding immediates into instructions under optsize. But if the immediate is 16-bits or 32 bits, but can be encoded as an 8-bit immediate we don't save enough from disabling the folding unless the immediate has enough uses to make up for the size of the move which is either 3 bytes or 5 bytes since there are no sign extended 8-bit moves. We would also save something if the immediate was a live out of the basic block and thus a move was unavoidable, but that would require a more advanced heuristic than just counting uses. Note we only avoid folding multiple use immediates into the patterns that use X86ISD::ADD/SUB/XOR/OR/AND/CMP/ADC/SBB nodes and not the more common ISD::ADD/SUB/XOR/OR/AND nodes. Differential Revision: https://reviews.llvm.org/D59522 llvm-svn: 356688	2019-03-21 17:38:58 +00:00
Craig Topper	9f0b17a248	[ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and compressstore intrinsics. This adds support for scalarizing these intrinsics as well the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle. I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway. Fixes PR40994 and is covers most of PR3666. Might want to implement constant masks to close that. Differential Revision: https://reviews.llvm.org/D59180 llvm-svn: 356687	2019-03-21 17:38:52 +00:00
Simon Pilgrim	87d261bfd3	[Thumb] Fix infinite loop in ABS expansion (PR41160) Don't expand ISD::ABS node if its legal. llvm-svn: 356661	2019-03-21 12:41:18 +00:00
Tim Renouf	361b5b2193	[AMDGPU] Support for v3i32/v3f32 Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58902 Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659	2019-03-21 12:01:21 +00:00
Oliver Stannard	defdb1070f	[AArch64] Allow -mattr=tpidr-el[1\|2\|3] Added subtarget features for AArch64 to use TPIDR_EL[1\|2\|3] as the TLS base register, rather than the default TPIDR_EL0. Patch by Philip Derrin! Differential revision: https://reviews.llvm.org/D54685 llvm-svn: 356657	2019-03-21 11:30:17 +00:00
Craig Topper	8d46403b8e	[X86] Add CMPXCHG8B feature flag. Set it for all CPUs except i386/i486 including 'generic'. Disable use of CMPXCHG8B when this flag isn't set. CMPXCHG8B was introduced on i586/pentium generation. If its not enabled, limit the atomic width to 32 bits so the AtomicExpandPass will expand to lib calls. Unclear if we should be using a different limit for other configs. The default is 1024 and experimentation shows that using an i256 atomic will cause a crash in SelectionDAG. Differential Revision: https://reviews.llvm.org/D59576 llvm-svn: 356631	2019-03-20 23:35:49 +00:00
Thomas Lively	5098f8589d	[WebAssembly][NFC] Fix formatting error from rL356610 llvm-svn: 356622	2019-03-20 22:34:34 +00:00
Tim Renouf	2327c231d6	[AMDGPU] Do not generate spurious PAL metadata My previous fix rL356591 "[AMDGPU] Added MsgPack format PAL metadata" accidentally caused a spurious PAL metadata .note record to be emitted for any AMDGPU output. That caused failures in the lld test amdgpu-relocs.s. Fixed. Differential Revision: https://reviews.llvm.org/D59613 Change-Id: Ie04a2aaae890dcd490f22c89edf9913a77ce070e llvm-svn: 356621	2019-03-20 22:02:09 +00:00
Craig Topper	0367553304	[X86] Call lowerShuffleAsBitMask for 512-bit vectors in lowerShuffleAsBlend. This patch enables the use of lowerShuffleAsBitMask for 512-bit blends before falling back to move immedate, GPR to k-register, and masked op. I had to make some changes to support v8i64 when i64 is not a legal type. And to support floating point types. This trades a load for the move immediate and GPR move which is higher latency. But its probably better for register pressure not having to hop through other register classes. The load+and should play better with LICM and rematerialization I think. Differential Revision: https://reviews.llvm.org/D59479 llvm-svn: 356618	2019-03-20 21:30:20 +00:00
Michael Liao	bbcb95a64e	[AMDGPU] Fix dependency on `BinaryFormat` Summary: - The linking is broken when this library is built as shared one. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59610 llvm-svn: 356617	2019-03-20 21:22:27 +00:00
Matt Arsenault	2065206a9d	AMDGPU: Don't look for constant in insert/extract_vector_elt regbankselect The constantness shouldn't change the register bank choice. We also don't need to restrict this to only indexing VGPRs, since it's possible to index SGPRs (but SelectionDAG made using this difficult). Allow directly indexing SGPRs when appropriate. llvm-svn: 356611	2019-03-20 20:41:34 +00:00
Thomas Lively	f6f4f84378	[WebAssembly] Target features section Summary: Implements a new target features section in assembly and object files that records what features are used, required, and disallowed in WebAssembly objects. The linker uses this information to ensure that all objects participating in a link are feature-compatible and records the set of used features in the output binary for use by optimizers and other tools later in the toolchain. The "atomics" feature is always required or disallowed to prevent linking code with stripped atomics into multithreaded binaries. Other features are marked used if they are enabled globally or on any function in a module. Future CLs will add linker flags for ignoring feature compatibility checks and for specifying the set of allowed features, implement using the presence of the "atomics" feature to control the type of memory and segments in the linked binary, and add front-end flags for relaxing the linkage policy for atomics. Reviewers: aheejin, sbc100, dschuff Subscribers: jgravelle-google, hiraditya, sunfish, mgrang, jfb, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59173 llvm-svn: 356610	2019-03-20 20:26:45 +00:00
Michael Liao	eea5177d30	[AMDGPU] Fix clamp bit DAG operand Summary: - Should use `targetconstant` instead of `constant` operand for clamp bit, which is expected as an immediate operand. Under certain conditions, such as a common `i1 false` constant is used in other place and selected before the instruction with clamp bit, register operand may be added instead of immediate one. Use `targetcosntant` to enforce that. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59608 llvm-svn: 356608	2019-03-20 20:18:56 +00:00
Pete Couperus	b062239d63	[ARC] Add ARCOptAddrMode pass to generate postincrement loads/stores. Build on newly introduced ARC postincrement loads/stores from r356200. Patch By Denis Antrushin! <denis@synopsys.com> Differential Revision: https://reviews.llvm.org/D59409 llvm-svn: 356606	2019-03-20 20:06:21 +00:00
Konstantin Zhuravlyov	88268e3e36	AMDHSA: Fix COMPUTE_PGM_RSRC2.USER_SGPR calculation when parsing ISA assembly It must match https://llvm.org/docs/AMDGPUUsage.html#initial-kernel-execution-state Differential Revision: https://reviews.llvm.org/D59570 llvm-svn: 356603	2019-03-20 19:44:47 +00:00
Eli Friedman	638be660d7	[ARM] Eliminate redundant "mov rN, sp" instructions in Thumb1. This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them to something like "str r0, [sp]". For regular stack variables, this optimization was already implemented: we lower loads and stores using frame indexes, which are expanded later. However, when constructing a call frame for a call with more than four arguments, the existing optimization doesn't apply. We need to use stores which are actually relative to the current value of sp, and don't have an associated frame index. This patch adds a special case to handle that construct. At the DAG level, this is an ISD::STORE where the address is a CopyFromReg from SP (plus a small constant offset). This applies only to Thumb1: in Thumb2 or ARM mode, a regular store instruction can access SP directly, so the COPY gets eliminated by existing code. The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related cleanup: we shouldn't pretend that it can select anything other than frame indexes. Differential Revision: https://reviews.llvm.org/D59568 llvm-svn: 356601	2019-03-20 19:40:45 +00:00
Tim Renouf	e7bd52f86e	[AMDGPU] Added MsgPack format PAL metadata Summary: PAL metadata now supports both the old linear reg=val pairs format and the new MsgPack format. The MsgPack format uses YAML as its textual representation. On output to YAML, a mnemonic name is provided for some hardware registers. Differential Revision: https://reviews.llvm.org/D57028 Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94 llvm-svn: 356591	2019-03-20 18:47:21 +00:00
Tim Renouf	d737b551e9	[AMDGPU] Factored PAL metadata handling out into its own class Summary: This commit introduces a new AMDGPUPALMetadata class that: * is inside the AMDGPU target; * keeps an in-memory representation of PAL metadata; * provides a method to read the frontend-supplied metadata from LLVM IR; * provides methods for the asm printer to set metadata items; * provides methods to write the metadata as a binary blob to put in a .note record or as an asm directive; * provides a method to read the metadata as a binary blob from a .note record. Because llvm-readobj cannot call directly into a target, I had to remove llvm-readobj's ability to dump PAL metadata, pending a resolution to https://reviews.llvm.org/D52821 Differential Revision: https://reviews.llvm.org/D57027 Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64 llvm-svn: 356582	2019-03-20 17:42:00 +00:00
Dmitry Preobrazhensky	04bd1185ad	[AMDGPU][MC] Corrected checks for DS offset0 range See bug 40889: https://bugs.llvm.org/show_bug.cgi?id=40889 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59313 llvm-svn: 356576	2019-03-20 17:13:58 +00:00
Dmitry Preobrazhensky	137976fae2	[AMDGPU][MC][GFX9] Added support of operands shared_base, shared_limit, private_base, private_limit, pops_exiting_wave_id See bug 39297: https://bugs.llvm.org/show_bug.cgi?id=39297 Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D59290 llvm-svn: 356561	2019-03-20 15:40:52 +00:00
Simon Pilgrim	2acca37a2d	[X86] Use getConstantOperandAPInt to detect out-of-range shifts. llvm-svn: 356549	2019-03-20 11:41:52 +00:00
Andrea Di Biagio	624f5deff4	[X86] Remove X86 specific dag nodes for RDTSC/RDTSCP/RDPMC. NFCI This patch removes the following dag node opcodes from namespace X86ISD: RDTSC_DAG, RDTSCP_DAG, RDPMC_DAG The logic that expands RDTSC/RDPMC/XGETBV intrinsics is basically the same. The only differences are: RDTSC/RDTSCP don't implicitly read ECX. RDTSCP also implicitly writes ECX. I moved the common expansion logic into a helper function with the goal to get rid of code repetition. That helper is now used for the expansion of RDTSC/RDTSCP/RDPMC/XGETBV intrinsics. No functional change intended. Differential Revision: https://reviews.llvm.org/D59547 llvm-svn: 356546	2019-03-20 11:21:15 +00:00
David Stuttard	fc2a747345	[AMDGPU] Allow MIMG with no uses in adjustWritemask in isel Summary: If an MIMG instruction has managed to get through to adjustWritemask in isel but has no uses (and doesn't enable TFC) then prevent an assertion by not attempting to adjust the writemask. The instruction will be removed anyway. Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58964 llvm-svn: 356540	2019-03-20 09:29:55 +00:00
Craig Topper	97d104cbee	[X86] Re-disable cmpxchg16b for 32-bit mode assembly parsing. This was broken recently when I factored the 64 bit mode check into hasCmpxchg16 without thinking about the AssemblerPredicate. llvm-svn: 356531	2019-03-19 23:57:16 +00:00
Eli Friedman	2596e8b3e7	[ARM] Make sure to save/restore LR when we use tBfar. This change does two things. One, it ensures compilation will abort instead of miscompiling if ARMFrameLowering::determineCalleeSaves chooses not to save LR in a case where it's necessary. Two, it changes the way we estimate the size of a function to be more conservative in the presence of constant pool entries and jump tables. EstimateFunctionSizeInBytes probably still isn't really conservative enough, but I'm not sure how we can come up with a reliable estimate before constant islands runs. Differential Revision: https://reviews.llvm.org/D59439 llvm-svn: 356527	2019-03-19 21:48:08 +00:00
Amara Emerson	761ca2e53b	[AArch64][GlobalISel] Add an optimization to select vector DUP instructions. This adds pattern matching for the insert+shufflevector sequence so we can generate dup instructions instead of the current TBL sequence. Differential Revision: https://reviews.llvm.org/D59558 llvm-svn: 356526	2019-03-19 21:43:05 +00:00
Amara Emerson	18e2c5724a	[AArch64][GlobalISel] Make v4s32 G_IMPLICIT_DEF legal. llvm-svn: 356525	2019-03-19 21:43:02 +00:00
Matt Arsenault	cf55a657f0	CodeGen: Refactor regallocator command line and target selection This will allow targets more flexibility to replace the register allocator core passes. In a future commit, AMDGPU will run the core register assignment passes twice, and will also want to disallow using the standard -regalloc option. llvm-svn: 356506	2019-03-19 19:33:12 +00:00
Simon Pilgrim	77482120da	Fix for ABS legalization on PPC buildbot. llvm-svn: 356498	2019-03-19 18:55:46 +00:00
Simon Pilgrim	e744f513c4	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - handle repeated shift amounts If a value with multiple uses is only ever used for SSE shift amounts then we know that only the bottom 64-bits are needed. llvm-svn: 356483	2019-03-19 17:23:25 +00:00
Simon Atanasyan	db4601e60a	[MIPS][microMIPS] Enable dynamic stack realignment Dynamic stack realignment was disabled on micromips by checking if target has standard encoding. We simply change the condition to skip Mips16 only. Patch by Mirko Brkusanin. Differential Revision: http://reviews.llvm.org/D59499 llvm-svn: 356478	2019-03-19 17:01:24 +00:00
Jordan Rupprecht	f74d45a775	[NFC] Fix unused variable in release builds This was introduced in rL356468. llvm-svn: 356477	2019-03-19 16:52:40 +00:00
Simon Pilgrim	7a8e5051f4	Fix unused variable warning. NFCI. llvm-svn: 356474	2019-03-19 16:49:59 +00:00
Simon Pilgrim	a56f2822d0	[SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect These changes are related to PR37743 and include: SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node. Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner. Add promoting the integer ABS node in the LegalizeIntegerType. Expand-based legalization of integer result for the ABS nodes. Expand-based legalization of ABS vector operations. Add some integer abs testcases for different typesizes for Thumb arch Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to: tmp = (SRA, Hi, 31) Lo = (UADDO tmp, Lo) Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1)) Lo = (XOR tmp, Lo) The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern: (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))). Change integer abs testcases for codegen with the ABS node support for AArch64. Indicate that the ABS is legal for the i64 type when the NEON is supported. Change the integer abs testcases to show changing of codegen. Add combine and legalization of ABS nodes for Thumb arch. Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition. For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743 Patch by: @ikulagin (Ivan Kulagin) Differential Revision: https://reviews.llvm.org/D49837 llvm-svn: 356468	2019-03-19 16:24:55 +00:00
Ryan Taylor	00e063ab92	[AMDGPU] Add buffer/load 8/16 bit overloaded intrinsics Summary: Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59265 llvm-svn: 356465	2019-03-19 16:07:00 +00:00
Neil Henning	e85f6bd64f	[AMDGPU] Ban i8 min3 promotion. I found this really weird WWM-related case whereby through the WWM transformations our isel lowering was trying to promote 2 min's into a min3 for the i8 type, which our hardware doesn't support. The new min3_i8.ll test case would previously spew the error: PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68 Before the simple fix to our isel lowering to not do it for i8 MVT's. Differential Revision: https://reviews.llvm.org/D59543 llvm-svn: 356464	2019-03-19 15:50:24 +00:00
Simon Atanasyan	af40d4371d	[mips] Fix crash on recursive using of .set Switch to the `MCParserUtils::parseAssignmentExpression` for parsing assignment expressions in the `.set` directive reduces code and allows to print an error message instead of crashing in case of incorrect recursive using of the `.set`. Fix for the bug https://bugs.llvm.org/show_bug.cgi?id=41053. Differential Revision: http://reviews.llvm.org/D59452 llvm-svn: 356461	2019-03-19 15:15:35 +00:00
Heejin Ahn	c60bc94afc	[WebAssembly] Small improvements in FixIrreducibleControlFlow (NFC) Summary: - Make some class member methods const - Delete unnecessary includes - Use a simpler form of `BuildMI` Reviewers: kripken Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59454 llvm-svn: 356440	2019-03-19 05:26:33 +00:00
Heejin Ahn	34dc1f2483	[WebAssembly] Rename methods according to instruction name changes (NFC) Reviewers: tlively, sbc100 Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59469 llvm-svn: 356438	2019-03-19 05:07:33 +00:00
Thomas Lively	0200d62ec7	[WebAssembly] Lower SIMD nnan setcc nodes Summary: Adds patterns to lower all the remaining setcc modes: lt, gt, le, and ge. Fixes PR40912. Reviewers: aheejin, sbc100, dschuff Reviewed By: dschuff Subscribers: jgravelle-google, hiraditya, sunfish, jdoerfert, llvm-commits, srj Tags: #llvm Differential Revision: https://reviews.llvm.org/D59519 llvm-svn: 356431	2019-03-19 00:55:34 +00:00
Craig Topper	b24bdf626a	[X86] Disable CQTO and CLTQ instructions in the assembly parser outside 64-bit mode. llvm-svn: 356419	2019-03-18 22:06:14 +00:00
Craig Topper	e732bc6bea	[X86] Allow any 8-bit immediate to be used with BT/BTC/BTR/BTS not just sign extended 8-bit immediates. We need to allow [128,255] in addition to [-128, 127] to match gas. llvm-svn: 356413	2019-03-18 21:33:59 +00:00
Sam Clegg	b7708ec87f	[WebAssembly] Don't override default implementation of isOffsetFoldingLegal. NFC. The default implementation does we want and is going to more compatible with dynamic linking (-fPIC) support that is planned. This is NFC because currently we only build wasm with `-relocation-model=static` which in turn means that the default `isOffsetFoldingLegal` always returns true today. Differential Revision: https://reviews.llvm.org/D54661 llvm-svn: 356410	2019-03-18 21:21:12 +00:00
Craig Topper	f086e562f9	[X86] Use relocImm in the ROL8ri/ROL16ri/ROL32ri/ROL64ri patterns to be consistent with the ROR patterns. llvm-svn: 356407	2019-03-18 20:43:15 +00:00
Craig Topper	0b9c640fe0	[X86] Replace uses of i64immSExt32_su with i64relocImmSExt32_su. For the i8, i16, and i32 instructions we were using a relocImm. Presumably we should for i64 as well. llvm-svn: 356406	2019-03-18 20:43:09 +00:00
Michael Liao	efb4f9e568	[AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59501 llvm-svn: 356405	2019-03-18 20:40:09 +00:00
Tim Renouf	cfdfba996b	[AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic Allow the clamp modifier on vop3 int arithmetic instructions in assembly and disassembly. This involved adding a clamp operand to the affected instructions in MIR and MC, and thus having to fix up several places in codegen and MIR tests. Differential Revision: https://reviews.llvm.org/D59267 Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e llvm-svn: 356399	2019-03-18 19:35:44 +00:00
Tim Renouf	2e94f6e584	[AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled. This does appear to be allowed, even though they are floating point modifiers and the operand type is b32. To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests. Differential Revision: https://reviews.llvm.org/D59191 Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea llvm-svn: 356398	2019-03-18 19:25:39 +00:00
Amara Emerson	8627178d46	Revert r356304: remove subreg parameter from MachineIRBuilder::buildCopy() After review comments, it was preferred to not teach MachineIRBuilder about non-generic instructions beyond using buildInstr(). For AArch64 I've changed the buildCopy() calls to buildInstr() + a separate addReg() call. This also relaxes the MachineIRBuilder's COPY checking more because it may not always have a SrcOp given to it. llvm-svn: 356396	2019-03-18 19:20:10 +00:00
Tim Renouf	8723a56551	[MsgPack][AMDGPU] Fix unflushed raw_string_ostream bugs on windows expensive checks bot This fixes a couple of unflushed raw_string_ostream bugs in recent commits that only show up on a bot building on windows with expensive checks. Differential Revision: https://reviews.llvm.org/D59396 Change-Id: I9c6208325503b3ee0786b4b688e13fc24a15babf llvm-svn: 356394	2019-03-18 19:00:46 +00:00
Craig Topper	f07062a798	[X86] Rename imm8_su/imm16_su/imm32_su to relocImm8_su/relocImm16_su/relocImm32_su/ to accurately reflect what they are. llvm-svn: 356393	2019-03-18 18:54:06 +00:00
Adhemerval Zanella	270249de2b	[AArch64] Small fix for getIntImmCost It uses the generic AArch64_IMM::expandMOVImm to get the correct number of instruction used in immediate materialization. Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D58461 llvm-svn: 356391	2019-03-18 18:50:58 +00:00
Adhemerval Zanella	a3cefa5d64	[AArch64] Optimize floating point materialization This patch follows some ideas from r352866 to optimize the floating point materialization even further. It changes isFPImmLegal to considere up to 2 mov instruction or up to 5 in case subtarget has fused literals. The rationale is the cost is the same for mov+fmov vs. adrp+ldr; but the mov+fmov sequence is always better because of the reduced d-cache pressure. The timings are still the same if you consider movw+movk+fmov vs. adrp+ldr will be fused (although one instruction longer). Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D58460 llvm-svn: 356390	2019-03-18 18:45:57 +00:00

... 13 14 15 16 17 ...

52710 Commits