llvm-project

Commit Graph

Author	SHA1	Message	Date
Andrew Ng	03e35b6bc0	[DebugInfo][X86] Improve X86 Optimize LEAs handling of debug values. This is a follow up to the fix in r298360 to improve the handling of debug values when redundant LEAs are removed. The fix in r298360 effectively discarded the debug values. This patch now attempts to preserve the debug values by using the DWARF DW_OP_stack_value operation via prependDIExpr. Moved functions appendOffset and prependDIExpr from Local.cpp to DebugInfoMetadata.cpp and made them available as static member functions of DIExpression. Differential Revision: https://reviews.llvm.org/D31604 llvm-svn: 301630	2017-04-28 08:44:30 +00:00
Craig Topper	053cf4da9d	[WebAssembly] Update calls to computeKnownBits after the changes from r301620. I didn't realize WebAssembly wasn't a default build target so I missed that changes were needed. llvm-svn: 301629	2017-04-28 08:15:33 +00:00
Clement Courbet	5f0ab9e51d	[X86][NFC] Refactor RepMovsRepeats in preparation for D32481. Differential Revision: https://reviews.llvm.org/D32583 llvm-svn: 301628	2017-04-28 07:56:31 +00:00
Craig Topper	d0af7e8ab8	[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620	2017-04-28 05:31:46 +00:00
Craig Topper	0e03e74e95	[SelectionDAG] Use various APInt methods to reduce temporary APInt creation This patch uses various APInt methods to reduce the number of temporary APInts. These were all found while working through converting SelectionDAG's computeKnownBits to also use the KnownBits struct recently added to the ValueTracking version. llvm-svn: 301618	2017-04-28 04:57:59 +00:00
Craig Topper	24e71017aa	[APInt] Use inplace shift methods where possible. NFCI llvm-svn: 301612	2017-04-28 03:36:24 +00:00
Sam Kolton	5d99386b4d	[AMDGPU] DPP: add support for GFX9 Reviewers: artem.tamazov Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32588 llvm-svn: 301551	2017-04-27 15:42:38 +00:00
Krzysztof Parzyszek	14f10e03e0	Fix typo and place comment close to its target Patch by Wei-Ren Chen. Differential Revision: https://reviews.llvm.org/D32594 llvm-svn: 301546	2017-04-27 14:38:21 +00:00
Zoran Jovanovic	ffef3e3c6a	[mips][microMIPS] Adding code size reduction pass for MicroMIPS Author: milena.vujosevic.janicic Reviewers: sdardis The code implements a size reduction pass for MicroMIPS. Load and store instructions are examined and transformed, if possible. lw32 instruction is transformed into 16-bit instruction lwsp. sw32 instruction is transformed into 16-bit instruction swsp. Arithmetic instructions are examined and transformed, if possible. addu32 instruction is transformed into 16-bit instruction addu16. subu32 instruction is transformed into 16-bit instruction subu16. Differential Revision: https://reviews.llvm.org/D15144 llvm-svn: 301540	2017-04-27 13:10:48 +00:00
Jonas Paulsson	ac4e022d72	[SystemZ] Remove incorrect assert in SystemZTTIImpl In getCmpSelInstrCost(), CondTy may actually be scalar while ValTy is a vector when LoopVectorizer is the caller. Therefore the assert that CondTy must be a vector type if ValTy is was wrong and is now removed. Review: Ulrich Weigand llvm-svn: 301533	2017-04-27 11:01:18 +00:00
Diana Picus	4f46be327c	[ARM] GlobalISel: Fix extended stack operands Fix a crash when trying to extend a value passed as a sign- or zero-extended stack parameter. The cause of the crash was that we were setting the size of the loaded value to 32 bits, and then trying to extend again to 32 bits. This patch addresses the issue by also introducing a G_TRUNC after the load. This will leave the unused bits to their original values set by the caller, while being consistent about the types. For values that are not extended, we just use a smaller load. llvm-svn: 301531	2017-04-27 10:23:30 +00:00
Igor Breger	360d0f23ee	[GlobalISel][X86] handle not symmetric G_COPY Summary: handle not symmetric G_COPY Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32420 llvm-svn: 301523	2017-04-27 08:02:03 +00:00
Clement Courbet	7b0ec39494	[CodeGen][NFC] Rename 'Src' to 'Val'. 'Src' looks like it was borrowed from memcpy, 'Val' makes more sense for memset and is consistent with naming within the function. Differential Revision: https://reviews.llvm.org/D32580 llvm-svn: 301521	2017-04-27 07:22:30 +00:00
Konstantin Zhuravlyov	97a663b6a2	AMDGPU: Fix assert in scheduler Assert is triggered if DBG_VALUE is first instruction in BB Differential Revision: https://reviews.llvm.org/D32572 llvm-svn: 301511	2017-04-27 03:22:44 +00:00
Matthias Braun	90834df0b4	Lanai: Remove unnecessary canRealignStack() override; NFC It was doing the same as the base implementation and was irritating me when I was searching for backends that have custom behavior for canRealignStack. llvm-svn: 301495	2017-04-26 23:37:01 +00:00
Dmitry Preobrazhensky	43d297eb45	[AMDGPU][MC] Added arg checks for vmcnt, expcnt, lgkmcnt helpers Summary of changes: - corrected vmcnt, expcnt, lgkmcnt helpers to check their arguments for truncation; - added saturated versions of these helpers. See bug 32711 for details: https://bugs.llvm.org//show_bug.cgi?id=32711 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D32546 llvm-svn: 301439	2017-04-26 17:55:50 +00:00
Craig Topper	b45eabcf82	[ValueTracking] Introduce a KnownBits struct to wrap the two APInts for computeKnownBits This patch introduces a new KnownBits struct that wraps the two APInts used by computeKnownBits. This allows us to treat them as more of a unit. Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch. I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. One place default constructed the APInts so I added a default constructor for those cases. Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with. Differential Revision: https://reviews.llvm.org/D32376 llvm-svn: 301432	2017-04-26 16:39:58 +00:00
Sanjoy Das	2cbeb00f38	Reverts commit r301424, r301425 and r301426 Commits were: "Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts" "Add a new WeakVH value handle; NFC" "Rename WeakVH to WeakTrackingVH; NFC" The changes assumed pointers are 8 byte aligned on all architectures. llvm-svn: 301429	2017-04-26 16:37:05 +00:00
Sanjoy Das	01de557738	Rename WeakVH to WeakTrackingVH; NFC Summary: I plan to use WeakVH to mean "nulls itself out on deletion, but does not track RAUW" in a subsequent commit. Reviewers: dblaikie, davide Reviewed By: davide Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D32266 llvm-svn: 301424	2017-04-26 16:20:52 +00:00
Dmitry Preobrazhensky	c7d35a0d6a	[AMDGPU][MC] Added check for truncation of SOPK imm operand See bug 30827: https://bugs.llvm.org//show_bug.cgi?id=30827 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D32535 llvm-svn: 301418	2017-04-26 15:34:19 +00:00
Dylan McKay	828bd6169c	[AVR] Remove an unused local variable llvm-svn: 301413	2017-04-26 14:47:27 +00:00
Sagar Thakur	b458b468a2	[mips] Fix test mips64fpldst.ll with machine verifier enabled Removed micro mips register classes for gp initialization because gp initialization uses pure mips64 instruction. Even when compiling for micro mips, gp initialization can be done with pure mips64 instructions. Reviewed by Simon Dardis Differential: D32286 llvm-svn: 301394	2017-04-26 11:40:12 +00:00
Ayman Musa	11966ab00b	[X86] Add missing mayLoad/mayStore attributes to some X86 instructions (Continue) Complete the patch committed in rL300190. Differential Revision: https://reviews.llvm.org/D32287 llvm-svn: 301393	2017-04-26 11:34:09 +00:00
Simon Dardis	70f79251bc	[mips] Rework a portion of MipsCC interface. (NFC) r299766 contained a "conditional move or jump depends on uninitialized value" fault, identified by valgrind. This occurred as MipsFastISel::finishCall(..) used CCState over MipsCCState. The latter is required for the TableGen'd calling convention logic due to reliance on pre-analyzing type information to lower call results/returns of vectors correctly. This change modifies the MipsCC AnalyzeCallResult to be useful with both the SelectionDAG and FastISel lowering logic. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D32004 llvm-svn: 301392	2017-04-26 11:10:38 +00:00
Andrew V. Tischenko	c3c6723ab5	PR31007 and PR27884 will be closed: a possibility to compile constants like 0bH is now supported in MS asm. llvm-svn: 301390	2017-04-26 09:56:59 +00:00
Ayman Musa	d9fb157845	[X86][SSE2] Fix asm string for movq (Move Quadword) instruction. Replace "mov{d|q}" with "movq". Differential Revision: https://reviews.llvm.org/D32220 llvm-svn: 301386	2017-04-26 07:08:44 +00:00
Davide Italiano	0316f7ae7b	[AMDGPU] Garbage collect dead code. NFCI. llvm-svn: 301375	2017-04-26 01:00:52 +00:00
Vadzim Dambrouski	d91fb8c367	[MSP430] Fix PR32769: Select8 and Select16 need to have SR in Uses. If a Select pseudo instruction doesn't have SR in its Uses, then CMP instructions are marked as dead and can later be removed by the MachineCSE pass. This leads to incorrect code generation. Differential Revision: https://reviews.llvm.org/D32473 llvm-svn: 301372	2017-04-26 00:33:59 +00:00
Dylan McKay	ff49a05565	[AVR] Do not kill the dest register for a pseudo instruction It caused the register to later be dead, which would trigger a verifier error. llvm-svn: 301368	2017-04-25 23:58:20 +00:00
Matt Arsenault	36c3122ecd	AMDGPU: Shift down reserved SP register like scratch wave offset llvm-svn: 301367	2017-04-25 23:40:57 +00:00
Matt Arsenault	df58e825ad	AMDGPU: Clean up VOP3NoMods pattern There is no need to copy the operands or inspect the sources. Also remove some unnecessary clamp/omod usage. llvm-svn: 301363	2017-04-25 21:17:38 +00:00
Konstantin Zhuravlyov	54ba4312a3	AMDGPU: Fix ValueKind code object metadata for images Differential Revision: https://reviews.llvm.org/D32504 llvm-svn: 301360	2017-04-25 20:38:26 +00:00
Krzysztof Parzyszek	9ebbe5bf2e	[Hexagon] Only increment debug counters if debug option is present llvm-svn: 301346	2017-04-25 18:56:14 +00:00
Simon Pilgrim	d68785803b	[SelectionDAG] Added getBuildVector(ArrayRef<SDUse>) helper. llvm-svn: 301322	2017-04-25 16:41:28 +00:00
Dylan McKay	8f515b1ef7	[AVR] Support the LDWRdPtr instruction with the same Src+Dst register llvm-svn: 301313	2017-04-25 15:09:04 +00:00
Matt Arsenault	e22184940b	AMDGPU: Slightly simplify prolog reserved register handling Rely on MachineRegisterInfo's knowledge of used physical registers. Move flat_scratch initialization earlier, so the uses are visible when making these decisions. This will make it easier to add another reserved register at the end for the stack pointer rather than handling another special case. llvm-svn: 301254	2017-04-24 21:08:32 +00:00
Krzysztof Parzyszek	c8e8e2a046	Move value type list from TargetRegisterClass to TargetRegisterInfo Differential Revision: https://reviews.llvm.org/D31937 llvm-svn: 301234	2017-04-24 19:51:12 +00:00
Krzysztof Parzyszek	98ab4c64c4	Revert r301231: Accidentally committed stale files I forgot to commit local changes before commit. llvm-svn: 301232	2017-04-24 19:48:51 +00:00
Krzysztof Parzyszek	c0197066d7	Move value type list from TargetRegisterClass to TargetRegisterInfo Differential Revision: https://reviews.llvm.org/D31937 llvm-svn: 301231	2017-04-24 19:43:45 +00:00
Matt Arsenault	0774ea267a	AMDGPU: Select scratch mubuf offsets when pointer is a constant In call sequence setups, there may not be a frame index base and the pointer is a constant offset from the frame pointer / scratch wave offset register. llvm-svn: 301230	2017-04-24 19:40:59 +00:00
Matt Arsenault	df6539f44b	AMDGPU: Set StackGrowsUp in MCAsmInfo Not sure what this does though. llvm-svn: 301229	2017-04-24 19:40:51 +00:00
Stanislav Mekhanoshin	bd5394be3d	[AMDGPU] Merge M0 initializations Merges equivalent initializations of M0 and hoists them into a common dominator block. Technically the same code can be used with any register, physical or virtual. Differential Revision: https://reviews.llvm.org/D32279 llvm-svn: 301228	2017-04-24 19:37:54 +00:00
Krzysztof Parzyszek	44e25f37ae	Move size and alignment information of regclass to TargetRegisterInfo 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221	2017-04-24 18:55:33 +00:00
Yaxun Liu	fd23a0c095	CodeGen: Add a hook for getFenceOperandTy Currently the operand type for ATOMIC_FENCE assumes value type of a pointer in address space 0. This is fine for most targets. However for amdgcn target, the size of pointer in address space 0 depends on triple environment. For amdgiz environment, it is 64 bit but for other environments it is 32 bit. On the other hand, amdgcn target expects 32 bit fence operands independent of the target triple environment. Therefore a hook is needed in target lowering for getting the fence operand type. This patch has no effect on targets other than amdgcn. Differential Revision: https://reviews.llvm.org/D32186 llvm-svn: 301215	2017-04-24 18:26:27 +00:00
Matthias Braun	f9796b76e9	X86RegisterInfo: eliminateFrameIndex: Avoid code duplication; NFC Re-Commit of r300922 and r300923 with less aggressive assert (see discussion at the end of https://reviews.llvm.org/D32205) X86RegisterInfo::eliminateFrameIndex() and X86FrameLowering::getFrameIndexReference() both had logic to compute the base register. This consolidates the code. Also use MachineInstr::isReturn instead of manually enumerating tail call instructions (return instructions were not included in the previous list because they never reference frame indexes). Differential Revision: https://reviews.llvm.org/D32206 llvm-svn: 301211	2017-04-24 18:15:00 +00:00
Matt Arsenault	1c0ae3972f	AMDGPU: Add StackPtr and FramePtr registers to MFI These will be necessary for setting up call sequences. llvm-svn: 301208	2017-04-24 18:05:16 +00:00
Matt Arsenault	3e02538a02	AMDGPU: Move trap lowering to DAG Fixes traps in any block besides the entry block, and fixes depending on a live-in physical register by using a virtual register copy. Also happens to stop emitting a nop in the case debug trap is not supported. llvm-svn: 301206	2017-04-24 17:49:13 +00:00
Nicolai Haehnle	5dea645138	AMDGPU: Move v_readlane lane select from VGPR to SGPR Summary: Fix a compiler bug when the lane select happens to end up in a VGPR. Clarify the semantic of the corresponding intrinsic to be that of the corresponding GLSL: the lane select must be uniform across a wave front, otherwise results are undefined. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32343 llvm-svn: 301197	2017-04-24 17:17:36 +00:00
Igor Breger	87aafa073f	[GlobalISel][X86] Lower FormalArgument/Ret using G_MERGE_VALUES/G_UNMERGE_VALUES. Summary: [GlobalISel][X86] Lower FormalArgument/Ret using G_MERGE_VALUES/G_UNMERGE_VALUES. Reviewers: zvi, t.p.northover, guyblank Reviewed By: t.p.northover Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32288 llvm-svn: 301194	2017-04-24 17:05:52 +00:00
Nicolai Haehnle	ef449787d8	AMDGPU: Fix crash when scheduling non-memory SMRD instructions Summary: Fixes piglit spec/arb_shader_clock/execution/* Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32345 llvm-svn: 301191	2017-04-24 16:53:52 +00:00
Jonas Paulsson	1e8648577c	[SystemZ] Update kill-flag in splitMove(). EarlierMI needs to clear the kill flag on the first operand in case of a store. Review: Ulrich Weigand llvm-svn: 301177	2017-04-24 12:40:28 +00:00
Diana Picus	f53865daa4	[ARM] GlobalISel: Legalize s8 and s16 G_(S|U)DIV We have to widen the operands to 32 bits and then we can either use hardware division if it is available or lower to a libcall otherwise. At the moment it is not enough to set the Legalizer action to WidenScalar, since for libcalls it won't know what to do (it won't be able to find what size to widen to, because it will find Libcall and not Legal for 32 bits). To hack around this limitation, we request Custom lowering, and as part of that we widen first and then we run another legalizeInstrStep on the widened DIV. llvm-svn: 301166	2017-04-24 09:12:19 +00:00
Sjoerd Meijer	e5b8557d5b	[AArch64AsmParser] better diagnostic for isb Instruction isb takes as an operand either 'sy' or an immediate value. This improves the diagnostic when the string is not 'sy' and adds a test case for this which was missing. This also adds tests to check invalid inputs for dsb and dmb. Differential Revision: https://reviews.llvm.org/D32227 llvm-svn: 301165	2017-04-24 08:22:20 +00:00
Diana Picus	b70e88bdec	[ARM] GlobalISel: Support G_(S|U)DIV for s32 Add support for both targets with hardware division and without. For hardware division we have to add support throughout the pipeline (legalizer, reg bank select, instruction select). For targets without hardware division, we only need to mark it as a libcall. llvm-svn: 301164	2017-04-24 08:20:05 +00:00
Diana Picus	95a8aa93e2	[ARM] GlobalISel: Select G_CONSTANT with CImm operands When selecting a G_CONSTANT to a MOVi, we need the value to be an Imm operand. We used to just leave the G_CONSTANT operand unchanged, which works in some cases (such as the GEP offsets that we create when referring to stack slots). However, in many other places the G_CONSTANTs are created with CImm operands. This patch makes sure to handle those as well, and to error out gracefully if in the end we don't end up with an Imm operand. Thanks to Oliver Stannard for reporting this issue. llvm-svn: 301162	2017-04-24 06:30:56 +00:00
Simon Pilgrim	06d6263309	[X86][SSE] Add scheduler class support for SSE42 (PCMPGT) instructions llvm-svn: 301142	2017-04-23 21:23:27 +00:00
Renato Golin	4abfb3d741	Revert "[APInt] Fix a few places that use APInt::getRawData to operate within the normal API." This reverts commit r301105, 4, 3 and 1, as a follow up of the previous revert, which broke even more bots. For reference: Revert "[APInt] Use operator<<= where possible. NFC" Revert "[APInt] Use operator<<= instead of shl where possible. NFC" Revert "[APInt] Use ashInPlace where possible." PR32754. llvm-svn: 301111	2017-04-23 12:15:30 +00:00
Ayman Musa	137c44fe64	[X86][MPX] Add load & store instructions of bnd values to getLoadStoreRegOpcode function. This is needed for a follow up patch that generates the memory folding tables. Differential Revision: https://reviews.llvm.org/D32232 llvm-svn: 301109	2017-04-23 08:28:42 +00:00
Craig Topper	474e5de72d	[APInt] Fix a few places that use APInt::getRawData to operate within the normal API. getRawData exposes the internal type of the APInt class directly to its users. Ideally we wouldn't expose such an implementation detail. This patch fixes a few of the easy cases by using truncate, extract, or a rotate. llvm-svn: 301105	2017-04-23 06:41:11 +00:00
Craig Topper	cdd5ae6676	[APInt] Use operator<<= where possible. NFC llvm-svn: 301104	2017-04-23 05:43:02 +00:00
Craig Topper	5f68af0806	[APInt] Use operator<<= instead of shl where possible. NFC llvm-svn: 301103	2017-04-23 05:18:31 +00:00
Craig Topper	ae9672c96d	[APInt] Use ashInPlace where possible. llvm-svn: 301101	2017-04-23 03:45:59 +00:00
Daniel Sanders	2deea1878e	[globalisel][tablegen] Revise API for ComplexPattern operands to improve flexibility. Summary: Some targets need to be able to do more complex rendering than just adding an operand or two to an instruction. For example, it may need to insert an instruction to extract a subreg first, or it may need to perform an operation on the operand. In SelectionDAG, targets would create SDNode's to achieve the desired effect during the complex pattern predicate. This worked because SelectionDAG had a form of garbage collection that would take care of SDNode's that were created but not used due to a later predicate rejecting a match. This doesn't translate well to GlobalISel and the churn was wasteful. The API changes in this patch enable GlobalISel to accomplish the same thing without the waste. The API is now: InstructionSelector::OptionalComplexRendererFn selectArithImmed(MachineOperand &Root) const; where Root is the root of the match. The return value can be omitted to indicate that the predicate failed to match, or a function with the signature ComplexRendererFn can be returned. For example: return OptionalComplexRendererFn( [=](MachineInstrBuilder &MIB) { MIB.addImm(Immed).addImm(ShVal); }); adds two immediate operands to the rendered instruction. Immed and ShVal are captured from the predicate function. As an added bonus, this also reduces the amount of information we need to provide to GIComplexOperandMatcher. Depends on D31418 Reviewers: aditya_nandakumar, t.p.northover, qcolombet, rovka, ab, javed.absar Reviewed By: ab Subscribers: dberris, kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D31761 llvm-svn: 301079	2017-04-22 15:11:04 +00:00
Matthias Braun	d78597ec08	AArch64FrameLowering: Check if the ExtraCSSpill register is actually unused The code assumed that when saving an additional CSR register (ExtraCSSpill==true) we would have a free register throughout the function. This was not true if this CSR register is also used to pass values as in the swiftself case. rdar://31451816 llvm-svn: 301057	2017-04-21 22:42:08 +00:00
Hans Wennborg	9b9a5358dd	Re-commit r301040 "X86: Don't emit zero-byte functions on Windows" In addition to the original commit, tighten the condition for when to pad empty functions to COFF Windows. This avoids running into problems when targeting e.g. Win32 AMDGPU, which caused test failures when this was committed initially. llvm-svn: 301047	2017-04-21 21:48:41 +00:00
Hans Wennborg	04593000d8	Revert r301040 "X86: Don't emit zero-byte functions on Windows" This broke almost all bots. Reverting while fixing. llvm-svn: 301041	2017-04-21 21:10:37 +00:00
Hans Wennborg	cb3e810714	X86: Don't emit zero-byte functions on Windows Empty functions can lead to duplicate entries in the Guard CF Function Table of a binary due to multiple functions sharing the same RVA, causing the kernel to refuse to load that binary. We had a terrific bug due to this in Chromium. It turns out we were already doing this for Mach-O in certain situations. This patch expands the code for that in AsmPrinter::EmitFunctionBody() and renames TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it seems it was used for not just Mach-O anyway. Differential Revision: https://reviews.llvm.org/D32330 llvm-svn: 301040	2017-04-21 20:58:12 +00:00
Tim Northover	e31cf3f824	ARM: make sure we use all entries in a vector before forming a vpaddl. Otherwise there's some mismatch, and we'll either form an illegal type or an illegal node. Thanks to Eli Friedman for pointing out the problem with my original solution. llvm-svn: 301036	2017-04-21 20:35:52 +00:00
Konstantin Zhuravlyov	f628406bbd	AMDGPU/GFX9: Enable FastFMAF32 Differential Revision: https://reviews.llvm.org/D32363 llvm-svn: 301029	2017-04-21 19:57:53 +00:00
Konstantin Zhuravlyov	3d1cc88c68	AMDGPU: Temporarily disable packed inlinable literals (v2f16, v2i16) Differential Revision: https://reviews.llvm.org/D32361 llvm-svn: 301028	2017-04-21 19:45:22 +00:00
Konstantin Zhuravlyov	88938d4e67	AMDGPU: Fix S_PACK_HH_B32_B16 - We really ought to zero out lower 16 bits Differential Revision: https://reviews.llvm.org/D32356 llvm-svn: 301026	2017-04-21 19:35:05 +00:00
Yaxun Liu	15a96b1dc8	[AMDGPU] Handle SI_MASKED_UNREACHABLE in instruction emitter SI_MASKED_UNREACHABLE does not have machine instruction encoding. It needs special handling in AMDGPUAsmPrinter::EmitInstruction like some other pseudo instructions. This patch fixes compilation failure of RadeonRays. Differential Revision: https://reviews.llvm.org/D32364 llvm-svn: 301025	2017-04-21 19:32:02 +00:00
Matthias Braun	1a9062408f	Revert "X86RegisterInfo: eliminateFrameIndex: Avoid code duplication; NFC" It seems we have one situation in a sanitizer-enabled bootstrap build where the return instruction has a frame index operand that does not point to a fixed object and fails the assert added here. This reverts commit r300923. This reverts commit r300922. llvm-svn: 301024	2017-04-21 19:26:45 +00:00
Konstantin Zhuravlyov	c4b18e7099	AMDGPU: Do not lower fast unsafe div for safe, f32, with fp32 denormals Differential Revision: https://reviews.llvm.org/D32085 llvm-svn: 301023	2017-04-21 19:25:33 +00:00
Akira Hatanaka	22e839f4b2	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300932 and r300930, which was causing dag-combine to loop forever. The problem was that optimizeLogicalImm was returning true even when there was no change to the immediate node (which happened when the immediate was all zeros or ones), which caused dag-combine to push and pop the same node to the work list over and over again without making any progress. This commit fixes the bug by returning false early in optimizeLogicalImm if the immediate is all zeros or ones. Also, it changes the code to compare the immediate with 0 or Mask rather than calling countPopulation. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 301019	2017-04-21 18:53:12 +00:00
Joel Jones	a7c4a52188	[AArch64] Refactor instruction selection lowering for addresses. NFCI Factor out the common code used for generating addresses into common templated functions that call overloaded versions of a new function, getTargetNode. Tested with make check-llvm with targets AArch64. Differential Revision: https://reviews.llvm.org/D32169 llvm-svn: 301005	2017-04-21 17:31:03 +00:00
Tim Northover	1061ccca8c	ARM: don't try to create an i8 -> i32 vpaddl. DAG combine was mistakenly assuming that the step-up it was looking at was always a doubling, but it can sometimes be a larger extension in which case we'd crash. llvm-svn: 301002	2017-04-21 17:21:59 +00:00
Daniel Sanders	e7b0d66080	[globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule. Summary: The SelectionDAG importer now imports rules with Predicates attached via Requires, PredicateControl, etc. These predicates are implemented as bitsets to allow multiple predicates to be tested together. However, unlike the MC layer subtarget features, each target only pays for its own predicates (e.g. AArch64 doesn't have 192 feature bits just because X86 needs a lot). Both AArch64 and X86 derive at least one predicate from the MachineFunction or Function so they must re-initialize AvailableFeatures before each function. They also declare locals in <Target>InstructionSelector so that computeAvailableFeatures() can use the code from SelectionDAG without modification. Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab Reviewed By: rovka Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D31418 llvm-svn: 300993	2017-04-21 15:59:56 +00:00
Chad Rosier	428556c536	[AArch64][Falkor] Refine modeling of store-release exclusive instructions. llvm-svn: 300987	2017-04-21 14:58:32 +00:00
Joel Jones	97aaa23aec	[Mips] Document Mips Backend Relocation Principles This revision documents the combination of C++ and table-gen code that handles relocations and addresses. Thanks to Simon Dardis for the careful reviews. Differential Revision: https://reviews.llvm.org/D31628 llvm-svn: 300986	2017-04-21 14:49:27 +00:00
Chad Rosier	d631b9e500	[AArch64][Falkor] Refine resource needs of STRQ with register offset. llvm-svn: 300984	2017-04-21 14:33:13 +00:00
Daniel Sanders	419efdd55b	Revert r300964 + r300970 - [globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule. It's causing llvm-clang-x86_64-expensive-checks-win to fail to compile and I haven't worked out why. Reverting to make it green while I figure it out. llvm-svn: 300978	2017-04-21 14:09:20 +00:00
Chad Rosier	537defeeb5	[AArch64][Falkor] Refine loads/stores that require an extra LD pipe. llvm-svn: 300976	2017-04-21 13:55:41 +00:00
Chad Rosier	bbcc828833	[AArch64][Falkor] Fix number of microops for WriteSTIdx missed in r300892. llvm-svn: 300975	2017-04-21 13:37:01 +00:00
Chad Rosier	4f2e9e237f	[AArch64] Fix a few missed pre/post-inc in Falkor. llvm-svn: 300974	2017-04-21 13:36:57 +00:00
Diana Picus	64a33431eb	[ARM] GlobalISel: Add support for G_TRUNC Select them as copies. We only select if both the source and the destination are on the same register bank, so this shouldn't cause any trouble. llvm-svn: 300971	2017-04-21 13:16:50 +00:00
Diana Picus	f941ec0ecc	[ARM] GlobalISel: Make struct arguments fail elegantly The condition in isSupportedType didn't handle struct/array arguments properly. Fix the check and add a test to make sure we use the fallback path in this kind of situation. The test deals with some common cases where the call lowering should error out. There are still some issues here that need to be addressed (tail calls come to mind), but they can be addressed in other patches. llvm-svn: 300967	2017-04-21 11:53:01 +00:00
Daniel Sanders	279d03527e	[globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule. Summary: The SelectionDAG importer now imports rules with Predicates attached via Requires, PredicateControl, etc. These predicates are implemented as bitsets to allow multiple predicates to be tested together. However, unlike the MC layer subtarget features, each target only pays for its own predicates (e.g. AArch64 doesn't have 192 feature bits just because X86 needs a lot). Both AArch64 and X86 derive at least one predicate from the MachineFunction or Function so they must re-initialize AvailableFeatures before each function. They also declare locals in <Target>InstructionSelector so that computeAvailableFeatures() can use the code from SelectionDAG without modification. Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab Reviewed By: rovka Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D31418 llvm-svn: 300964	2017-04-21 10:27:20 +00:00
Clement Courbet	41b4333066	typo llvm-svn: 300963	2017-04-21 09:21:05 +00:00
Clement Courbet	d5f6182bec	use repmovsb when optimizing for minsize llvm-svn: 300960	2017-04-21 09:20:55 +00:00
Clement Courbet	203fc17797	Rename FastString flag. llvm-svn: 300959	2017-04-21 09:20:50 +00:00
Clement Courbet	1ce3b82dea	X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copies when the subtarget has fast strings. This has two advantages: - Speed is improved. For example, on Haswell throughput improvements increase linearly with size from 256 to 512 bytes, after which they plateau: (e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes). - Code is much smaller (no need to handle boundaries). llvm-svn: 300957	2017-04-21 09:20:39 +00:00
Clement Courbet	8177fee513	Delete dead code llvm-svn: 300952	2017-04-21 07:40:59 +00:00
Artyom Skrobov	8d9643009f	[Thumb1] The recently added tADCS and tSBCS pseudo-instructions were missing `Uses = [CPSR]` Summary: Thanks to Oliver Stannard for helping catch this. Reviewers: olista01, efriedma Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D31815 llvm-svn: 300951	2017-04-21 07:35:21 +00:00
Akira Hatanaka	78ccba6a20	Revert r300932 and r300930. It seems that r300930 was creating an infinite loop in dag-combine when compiling the following file: MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c llvm-svn: 300940	2017-04-21 01:31:50 +00:00
Akira Hatanaka	e52caddae8	[AArch64] Use suffix ULL to shift a 64-bit value. llvm-svn: 300932	2017-04-21 00:35:27 +00:00
Akira Hatanaka	19077aaee0	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300913, which broke bots because I didn't fix a call to ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of TargetLoweringOpt and TargetLowering. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300930	2017-04-21 00:05:16 +00:00
Matthias Braun	9610a26251	X86RegisterInfo: eliminateFrameIndex: Avoid code duplication; NFC X86RegisterInfo::eliminateFrameIndex() and X86FrameLowering::getFrameIndexReference() both had logic to compute the base register. This consolidates the code. Also use MachineInstr::isReturn instead of manually enumerating tail call instructions (return instructions were not included in the previous list because they never reference frame indexes). Differential Revision: https://reviews.llvm.org/D32206 llvm-svn: 300923	2017-04-20 23:34:50 +00:00
Matthias Braun	63e3e8ce72	X86RegisterInfo: eliminateFrameIndex: Force SP for AfterFPPop; NFC AfterFPPop is used for tailcall/tailjump instructions. We shouldn't ever have frame-pointer/base-pointer relative addressing for those. After all the frame/base pointer should already be restored to their previous values at the return. Make this fact explicit in preparation for an upcoming refactoring. Differential Revision: https://reviews.llvm.org/D32205 llvm-svn: 300922	2017-04-20 23:34:46 +00:00
Akira Hatanaka	7b06cebe73	Revert "[AArch64] Improve code generation for logical instructions taking" This reverts r300913. This broke bots. llvm-svn: 300916	2017-04-20 23:03:30 +00:00
Akira Hatanaka	e327f09832	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300913	2017-04-20 22:47:56 +00:00
Tim Northover	100b7f6eae	AArch64: lower "fence singlethread" to a pure compiler barrier. Single-threaded fences aren't required to provide any synchronization with other processing elements so there's no need for a DMB. They should still be a barrier for compiler optimizations though. llvm-svn: 300905	2017-04-20 21:57:45 +00:00
Tim Northover	46e58354da	ARM: lower "fence singlethread" to a pure compiler barrier. Single-threaded fences aren't required to provide any synchronization with other processing elements so there's no need for a DMB. They should still be a barrier for compiler optimizations though. llvm-svn: 300904	2017-04-20 21:56:52 +00:00
Chad Rosier	4279c58ec4	[AArch64] Whitespace/ordering fixes for Falkor machine description. NFC. llvm-svn: 300893	2017-04-20 21:11:17 +00:00
Chad Rosier	a56bdbe62d	[AArch64] Refine Falkor machine description for pre/post-inc and stores. llvm-svn: 300892	2017-04-20 21:11:09 +00:00
Tim Northover	8b1240b0f0	ARM: handle post-indexed NEON ops where the offset isn't the access width. Before, we assumed that any ConstantInt offset was precisely the access width, so we could use the "[rN]!" form. ISelLowering only ever created that kind, but further simplification during combining could lead to unexpected constants and incorrect codegen. Should fix PR32658. llvm-svn: 300878	2017-04-20 19:54:02 +00:00
Chad Rosier	9f25dd56a8	[AArch64] Improve scheduling of logical operations on Falkor. llvm-svn: 300871	2017-04-20 18:50:21 +00:00
Weiming Zhao	962c5a3aec	[Thumb-1] Fix corner cases for compressed jump tables Summary: When synthesized TBB/TBH is expanded, we need to avoid the case of: BaseReg is redefined after the load of branching target. E.g., before: %R2 = tLEApcrelJT <jt#1>; %R1 = tLDRr %R1, %R2; %R2 = tLDRspi %SP, 12; tBR_JTr %R1 ==> after: %R2 = tLEApcrelJT <jt#1>; %R2 = tLDRspi %SP, 12; tTBB_JT %R2, %R1 Reviewers: jmolloy Reviewed By: jmolloy Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D32250 llvm-svn: 300870	2017-04-20 18:37:14 +00:00
Benjamin Kramer	58dadd59d9	Fix use-after-frees on memory allocated in a Recycler. This will become asan errors once the patch lands that poisons the memory after free. The x86 change is a hack, but I don't see how to solve this properly at the moment. llvm-svn: 300867	2017-04-20 18:29:14 +00:00
Sam Clegg	90d99413ac	[WebAssembly] Add known failures for wasm object file backend Subscribers: jfb, dschuff Differential Revision: https://reviews.llvm.org/D32300 llvm-svn: 300859	2017-04-20 17:18:15 +00:00
Craig Topper	bcfd2d1789	[APInt] Rename getSignBit to getSignMask getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask. Differential Revision: https://reviews.llvm.org/D32108 llvm-svn: 300856	2017-04-20 16:56:25 +00:00
Petar Jovanovic	2b6fe3ffa6	[mips][msa] Mask vectors holding shift amounts Masked vectors which hold shift amounts when creating the following nodes: ISD::SHL, ISD::SRL or ISD::SRA. Instructions that use said nodes, which have had their arguments altered are sll, srl, sra, bneg, bclr and bset. For said instructions, the shift amount or the bit position that is specified in the corresponding vector elements will be interpreted as the shift amount/bit position modulo the size of the element in bits. The problem lies in compiling with -O2 enabled, where the instructions for formats .w and .d are not generated, but are instead optimized away. In this case, having shift amounts that are either negative or greater than the element bit size results in generation of incorrect results when constant folding. We remedy this by masking the operands for the nodes mentioned above before actually creating them, so that the final result is correct before placed into the constant pool. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D31331 llvm-svn: 300839	2017-04-20 13:26:46 +00:00
John Brawn	66719f63d0	[ARM] Fix handling of mapping symbols when changing sections ChangeSection incorrectly registers LastEMSInfo as belonging to the previous section, not the current section. This happens to work when changing sections using .section, as the previous section is set to the current section before the call to ChangeSection, but not when using .popsection. Differential Revision: https://reviews.llvm.org/D32225 llvm-svn: 300831	2017-04-20 10:18:13 +00:00
John Brawn	5ca5daa6b9	[AArch64] Fix handling of zero immediate in fmov instructions Currently fmov #0 with a vector destination is handled incorrectly and results in fmov #-1.9375 being emitted but should instead give an error. This is due to the way we cope with fmov #0 with a scalar destination being an alias of fmov zr, so fix this by actually doing it through an alias. Differential Revision: https://reviews.llvm.org/D31949 llvm-svn: 300830	2017-04-20 10:13:54 +00:00
John Brawn	dcf037a6f0	[AArch64] Fix handling of integer fp immediates When an integer is used as an fp immediate we're failing to check the return value of getFP64Imm, so invalid values are silently permitted. Fix this by merging together the integer and real handling. llvm-svn: 300828	2017-04-20 10:10:10 +00:00
Diana Picus	7c6dee9f16	[ARM] Rename HW div feature to HW div Thumb. NFCI. The hardware div feature refers only to Thumb, but because of its name it is tempting to use it to check for hardware division in general, which may cause problems in ARM mode. See https://reviews.llvm.org/D32005. This patch adds "Thumb" to its name, to make its scope clear. One notable place where I haven't made the change is in the feature flag (used with -mattr), which is still hwdiv. Changing it would also require changes in a lot of tests, including clang tests, and it doesn't seem like it's worth the effort. Differential Revision: https://reviews.llvm.org/D32160 llvm-svn: 300827	2017-04-20 09:38:25 +00:00
Kannan Narayanan	2fb5960121	Revert earlier change. ds permute operations affect lgkm counter. Differential Revision: https://reviews.llvm.org/D32254 llvm-svn: 300791	2017-04-19 23:39:19 +00:00
Matthias Braun	372ee59766	X86FrameLowering: Fix getFrameIndexReference() for 'fixed' objects Debug information is calculated with getFrameIndexReference() which was missing some logic for the fixed object cases (= parameters on the stack). rdar://24557797 Differential Revision: https://reviews.llvm.org/D32204 llvm-svn: 300781	2017-04-19 23:10:43 +00:00
Matthias Braun	8aaa368d00	ARMFrameLowering: Reserve emergency spill slot for large arguments Re-commit after revert in r300668. Changed getMaxFPOffset() to a more conservative heuristic instead of trying to be clever and missing for some exotic calling conventions. We need to reserve an emergency spill slot in cases with large argument types that could overflow immediate offsets for FP relative address calculations. rdar://31317893 Differential Revision: https://reviews.llvm.org/D31643 llvm-svn: 300761	2017-04-19 21:11:44 +00:00
Matt Arsenault	4a48623e4f	AMDGPU: Custom lower illegal small select types Promote them to i32 vectors to avoid unpacking and re-packing the vectors. llvm-svn: 300754	2017-04-19 20:53:07 +00:00
Eli Friedman	70ad2751d5	[ARM] Remove redundant computeKnownBits helper. Move the BFI logic to computeKnownBitsForTargetNode, and delete the redundant CMOV logic. This is intended as a cleanup, but it's probably possible to construct a case where moving the BFI logic allows more combines. Differential Revision: https://reviews.llvm.org/D31795 llvm-svn: 300752	2017-04-19 20:50:57 +00:00
Aditya Nandakumar	75ad9ccbfa	[GISEL]: Move getConstantVReg to Utils NFCI llvm-svn: 300751	2017-04-19 20:48:50 +00:00
Eli Friedman	f281d490cc	[ARM] Use TableGen patterns to select vtbl. NFC. Differential Revision: https://reviews.llvm.org/D32103 llvm-svn: 300749	2017-04-19 20:39:39 +00:00
Dehao Chen	58601674d2	PR32710: Disable using PMADDWD for unsigned short. Summary: PMADDWD can only handle signed short. Reviewers: mkuper, wmi Reviewed By: mkuper Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D32236 llvm-svn: 300737	2017-04-19 19:50:34 +00:00
Matt Arsenault	021a218dd2	AMDGPU: Don't emit amd_kernel_code_t for callable functions This is inserted directly in the text section. The relocation for the function ends up resolving to the beginning of the amd_kernel_code_t header rather than the actual function entry point. Also skip some of the comments for initialization that only makes sense for kernels. llvm-svn: 300736	2017-04-19 19:38:10 +00:00
Tim Northover	ff168c68dc	ARM: TLS calling convention doesn't preserve r9 or r12 on Darwin. llvm-svn: 300726	2017-04-19 18:07:54 +00:00
Matt Arsenault	6cb7b8a42f	AMDGPU: Don't align callable functions to 256 llvm-svn: 300720	2017-04-19 17:42:39 +00:00
Matt Arsenault	4c1ecded63	AMDGPU: Change DivergenceAnalysis for function arguments Stop assuming all functions are kernels. llvm-svn: 300719	2017-04-19 17:42:34 +00:00
Krzysztof Parzyszek	333b2bf2ed	[Hexagon] Generate proper offset in opt-addr-mode Also, make a few changes to allow using the pass in .mir testcases. Among other things, change the abbreviation from opt-amode to amode-opt, because otherwise lit would expand the "opt" part to the full path to the opt binary. llvm-svn: 300707	2017-04-19 15:15:51 +00:00
Krzysztof Parzyszek	634f57e0bb	[Hexagon] Remove RDefMap, use Liveness:getNearestAliasedRef instead llvm-svn: 300706	2017-04-19 15:14:30 +00:00
Krzysztof Parzyszek	0de74f315d	[RDF] Switch NodeList to SmallVector from std::vector The list has a single element 75+% of the time, reservation of 4 elements is sufficient in 95% of cases. llvm-svn: 300705	2017-04-19 15:12:44 +00:00
Krzysztof Parzyszek	7c69a3b490	[RDF] Use faster version of findBlock llvm-svn: 300704	2017-04-19 15:11:23 +00:00
Krzysztof Parzyszek	6aa3a3f00b	[RDF] Cache register units for reg masks instead of recalculating them llvm-svn: 300702	2017-04-19 15:10:09 +00:00
Krzysztof Parzyszek	5bfaf56ee5	[Hexagon] Cache reached blocks in bit tracker instead of scanning list llvm-svn: 300701	2017-04-19 15:08:31 +00:00
Igor Breger	4fdf1e489c	[GlobalIsel][X86] support G_TRUNC selection. Summary: Add regbank-select and legalizer tests. Currently legalization of trunc i64 on 32-bit platforms is not supported. Reviewers: ab, zvi, rovka Reviewed By: zvi Subscribers: dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D32115 llvm-svn: 300678	2017-04-19 11:34:59 +00:00
Renato Golin	742aed8683	Revert "ARMFrameLowering: Reserve emergency spill slot for large arguments" This reverts commit r300639, as it broke self-hosting on ARM. PR32709. llvm-svn: 300668	2017-04-19 09:02:52 +00:00
Diana Picus	49472ff1cf	[ARM] GlobalISel: Add support for G_MUL Support G_MUL, very similar to G_ADD and G_SUB. The only difference is in the instruction selector, where we have to select either MUL or MULv5 depending on the target. llvm-svn: 300665	2017-04-19 07:29:46 +00:00
Kristof Beyls	0f36e68f62	[GlobalISel] Support vector-of-pointers in LLT This fixes PR32471. As comment 10 on that bug report highlights (https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a few different defendable design tradeoffs that could be made, including not representing pointers at all in LLT. I decided to go for representing vector-of-pointer as a concept in LLT, while keeping the size of the LLT type 64 bits (this is an increase from 48 bits before). My rationale for keeping pointers explicit is that on some targets probably it's very handy to have the distinction between pointer and non-pointer (e.g. 68K has a different register bank for pointers IIRC). If we keep a scalar pointer, it probably is easiest to also have a vector-of-pointers to keep LLT relatively conceptually clean and orthogonal, while we don't have a very strong reason to break that orthogonality. Once we gain more experience on the use of LLT, we can of course reconsider this direction. Rejecting vector-of-pointer types in the IRTranslator is also an option to avoid the crash reported in PR32471, but that is only a very short-term solution; also needs quite a bit of code tweaks in places, and is probably fragile. Therefore I didn't consider this the best option. llvm-svn: 300664	2017-04-19 07:23:57 +00:00
Serge Pavlov	5943a96d81	ARM: Use methods to access data stored with frame instructions In r300196 several methods were added to TargetInstrInfo to access data stored with call frame setup/destroy instructions. This change replaces calls to getOperand with calls to such special methods in ARM target. Differential Revision: https://reviews.llvm.org/D32127 llvm-svn: 300655	2017-04-19 03:12:05 +00:00
Leslie Zhai	b86e9a1c14	[AVR] Migrate to new MCAsmInfo CodePointerSize Reviewers: dylanmckay, rengolin, kzhuravl, jroelofs Reviewed By: kzhuravl, jroelofs Subscribers: kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D32154 llvm-svn: 300641	2017-04-19 01:20:43 +00:00
Matthias Braun	661d3d4b00	ARMFrameLowering: Reserve emergency spill slot for large arguments We need to reserve an emergency spill slot in cases with large argument types that could overflow immediate offsets for FP relative address calculations. rdar://31317893 Differential Revision: https://reviews.llvm.org/D31643 llvm-svn: 300639	2017-04-19 01:16:07 +00:00
Dylan McKay	eb24b850c5	[AVR] Fix the build 'PointerSize' was renamed to 'CodePointerSize'. llvm-svn: 300629	2017-04-18 23:53:10 +00:00
Sanjoy Das	f09c1e346e	Add a getPointerOperandType() helper to LoadInst and StoreInst; NFC I will use this in a later change. llvm-svn: 300613	2017-04-18 22:00:54 +00:00
Matt Arsenault	3138075dd4	DAG: Make mayBeEmittedAsTailCall parameter const llvm-svn: 300603	2017-04-18 21:16:46 +00:00
Matt Arsenault	aa31dce3c5	Fix typo llvm-svn: 300597	2017-04-18 20:59:46 +00:00
Matt Arsenault	161e2b4223	AMDGPU: Make MFI fields private llvm-svn: 300596	2017-04-18 20:59:40 +00:00
Simon Pilgrim	e8ad1da4e2	[X86] Use for-range loop. NFCI. llvm-svn: 300567	2017-04-18 17:18:54 +00:00
Craig Topper	fc947bcfba	[APInt] Use lshrInPlace to replace lshr where possible This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result. This adds an lshrInPlace(const APInt &) version as well. Differential Revision: https://reviews.llvm.org/D32155 llvm-svn: 300566	2017-04-18 17:14:21 +00:00
Oliver Stannard	7ad2e8aae1	[ARM] Add hardware build attributes in assembler In the assembler, we should emit build attributes based on the target selected with command-line options. This matches the GNU assembler's behaviour. We only do this for build attributes which describe the hardware that is expected to be available, not the ones that describe ABI compatibility. This is done by moving some of the attribute emission code to ARMTargetStreamer, so that it can be shared between the assembly and code-generation code paths. Since the assembler only creates a MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to check raw features, and not use the convenience functions in ARMSubtarget. If different attributes are later specified using the .eabi_attribute directive, then they will take precedence, as happens when the same .eabi_attribute is specified twice. This must be enabled by an option, because we don't want to do this when parsing inline assembly. The attributes would match the ones emitted at the start of the file, so wouldn't actually change the emitted object file, but the extra directives would be added to every inline assembly block when emitting assembly, which we'd like to avoid. The majority of the changes in the build-attributes.ll test are just re-ordering the directives, because the hardware attributes are now emitted before the ABI ones. However, I did fix one bug which I spotted: Tag_CPU_arch_profile was not being emitted for v6M. Differential revision: https://reviews.llvm.org/D31812 llvm-svn: 300547	2017-04-18 12:52:35 +00:00
Diana Picus	a3a0cccb2c	[ARM] GlobalISel: Add support for G_SUB Support G_SUB throughout the GlobalISel pipeline. It is exactly the same as G_ADD, nothing fancy. llvm-svn: 300546	2017-04-18 12:35:28 +00:00
Kristof Beyls	a4e79cca77	Revert "[GlobalISel] Support vector-of-pointers in LLT" This reverts r300535 and r300537. The newly added tests in test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll produce slightly different code between LLVM versions being built with different compilers. E.g., dependent on the compiler LLVM is built with, either one of the following can be produced: remark: <unknown>:0:0: unable to legalize instruction: %vreg0<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg2; (in function: vector_of_pointers_extractelement) remark: <unknown>:0:0: unable to legalize instruction: %vreg2<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg0; (in function: vector_of_pointers_extractelement) Non-determinism like this is clearly a bad thing, so reverting this until I can find and fix the root cause of the non-determinism. llvm-svn: 300538	2017-04-18 09:26:36 +00:00
Diana Picus	e2626bb7c2	[ARM] Check for correct HW div when lowering divmod For subtargets that use the custom lowering for divmod, e.g. gnueabi, we used to check if the subtarget has hardware divide and then lower to a div-mul-sub sequence if true, or to a libcall if false. However, judging by the usage of hasDivide vs hasDivideInARMMode, it seems that hasDivide only refers to Thumb. For instance, in the ARMTargetLowering constructor, the code that specifies whether to use libcalls for (S\|U)DIV looks like this: bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide() : Subtarget->hasDivideInARMMode(); In the case of divmod for arm-gnueabi, using only hasDivide() to determine what to do means that instead of lowering to __aeabi_idivmod to get the remainder, we lower to div-mul-sub and then further lower the div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but not in Thumb, we generate a libcall instead of using it (this is not an issue in practice since AFAICT none of the cores that we support have hardware divide in ARM but not Thumb). This patch fixes the code dealing with custom lowering to take into account the mode (Thumb or ARM) when deciding whether or not hardware division is available. Differential Revision: https://reviews.llvm.org/D32005 llvm-svn: 300536	2017-04-18 08:32:27 +00:00
Kristof Beyls	fb73eb0324	[GlobalISel] Support vector-of-pointers in LLT This fixes PR32471. As comment 10 on that bug report highlights (https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a few different defendable design tradeoffs that could be made, including not representing pointers at all in LLT. I decided to go for representing vector-of-pointer as a concept in LLT, while keeping the size of the LLT type 64 bits (this is an increase from 48 bits before). My rationale for keeping pointers explicit is that on some targets probably it's very handy to have the distinction between pointer and non-pointer (e.g. 68K has a different register bank for pointers IIRC). If we keep a scalar pointer, it probably is easiest to also have a vector-of-pointers to keep LLT relatively conceptually clean and orthogonal, while we don't have a very strong reason to break that orthogonality. Once we gain more experience on the use of LLT, we can of course reconsider this direction. Rejecting vector-of-pointer types in the IRTranslator is also an option to avoid the crash reported in PR32471, but that is only a very short-term solution; also needs quite a bit of code tweaks in places, and is probably fragile. Therefore I didn't consider this the best option. llvm-svn: 300535	2017-04-18 08:12:45 +00:00
Leslie Zhai	d6fe0db8eb	test commit llvm-svn: 300532	2017-04-18 07:28:54 +00:00
Davide Italiano	3e9986f368	[Target] Use hasOneUse() instead of getNumUses(). The latter does a linear scan over a linked list, therefore is much more expensive. llvm-svn: 300518	2017-04-18 00:29:54 +00:00
Jacob Gravelle	0bb7541233	[WebAssembly] Fix WebAssemblyOptimizeReturned after r300367 Summary: Refactoring changed paramHasAttr(1 + i) to paramHasAttr(0), fix that to paramHasAttr(i). Add more tests to WebAssemblyOptimizeReturned that catch that regression. Reviewers: dschuff Subscribers: jfb, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D32136 llvm-svn: 300502	2017-04-17 21:40:28 +00:00
Derek Schuff	f7a4f3dd95	[WebAssembly] Encode block signatures as SLEB instead of ULEB Use SLEB (varint) for block_type immediates in accordance with the spec. Patch by Yury Delendik llvm-svn: 300490	2017-04-17 20:28:28 +00:00
Matt Arsenault	a3566f2149	AMDGPU: Use MachineRegisterInfo to find max used register Avoid looping through program to determine register counts. This avoids needing to look at regmask operands. Also fixes some counting errors with flat_scr when there are no stack objects. llvm-svn: 300482	2017-04-17 19:48:30 +00:00
Matt Arsenault	869fec278c	AMDGPU: Change stack alignment While the incoming stack for a kernel is 256-byte aligned, this refers to the base address of the entire wave. This isn't useful information for most of codegen. Fixes unnecessarily aligning stack objects in callees. llvm-svn: 300481	2017-04-17 19:48:24 +00:00
Benjamin Kramer	54c781a0b5	Unbreak build of the wasm backend after r300463. llvm-svn: 300479	2017-04-17 19:08:41 +00:00
Tim Northover	46e36f0953	AArch64: put nonlazybind special handling behind a flag for now. It's basically a terrible idea anyway but objc_msgSend gets emitted like that. We can decide on a better way to deal with it in the unlikely event that anyone actually uses it. llvm-svn: 300474	2017-04-17 18:18:47 +00:00
Konstantin Zhuravlyov	12096848fd	AMDGPU: Set CodePointerSize to 8 for amdgcn llvm-svn: 300470	2017-04-17 18:02:09 +00:00
Konstantin Zhuravlyov	dc77b2e960	Distinguish between code pointer size and DataLayout::getPointerSize() in DWARF info generation llvm-svn: 300463	2017-04-17 17:41:25 +00:00
Tim Northover	879a0b2e1b	AArch64: support nonlazybind It's almost certainly not a good idea to actually use it in most cases (there's a pretty large code size overhead on AArch64), but we can't do those experiments until it's supported. llvm-svn: 300462	2017-04-17 17:27:56 +00:00
Benjamin Kramer	f5f593b674	[X86] Remove special handling for 16 bit for A asm constraints. Our 16 bit support is assembler-only + the terrible hack that is .code16gcc. Simply using 32 bit registers does the right thing for the latter. Fixes PR32681. llvm-svn: 300429	2017-04-16 20:13:08 +00:00
Dimitry Andric	909b3376ba	Use correct registers for "A" inline asm constraint Summary: In PR32594, inline assembly using the 'A' constraint on x86_64 causes llvm to crash with a "Cannot select" stack trace. This is because `X86TargetLowering::getRegForInlineAsmConstraint` hardcodes that 'A' means the EAX and EDX registers. However, on x86_64 it means the RAX and RDX registers, and on 16-bit x86 (ia16?) it means the old AX and DX registers. Add new register classes in `X86RegisterInfo.td` to support these cases, and amend the logic in `getRegForInlineAsmConstraint` to cope with different subtargets. Also add a test case, derived from PR32594. Reviewers: craig.topper, qcolombet, RKSimon, ab Reviewed By: ab Subscribers: ab, emaste, royger, llvm-commits Differential Revision: https://reviews.llvm.org/D31902 llvm-svn: 300404	2017-04-15 22:15:01 +00:00
Krzysztof Parzyszek	9edaea21af	[RDF] No longer ignore implicit defs or uses on any instructions This used to be a Hexagon-specific treatment, but is no longer needed since it's switched to subregister liveness tracking. llvm-svn: 300369	2017-04-14 21:19:17 +00:00
Krzysztof Parzyszek	fabb68fc06	[RDF] Correctly enumerate reg units for reg masks llvm-svn: 300368	2017-04-14 21:17:36 +00:00
Reid Kleckner	fb502d2f5e	[IR] Make paramHasAttr to use arg indices instead of attr indices This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern. Previously we were testing return value attributes with index 0, so I introduced hasReturnAttr() for that use case. llvm-svn: 300367	2017-04-14 20:19:02 +00:00
Stanislav Mekhanoshin	eff0bc7839	[AMDGPU] set read_only access qualifier for pointers If a kernel's pointer argument is known to be readonly set access qualifier accordingly. This allows RT not to flush caches before dispatches. Differential Revision: https://reviews.llvm.org/D32091 llvm-svn: 300362	2017-04-14 19:11:40 +00:00
Krzysztof Parzyszek	74b1f254d4	[RDF] Switch RegisterAggr to a bit vector of register units This avoids many complications related to the complex register aliasing schemes. llvm-svn: 300345	2017-04-14 17:25:13 +00:00
Krzysztof Parzyszek	4fe9d6c640	[RDF] Refine propagation of reached uses in liveness computation llvm-svn: 300337	2017-04-14 16:33:54 +00:00
Krzysztof Parzyszek	f928e24d2a	[Hexagon] Fix a latent problem with interpreting live-in lane masks A non-zero lane mask on a register with no subregister means that the whole register is live-in. It is equivalent to a full mask. llvm-svn: 300335	2017-04-14 16:21:55 +00:00
Krzysztof Parzyszek	643aaea59e	[Hexagon] Make a couple of passes compliant with -opt-bisect-limit llvm-svn: 300329	2017-04-14 15:26:34 +00:00
Simon Pilgrim	5a22eaa2bf	[X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (LLVM) MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics. Clang companion patch: D31766. Differential Revision: https://reviews.llvm.org/D31767 llvm-svn: 300325	2017-04-14 15:05:35 +00:00
Dmitry Preobrazhensky	e6ef099dcd	[AMDGPU][MC] Corrected ds_write_src2_* to require one offset instead of two. Fixed bug 32551: https://bugs.llvm.org//show_bug.cgi?id=32551 Reviewers: vpykhtin Differential Revision: https://reviews.llvm.org/D31809 llvm-svn: 300319	2017-04-14 12:28:07 +00:00
Dmitry Preobrazhensky	5714860ee4	[AMDGPU][MC] Enabled constants for src operands of s_cbranch_g_fork Fixed bug 32619: https://bugs.llvm.org//show_bug.cgi?id=32619 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D31973 llvm-svn: 300318	2017-04-14 11:52:26 +00:00
Andrew V. Tischenko	75745d0c3e	This patch closes PR#32216: Better testing of schedule model instruction latencies/throughputs. The details are here: https://reviews.llvm.org/D30941 llvm-svn: 300311	2017-04-14 07:44:23 +00:00
Stanislav Mekhanoshin	86b0a5465b	[AMDGPU] added SIInstrInfo::getAddNoCarry() helper Addressed rest of post submit comments from D31993. Differential Revision: https://reviews.llvm.org/D32057 llvm-svn: 300288	2017-04-14 00:33:44 +00:00
Adam Nemet	c5779460f4	[AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16 This further improves Ahmed's change in rL299482. See the new comment for the rationale. The patch recovers most of the regression for bzip2 after D31965. We're down to +2.68% from +6.97%. Differential Revision: https://reviews.llvm.org/D32028 llvm-svn: 300276	2017-04-13 23:32:47 +00:00
Konstantin Zhuravlyov	d24aeb20fc	AMDGPU/GFX9: Do not use v_pack_b32_f16 when packing Differential Revision: https://reviews.llvm.org/D31819 llvm-svn: 300275	2017-04-13 23:17:00 +00:00
Reid Kleckner	f021fab2af	[IR] Make getParamAttributes take argument numbers, not ArgNo+1 Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1, Kind) everywhere. The fact that the AttributeList index for an argument is ArgNo+1 should be a hidden implementation detail. NFC llvm-svn: 300272	2017-04-13 23:12:13 +00:00
Alexei Starovoitov	56db145164	[bpf] Fix memory offset check for loads and stores If the offset cannot fit into the instruction, an addition to the pointer is emitted before the actual access. However, BPF offsets are 16-bit but LLVM considers them, for the purposes of this check, to be 32 bits long. This causes the following program: int bpf_prog1(void *ign) { volatile unsigned long t = 0x8983984739ull; return *(unsigned long *)((0xffffffff8fff0002ull) + t); } To generate the following (wrong) code: 0: 18 01 00 00 39 47 98 83 00 00 00 00 89 00 00 00 r1 = 590618314553ll 2: 7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1 3: 79 a1 f8 ff 00 00 00 00 r1 = *(u64 *)(r10 - 8) 4: 79 10 02 00 00 00 00 00 r0 = *(u64 *)(r1 + 2) 5: 95 00 00 00 00 00 00 00 exit Fix it by changing the offset check to 16-bit. Patch by Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Differential Revision: https://reviews.llvm.org/D32055 llvm-svn: 300269	2017-04-13 22:24:13 +00:00
Reid Kleckner	dbc9ba3061	Fix -Wunused-value warning llvm-svn: 300254	2017-04-13 20:32:58 +00:00
Stanislav Mekhanoshin	d026f79bd3	[AMDGPU] Combine DS operations with offsets bigger than byte In many cases ds operations can be combined even if offsets do not fit into 8 bit encoding. What it takes is to adjust base address. Differential Revision: https://reviews.llvm.org/D31993 llvm-svn: 300227	2017-04-13 17:53:07 +00:00
Krzysztof Parzyszek	5619952ee1	[Hexagon] Implement HexagonTargetLowering::CanLowerReturn Patch by Michael Wu. Differential Revision: https://reviews.llvm.org/D32000 llvm-svn: 300199	2017-04-13 15:05:51 +00:00
Krzysztof Parzyszek	3e2046cd1b	[Hexagon] Fix "LowerFormalArguments emitted a value with the wrong type!" assertion Patch by Michael Wu. Differential Revision: https://reviews.llvm.org/D31999 llvm-svn: 300198	2017-04-13 15:00:18 +00:00
Serge Pavlov	49acf9c8eb	Use methods to access data stored with frame instructions Instructions CALLSEQ_START..CALLSEQ_END and their target dependent counterparts keep data like frame size, stack adjustment etc. These data are accessed by getOperand using hard coded indices, which is error prone. This change implements the access by special methods, which improve readability and allow changing data representation without massive changes of index values. Differential Revision: https://reviews.llvm.org/D31953 llvm-svn: 300196	2017-04-13 14:10:52 +00:00
Ayman Musa	62d1c71676	[X86] Added missing mayLoad/mayStore attributes to some X86 instructions. This missing information was encountered during the effort to automatically generate the X86 memory folding tables. This is preparatory work for a future patch that automates these tables. Differential Revision: https://reviews.llvm.org/D31714 llvm-svn: 300190	2017-04-13 10:03:45 +00:00
Ayman Musa	c494718050	[X86] Change instructions names to keep consistency with the naming convention. NFC Differential Revision: https://reviews.llvm.org/D31743 llvm-svn: 300184	2017-04-13 09:12:32 +00:00
Reid Kleckner	7f72033e1c	[IR] Take func, ret, and arg attrs separately in AttributeList::get This seems like a much more natural API, based on Derek Schuff's comments on r300015. It further hides the implementation detail of AttributeList that function attributes come last and appear at index ~0U, which is easy for the user to screw up. git diff says it saves code as well: 97 insertions(+), 137 deletions(-) This also makes it easier to change the implementation, which I want to do next. llvm-svn: 300153	2017-04-13 00:58:09 +00:00
Wei Ding	74da350b85	AMDGPU : Fix common dominator of two incoming blocks terminates with uniform branch issue. Differential Revision: http://reviews.llvm.org/D31350 llvm-svn: 300142	2017-04-12 23:51:47 +00:00
Matt Arsenault	0d0d6c2f25	AMDGPU: Fix invalid copies when copying i1 to phys reg Insert a VReg_1 virtual register so the i1 workaround pass can handle it. llvm-svn: 300113	2017-04-12 21:58:23 +00:00
Stanislav Mekhanoshin	c90347d760	[AMDGPU] Generate range metadata for workitem id If workgroup size is known inform llvm about range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102	2017-04-12 20:48:56 +00:00
Dmitry Preobrazhensky	14104e0d0f	[AMDGPU][MC] Added support for several VI-specific opcodes (s_wakeup, etc) Added support for VI: - s_endpgm_saved - s_wakeup - s_rfe_restore_b64 - v_perm_b32 Enabled for VI: - v_mov_fed_b32 - v_mov_fed_b32_e64 See bug 32593: https://bugs.llvm.org//show_bug.cgi?id=32593 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D31931 llvm-svn: 300076	2017-04-12 17:10:07 +00:00
Dmitry Preobrazhensky	5ac9fd64a3	[AMDGPU][MC] Corrected parsing of v_cmp_class* and v_cmpx_class* Fixed bug 32565: https://bugs.llvm.org//show_bug.cgi?id=32565 Reviewers: vpykhtin Differential Revision: https://reviews.llvm.org/D31820 llvm-svn: 300073	2017-04-12 16:31:18 +00:00
Derek Schuff	0db0ca3837	[WebAssembly] Update use of Attributes after r299875 This fixes the failing WebAssemblyLowerEmscriptenEHSjLj tests llvm-svn: 300072	2017-04-12 16:03:00 +00:00
Dmitry Preobrazhensky	3bff0c8c59	[AMDGPU][MC] Corrected encoding of V_MQSAD_U32_U8 for CI Corrected encoding of V_MQSAD_U32_U8 for CI See bug 32552: https://bugs.llvm.org//show_bug.cgi?id=32552 Reviewers: vpykhtin Differential Revision: https://reviews.llvm.org/D31810 llvm-svn: 300070	2017-04-12 15:36:09 +00:00
Easwaran Raman	02a0e91831	Fix the bootstrap failure caused by r299986. llvm-svn: 300069	2017-04-12 15:26:15 +00:00
Dmitry Preobrazhensky	7184c44d66	[AMDGPU][MC] Corrected ds_wrxchg2* to support two offsets Fixed bug 28227: https://bugs.llvm.org//show_bug.cgi?id=28227 Reviewers: vpykhtin Differential Revision: https://reviews.llvm.org/D31808 llvm-svn: 300066	2017-04-12 14:29:45 +00:00
Igor Breger	3b97ea39e7	[GlobalIsel][X86] support G_CONSTANT selection. Summary: [GlobalISel][X86] support G_CONSTANT selection. Add regbank select tests. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: llvm-commits, dberris, rovka, kristof.beyls Differential Revision: https://reviews.llvm.org/D31974 llvm-svn: 300057	2017-04-12 12:54:54 +00:00
Jonas Paulsson	da74ed42da	[LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore() Since SystemZ supports vector element load/store instructions, there is no need for extracts/inserts if a vector load/store gets scalarized. This patch lets Target specify that it supports such instructions by means of a new TTI hook that defaults to false. The use for this is in the LoopVectorizer getScalarizationOverhead() method, which will with this patch produce a smaller sum for a vector load/store on SystemZ. New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll Review: Adam Nemet https://reviews.llvm.org/D30680 llvm-svn: 300056	2017-04-12 12:41:37 +00:00
Dmitry Preobrazhensky	12194e9bec	[AMDGPU][MC] Corrected src0 size for s_cbranch_join Fix for bug 28159: https://bugs.llvm.org//show_bug.cgi?id=28159 Reviewers: vpykhtin, arsenm Differential Revision: https://reviews.llvm.org/D31595 llvm-svn: 300055	2017-04-12 12:40:19 +00:00
Jonas Paulsson	fccc7d66c3	[SystemZ] TargetTransformInfo cost functions implemented. getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052	2017-04-12 11:49:08 +00:00
Sam Kolton	aff8341da2	[AMDGPU] SDWA: make pass global Summary: Remove checks for basic blocks. Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31935 llvm-svn: 300040	2017-04-12 09:36:05 +00:00
Kannan Narayanan	acb089e12a	[AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing. Based on comments in https://reviews.llvm.org/D31161. llvm-svn: 300023	2017-04-12 03:25:12 +00:00
Derek Schuff	821637aa52	Revert "[WebAssembly] Update use of Attributes after r299875" This reverts commit 2a0eb61dcccb15058d5b2a572bb3da0cf47fd550, r300015 I raced with rnk on the commit. llvm-svn: 300016	2017-04-12 01:17:31 +00:00
Derek Schuff	857a7e5473	[WebAssembly] Update use of Attributes after r299875 This fixes the failing WebAssemblyLowerEmscriptenEHSjLj tests llvm-svn: 300015	2017-04-12 01:09:34 +00:00
Matt Arsenault	9ac40026dd	AMDGPU: Insert wait at start of callee functions llvm-svn: 300000	2017-04-11 22:29:31 +00:00
Matt Arsenault	efa9f4b210	AMDGPU: Refactor SIMachineFunctionInfo slightly Prepare for handling non-entry functions. llvm-svn: 299999	2017-04-11 22:29:28 +00:00
Matt Arsenault	e622dc3803	AMDGPU: Refactor argument lowering Split into smaller functions and prepare for handling non-entry functions. llvm-svn: 299998	2017-04-11 22:29:24 +00:00
Matt Arsenault	fe78ffba92	AMDGPU: Fix folding reg_sequence into copy to phys reg This was producing an illegal reg_sequence defining a physical register with virtual register inputs. llvm-svn: 299997	2017-04-11 22:29:19 +00:00
Matt Arsenault	978b1667d2	AMDGPU: Prune unnecessary include llvm-svn: 299996	2017-04-11 22:29:16 +00:00
Balaram Makam	c53c44cec4	[AArch64] Fix scheduling info for INS(vector, general) instruction. llvm-svn: 299994	2017-04-11 22:14:10 +00:00
Easwaran Raman	ddb9ae192a	[x86] Relax the check in areLoadsFromSameBasePtr Check if the scale operand is identical (doesn't have to be 1) and do not check the chain operand. Differential revision: https://reviews.llvm.org/D31833 llvm-svn: 299986	2017-04-11 21:05:02 +00:00
Evandro Menezes	203eef0ed5	[AArch64] Simplify MacroFusion This patch assumes that the dependents to be scanned for the ExitSU are its predecessors; otherwise, the successors of the instr are scanned. Furthermore, sometimes the ExitSU was being fused twice, since it may be fused once when scanning the successors from the beginning of the BB and then again when scanning the predecessors of ExitSU. Thus, when scanning the successors of an instr, skip the ExitSU. llvm-svn: 299974	2017-04-11 19:13:11 +00:00
Davide Italiano	8455f7d623	[X86] Create the correct ADC/SBB SDNode when lowering add. Differential Revision: https://reviews.llvm.org/D31911 llvm-svn: 299973	2017-04-11 19:11:20 +00:00
Craig Topper	957a94cc03	Fix spelling compliment->complement. Mostly referring to 2s complement. NFC llvm-svn: 299970	2017-04-11 18:47:58 +00:00
Yaxun Liu	e95df719e1	[AMDGPU] Add A5 to data layout for amdgiz environment Differential Revision: https://reviews.llvm.org/D31589 llvm-svn: 299964	2017-04-11 17:18:13 +00:00
Serge Guelton	59a2d7b909	Module::getOrInsertFunction is using C-style vararg instead of variadic templates. From a user perspective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's no type checking on the arguments. The variadic template is an obvious solution to both issues. Differential Revision: https://reviews.llvm.org/D31070 llvm-svn: 299949	2017-04-11 15:01:18 +00:00
Vassil Vassilev	e1f12fadc0	Remove unused functions. Remove static qualifier from functions in header files. NFC. llvm-svn: 299947	2017-04-11 14:55:32 +00:00
Jonathan Roelofs	5e39c44654	[AVR] Migrate to new MCAsmBackend applyFixup https://reviews.llvm.org/D31875 Patch by Leslie Zhai! llvm-svn: 299946	2017-04-11 14:51:49 +00:00
Sam Parker	83b64654fd	[ARM] Refactor Thumb2 sat instructions Refactor the USAT, SSAT, USAT16 and SSAT16 instruction descriptions for Thumb2. Differential Revision: https://reviews.llvm.org/D31933 llvm-svn: 299945	2017-04-11 14:42:08 +00:00
Diana Picus	1314a2889c	GlobalISel: Allow legalizing G_FADD to a libcall Use the same handling in the generic legalizer code as for the other libcalls (G_FREM, G_FPOW). Enable it on ARM for float and double so we can test it. llvm-svn: 299931	2017-04-11 10:52:34 +00:00
Diana Picus	b050c7fbe0	Revert "Turn some C-style vararg into variadic templates" This reverts commit r299925 because it broke the buildbots. See e.g. http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/6008 llvm-svn: 299928	2017-04-11 10:07:12 +00:00
Serge Guelton	5fd75fb72e	Turn some C-style vararg into variadic templates Module::getOrInsertFunction is using C-style vararg instead of variadic templates. From a user perspective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's no type checking on the arguments. The variadic template is an obvious solution to both issues. llvm-svn: 299925	2017-04-11 08:36:52 +00:00
Hal Finkel	cef9e52736	[PowerPC] multiply-with-overflow might use the CTR register Check the legality of ISD::[US]MULO to see whether Intrinsic::[us]mul_with_overflow will legalize into a function call (and, thus, will use the CTR register). Fixes PR32485. Patch by Tim Neumann! Differential Revision: https://reviews.llvm.org/D31790 llvm-svn: 299910	2017-04-11 02:03:17 +00:00
Matt Arsenault	3c1fc768ed	Allow DataLayout to specify addrspace for allocas. LLVM makes several assumptions about address space 0. However, alloca is presently constrained to always return this address space. There's no real way to avoid using alloca, so without this there is no way to opt out of these assumptions. The problematic assumptions include: - That the pointer size used for the stack is the same size as the code size pointer, which is also the maximum sized pointer. - That 0 is an invalid, non-dereferencable pointer value. These are problems for AMDGPU because alloca is used to implement the private address space, which uses a 32-bit index as the pointer value. Other pointers are 64-bit and behave more like LLVM's notion of generic address space. By changing the address space used for allocas, we can change our generic pointer type to be LLVM's generic pointer type which does have similar properties. llvm-svn: 299888	2017-04-10 22:27:50 +00:00
Eric Christopher	d78bd57b3f	Get the TOC save offset off of PPCFrameLowering rather than a separate copy of the same data. llvm-svn: 299887	2017-04-10 22:22:11 +00:00
Simon Atanasyan	c986eb50ef	[mips] Use Triple::isLittleEndian to check endianness. NFC llvm-svn: 299872	2017-04-10 19:42:44 +00:00
Matthew Simpson	1468d3e04e	[ARM/AArch64] Ensure valid vector element types for interleaved accesses This patch refactors and strengthens the type checks performed for interleaved accesses. The primary functional change is to ensure that the interleaved accesses have valid element types. The added test cases previously failed because the element type is f128. Differential Revision: https://reviews.llvm.org/D31817 llvm-svn: 299864	2017-04-10 18:34:37 +00:00
Matt Arsenault	678e111e11	AMDGPU: Fix crash when disassembling VOP3 mac The unused dummy src2_modifiers is missing, so it crashes when trying to print it. I tried to fully remove src2_modifiers, but there are some irritations in the places where it is converted to mad since it starts to require modifying use lists while iterating over them. llvm-svn: 299861	2017-04-10 17:58:06 +00:00
Simon Pilgrim	b6702eaec3	[X86][MMX] Add fast-isel support for MMX non-temporal writes Differential Revision: https://reviews.llvm.org/D31754 llvm-svn: 299852	2017-04-10 16:58:07 +00:00
Diana Picus	3ff82c8cb7	[ARM] GlobalISel: Support G_FPOW for float and double Legalize to a libcall. llvm-svn: 299841	2017-04-10 09:27:39 +00:00
Matt Arsenault	dd8fd9dcfd	AMDGPU: Actually write nops for writeNopData Before this was just writing 0s, which ends up looking like a v_cndmask_b32 v0, s0, v0, vcc. Write out an encoded s_nop instead. llvm-svn: 299816	2017-04-08 21:28:38 +00:00
Balaram Makam	b4419f9d30	[AArch64] Refine Falkor Machine Model - Part 3 This concludes the refinements to Falkor Machine Model. It includes SchedPredicates for immediate zero and LSL Fast. Forwarding logic is also modeled for vector multiply and accumulate only. llvm-svn: 299810	2017-04-08 03:30:15 +00:00
Eli Friedman	75631c97ba	[ARM] Prefer BIC over BFC in ARM mode. BIC is generally faster, and it can put the output in a different register from the input. We already do this in Thumb2 mode; not sure why the equivalent fix never got applied to ARM mode. Differential Revision: https://reviews.llvm.org/D31797 llvm-svn: 299803	2017-04-07 22:01:23 +00:00
Petr Hosek	c3a9e6db38	[AArch64] Allow global register asm("x18") or asm("w18") under -ffixed-x18 When using -ffixed-x18, the x18 (or w18) register can safely be used with the "global register variable" GCC extension, but the backend fails to recognize it. Patch by Roland McGrath. Differential Revision: https://reviews.llvm.org/D31793 llvm-svn: 299799	2017-04-07 20:41:58 +00:00
Simon Dardis	f7e4388e3b	Revert "[SelectionDAG] Enable target specific vector scalarization of calls and returns" This reverts commit r299766. This change appears to have broken the MIPS buildbots. Reverting while I investigate. Revert "[mips] Remove usage of debug only variable (NFC)" This reverts commit r299769. Follow up commit. llvm-svn: 299788	2017-04-07 17:25:05 +00:00
Stanislav Mekhanoshin	478b81982f	[AMDGPU] Unroll more to eliminate phis and conditions Increase threshold to unroll a loop which contains an "if" statement whose condition defined by a PHI belonging to the loop. This may help to eliminate if region and potentially even PHI itself, saving on both divergence and registers used for the PHI. Add a small bonus for each of such "if" statements. Differential Revision: https://reviews.llvm.org/D31693 llvm-svn: 299779	2017-04-07 16:26:28 +00:00
Dehao Chen	58fa724494	Use PMADDWD to expand reduction in a loop Summary: PMADDWD can help improve 8/16 bit integer multiply-add operation performance for cases like: for (int i = 0; i < count; i++) a += x[i] * y[i]; Reviewers: wmi, davidxl, hfinkel, RKSimon, zvi, mkuper Reviewed By: mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31679 llvm-svn: 299776	2017-04-07 15:41:52 +00:00
Igor Breger	2953788c36	[GlobalISel] implement narrowing for G_CONSTANT. Summary: [GlobalISel] implement narrowing for G_CONSTANT. Reviewers: bogner, zvi, t.p.northover Reviewed By: t.p.northover Subscribers: llvm-commits, dberris, rovka, kristof.beyls Differential Revision: https://reviews.llvm.org/D31744 llvm-svn: 299772	2017-04-07 14:41:59 +00:00
Simon Dardis	9f6a5cd91d	[mips] Remove usage of debug only variable (NFC) Fix the lld-x86_64-darwin13 buildbot by removing the declaration of a debug only variable and instead moving the value into the debug statement. llvm-svn: 299769	2017-04-07 13:49:12 +00:00
Petar Jovanovic	bc54eb89ad	[mips][msa] Fix generation of bm(n)zi and bins[lr]i instructions We have two cases here, the first one being the following instruction selection from the builtin function: bm(n)zi builtin -> vselect node -> bins[lr]i machine instruction In case of bm(n)zi having an immediate which has either its high or low bits set, a bins[lr] instruction can be selected through the selectVSplatMask[LR] function. The function counts the number of bits set, and that value is being passed to the bins[lr]i instruction as its immediate, which in turn copies immediate modulo the size of the element in bits plus 1 as per specs, where we get the off-by-one-error. The other case is: bins[lr]i -> vselect node -> bsel.v In this case, a bsel.v instruction gets selected with a mask having one bit less set than required. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D30579 llvm-svn: 299768	2017-04-07 13:31:36 +00:00
Dmitry Preobrazhensky	e5147247b8	[AMDGPU][MC] Fix for Bug 28211 + LIT tests - corrected DS_GWS_* opcodes (see VI_Shader_Programming#16.pdf for detailed description) - address operand is not used - several opcodes have data operand - all opcodes have offset modifier - DS_AND_SRC2_B32: corrected typo in mnemo - DS_WRAP_RTN_F32 replaced with DS_WRAP_RTN_B32 - added CI/VI opcodes: - DS_CONDXCHG32_RTN_B64 - DS_GWS_SEMA_RELEASE_ALL - added VI opcodes: - DS_CONSUME - DS_APPEND - DS_ORDERED_COUNT Differential Revision: https://reviews.llvm.org/D31707 llvm-svn: 299767	2017-04-07 13:07:13 +00:00
Simon Dardis	6470ff0b24	[SelectionDAG] Enable target specific vector scalarization of calls and returns By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown, backends can request that LLVM scalarize vector types for calls and returns. The MIPS vector ABI requires that vector arguments and returns are passed in integer registers. With SelectionDAG's new hooks, the MIPS backend can now handle LLVM-IR with vector types in calls and returns. E.g. 'call @foo(<4 x i32> %4)'. Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for calls and returns if vector types were not legal. If vector types were legal, a single 128bit vector argument would be assigned to a single 32 bit / 64 bit integer register. By teaching the MIPS backend to inspect the original types, it can now implement the MIPS vector ABI which requires a particular method of scalarizing vectors. Previously, the MIPS backend relied on clang to scalarize types such as "call @foo(<4 x float> %a)" into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3, i32 inreg %4)". This patch enables the MIPS backend to take either form for vector types. Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur Differential Revision: https://reviews.llvm.org/D27845 llvm-svn: 299766	2017-04-07 13:03:52 +00:00
Jonas Paulsson	cad72efee6	[SystemZ] Check for presence of vector support in SystemZISelLowering A test case was found with llvm-stress that caused DAGCombiner to crash when compiling for an older subtarget without vector support. SystemZTargetLowering::combineTruncateExtract() should do nothing for older subtargets. This check was placed in canTreatAsByteVector(), which also helps in a few other places. Review: Ulrich Weigand llvm-svn: 299763	2017-04-07 12:35:11 +00:00
Jonas Paulsson	16100c637e	[SystemZ] Remove confusing comment in combineEXTRACT_VECTOR_ELT() It isn't just one-element vectors that can appear here. llvm-svn: 299762	2017-04-07 12:11:41 +00:00
Sam Kolton	6e79529db4	[AMDGPU] Move SiShrinkInstruction and SDWAPeephole to SSAOptimization passes Summary: The difference between PreRegAlloc() and MachineSSAOptimization() is that the former runs even at the -O0 optimization level. In my understanding, SiShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled. With this change, the order of passes will not change. Reviewers: arsenm, vpykhtin, rampitec Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31705 llvm-svn: 299757	2017-04-07 10:53:12 +00:00
Diana Picus	3c608448e1	[ARM] GlobalISel: Support frem for 64-bit values Legalize to a libcall. llvm-svn: 299756	2017-04-07 10:50:02 +00:00
Diana Picus	a5bab61a8d	[ARM] GlobalISel: Support frem for 32-bit values Legalize to a libcall. On this occasion, also start allowing soft float subtargets. For the moment G_FREM is the only legal floating point operation for them. llvm-svn: 299753	2017-04-07 09:41:39 +00:00
Derek Schuff	9bb494caf4	[WebAssembly] Fix -Wcovered-switch-default warning llvm-svn: 299736	2017-04-06 23:52:01 +00:00
Konstantin Zhuravlyov	4b3847e865	AMDGPU/GFX9: Fix shared and private aperture queries Differential Revision: https://reviews.llvm.org/D31786 llvm-svn: 299727	2017-04-06 23:02:33 +00:00
Eric Christopher	380611addc	Remove the default subtarget from the Power port. It's unnecessary and harmful if used. llvm-svn: 299726	2017-04-06 23:01:30 +00:00
Yi Kong	60b5a1cd17	Revert "Revert "[ARM] Add Kryo to available targets"" This reverts commit dc9458d5a747a02a9a8f198b84c2b92a6939a8dd. Added missing case for PreISelOperandLatencyAdjustment. llvm-svn: 299724	2017-04-06 22:47:47 +00:00
Michael Kuperstein	6129887d21	[X86] Revert r299387 due to AVX legalization infinite loop. llvm-svn: 299720	2017-04-06 22:33:25 +00:00
Matt Arsenault	21a438255d	AMDGPU: Diagnose illegal SGPR to VGPR copies This is possible in ways that are not compiler bugs, so stop asserting on them. This emits an extra error when emitting objects when it can't encode the new pseudo, but I'm not sure that matters. llvm-svn: 299712	2017-04-06 21:09:53 +00:00
Matt Arsenault	5cf4271883	AMDGPU: Replace fp16SrcZerosHighBits with a whitelist FCOPYSIGN is lowered to bit operations which don't clear the high bits. llvm-svn: 299708	2017-04-06 20:58:30 +00:00
Mehdi Amini	db11fdfda5	Revert "Turn some C-style vararg into variadic templates" This reverts commit r299699, the examples need to be updated. llvm-svn: 299702	2017-04-06 20:23:57 +00:00
Huihui Zhang	98240e9643	[SelectionDAG] [ARM CodeGen] Fix chain information of LowerMUL In LowerMUL, the chain information is not preserved for the newly created Load SDNode. For example, if a Store aliases with one of the operands of Mul, the Load for that operand needs to be scheduled before the Store. The dependence is recorded in the chain of Store, in TokenFactor. However, when lowering MUL, the SDNodes for the new Loads for VMULL are not updated in the TokenFactor for the Store. Thus the chain is not preserved for the lowered VMULL. llvm-svn: 299701	2017-04-06 20:22:51 +00:00
Mehdi Amini	579540a8f7	Turn some C-style vararg into variadic templates Module::getOrInsertFunction is using C-style vararg instead of variadic templates. From a user perspective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's no type checking on the arguments. The variadic template is an obvious solution to both issues. Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu> Differential Revision: https://reviews.llvm.org/D31070 llvm-svn: 299699	2017-04-06 20:09:31 +00:00
Yaxun Liu	76ae47cb35	[AMDGPU] Temporarily change constant address space from 4 to 2 Our final address space mapping is to let constant address space to be 4 to match nvptx. However for now we will make it 2 to avoid unnecessary work in FE/BE/devlib about intrinsics returning constant pointers. Differential Revision: https://reviews.llvm.org/D31770 llvm-svn: 299690	2017-04-06 19:17:32 +00:00
Yi Kong	5e7059b702	Revert "[ARM] Add Kryo to available targets" This reverts commit 942d6e6f58bf7e63810dd7cbcbce1fdfa5ebc6d4. Build breakage. llvm-svn: 299689	2017-04-06 19:16:14 +00:00
Yi Kong	2b622b1fc1	[ARM] Add Kryo to available targets Summary: Host CPU detection now supports Kryo, so we need to recognize it in ARM target. Reviewers: mcrosier, t.p.northover, rengolin, echristo, srhines Reviewed By: t.p.northover, echristo Subscribers: aemerson Differential Revision: https://reviews.llvm.org/D31775 llvm-svn: 299674	2017-04-06 18:10:08 +00:00
Matt Arsenault	dd10884e9d	AMDGPU: Stop using CCAssignToRegWithShadow This does not do what it is attempting to use it for and requires working around in LowerFormalArguments. llvm-svn: 299667	2017-04-06 17:37:27 +00:00
Krzysztof Parzyszek	058abf1a4a	[Hexagon] Change the vector scaling for vector offsets Keep full offset value on MI-level instructions, but have it scaled down in the MC-level instructions. llvm-svn: 299664	2017-04-06 17:28:21 +00:00
Stanislav Mekhanoshin	ea57c38521	[AMDGPU] Eliminate barrier if workgroup size is not greater than wavefront size If a workgroup size is known to be not greater than wavefront size the s_barrier instruction is not needed since all threads are guaranteed to come to the same point at the same time. Differential Revision: https://reviews.llvm.org/D31731 llvm-svn: 299659	2017-04-06 16:48:30 +00:00
Sam Kolton	9fa169601f	[AMDGPU] Resubmit SDWA peephole: enable by default Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299654	2017-04-06 15:03:28 +00:00
Daniel Sanders	0b5293f6ae	[globalisel][tablegen] Move <Target>InstructionSelector declarations to anonymous namespaces Summary: This resolves the issue of tablegen-erated includes in the headers for non-GlobalISel builds in a simpler way than before. Reviewers: qcolombet, ab Reviewed By: ab Subscribers: igorb, ab, mgorny, dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30998 llvm-svn: 299637	2017-04-06 09:49:34 +00:00
David Green	1b4b59a415	[ARM] Remove a dead ADD during the creation of TBBs During the optimisation of jump tables in the constant island pass, an extra ADD could be left over, now dead but not removed. Differential Revision: https://reviews.llvm.org/D31389 llvm-svn: 299634	2017-04-06 08:32:47 +00:00
Keno Fischer	1ec5dd85a2	[X86 TTI] Implement LSV hook Summary: LSV wants to know the maximum size that can be loaded to a vector register. On X86, this always matches the maximum register width. Implement this accordingly and add a test to make sure that LSV can vectorize up to the maximum permissible width on X86. Reviewers: delena, arsenm Reviewed By: arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D31504 llvm-svn: 299589	2017-04-05 20:51:38 +00:00
Ivan Krasin	d4f70c70b9	Revert r299536. [AMDGPU] SDWA peephole: enable by default. Reason: breaks multiple bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988 http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173 Original Review URL: https://reviews.llvm.org/D31671 llvm-svn: 299583	2017-04-05 19:58:12 +00:00
Dmitry Preobrazhensky	3ac6311a8d	[AMDGPU][MC] Fix for Bug 28158 + LIT tests Added support of the following instructions: - s_cbranch_cdbgsys - s_cbranch_cdbgsys_and_user - s_cbranch_cdbgsys_or_user - s_cbranch_cdbguser - s_setkill Reviewers: vpykhtin Differential Revision: https://reviews.llvm.org/D31469 llvm-svn: 299567	2017-04-05 17:26:45 +00:00
Matthias Braun	44047427b1	ARMFrameLowering: Slight cleanups; NFC llvm-svn: 299562	2017-04-05 16:58:41 +00:00
Dmitry Preobrazhensky	45db65037f	[AMDGPU][MC] Fix for Bug 28167 + LIT tests Corrected src0 for v_writelane_b32: - Enabled inline constants and literals for SI/CI (VOP2) - Enabled inline constants for VI (VOP3) Reviewers: vpykhtin, arsenm https://reviews.llvm.org/D31463 llvm-svn: 299555	2017-04-05 16:08:21 +00:00
Nirav Dave	aa65a2beb8	[SystemZ] Prevent Merging Bitcast with non-normal loads Fixes PR32505. Reviewers: uweigand, jonpa Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31609 llvm-svn: 299552	2017-04-05 15:42:48 +00:00
Sanjay Patel	b2f1621bb1	[DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to bitwise logic+setcc (PR32401) This is a generic combine enabled via target hook to reduce icmp logic as discussed in: https://bugs.llvm.org/show_bug.cgi?id=32401 It's likely that other targets will want to enable this hook for scalar transforms, and there are probably other patterns that can use bitwise logic to reduce comparisons. Note that we are missing an IR canonicalization for these patterns, and we will probably prefer the pair-of-compares form in IR (shorter, more likely to fold). Differential Revision: https://reviews.llvm.org/D31483 llvm-svn: 299542	2017-04-05 14:09:39 +00:00
Sam Kolton	34e29784fb	[AMDGPU] SDWA peephole: enable by default Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299536	2017-04-05 12:00:45 +00:00
Alexander Kornienko	014ac69f2e	Fix WebAssembly after r299529. llvm-svn: 299535	2017-04-05 11:50:43 +00:00
Simon Pilgrim	5fbd93b21a	[X86][SSE] Renamed combine to make it clear that it only handles the vector shift by immediate opcodes. NFCI llvm-svn: 299532	2017-04-05 10:44:42 +00:00
James Molloy	9d42334e02	[AArch64] Crypto requires FP. So if FP is disabled, crypto should also be disabled. llvm-svn: 299531	2017-04-05 10:44:38 +00:00
Alex Bradbury	866113c2ea	Add MCContext argument to MCAsmBackend::applyFixup for error reporting A number of backends (AArch64, MIPS, ARM) have been using MCContext::reportError to report issues such as out-of-range fixup values in their TgtAsmBackend. This is great, but because MCContext couldn't easily be threaded through to the adjustFixupValue helper function from its usual callsite (applyFixup), these backends ended up adding an MCContext* argument and adding another call to applyFixup to processFixupValue. Adding an MCContext parameter to applyFixup makes this unnecessary, and even better - applyFixup can take a reference to MCContext rather than a potentially null pointer. Differential Revision: https://reviews.llvm.org/D30264 llvm-svn: 299529	2017-04-05 10:16:14 +00:00
Ahmed Bougacha	ec8b1fb539	[X86] Relax assert in broadcast-of-subvector lowering. Before r294774, there was a problem when lowering broadcasts to use 128-bit subvectors. When we looked through a bitcast to find the broadcast input, we'd keep using the original type, so you'd end up with things like: (v8f32 (broadcast (v4f32 (extract_subvector (v8i32 V), ...)) )) r294774 fixed it to always emit subvectors with the scalar type of the original source. It also introduced some asserts, to check that we use scalars with the same size, and vectors with the same number of elements. The scalar size equality is checked earlier when looking through bitcasts, and is a useful assert. However, the number of elements don't have to be identical: we're always going to extract a 128-bit subvector, and we can have different size inputs if we looked through a concat_vector to find a 256-bit source. Relax the overzealous assert. Replace it with a check of the original source vector being 256 or 512 bits. If it's 128 bits, we can't extract_subvector from it. Fixes PR32371. llvm-svn: 299490	2017-04-05 00:14:39 +00:00
Ahmed Bougacha	d3c03a5ddd	[AArch64] Avoid partial register deps on insertelt of load into lane 0. This improves upon r246462: that prevented FMOVs from being emitted for the cross-class INSERT_SUBREGs by disabling the formation of INSERT_SUBREGs of LOAD. But the ld1.s that we started selecting caused us to introduce partial dependencies on the vector register. Avoid that by using SCALAR_TO_VECTOR: it's a first-class citizen that is folded away by many patterns, including the scalar LDRS that we want in this case. Credit goes to Adam for finding the issue! llvm-svn: 299482	2017-04-04 22:55:53 +00:00
Balaram Makam	b3120b6d3f	[AArch64] Add missing schedinfo, check completeness for Falkor. llvm-svn: 299468	2017-04-04 21:15:53 +00:00
Petr Hosek	9eb0a1e09b	[AArch64][Fuchsia] Allow -mcmodel=kernel for --target=aarch64-fuchsia This mode is just like -mcmodel=small except that it moves the thread pointer from TPIDR_EL0 to TPIDR_EL1. Patch by Roland McGrath. Differential Revision: https://reviews.llvm.org/D31624 llvm-svn: 299462	2017-04-04 19:51:53 +00:00
Balaram Makam	7b5c098cfa	[AArch64] Refine Falkor Machine Model - Part 2 llvm-svn: 299456	2017-04-04 18:42:14 +00:00
Sanjay Patel	ac618383e3	[x86] remove dead select-of-constants transform; NFCI https://reviews.llvm.org/D30537 / https://reviews.llvm.org/rL296977 added these transforms and other related transforms to the generic DAGCombiner (with a hook that x86 sets to true), so these patterns should not exist by the time we reach the target-specific combiner hook. llvm-svn: 299448	2017-04-04 16:54:58 +00:00
Matt Arsenault	3e90f84806	AMDGPU: Remove legacy export intrinsic llvm-svn: 299444	2017-04-04 16:34:39 +00:00
Matt Arsenault	236da200f1	AMDGPU: Remove legacy image intrinsics llvm-svn: 299443	2017-04-04 16:34:35 +00:00
Coby Tayree	2cb497afa4	[X86][MS-compatibility] Allow named synonyms for MS-assembly operators This patch enhances X86AsmParser's immediate expression parsing abilities to include named synonyms for selected binary/unary bitwise operators: {and,shl,shr,or,xor,not}, ultimately achieving better MS-compatibility. MASM reference: https://msdn.microsoft.com/en-us/library/94b6khh4.aspx Differential Revision: D31277 llvm-svn: 299439	2017-04-04 14:43:23 +00:00
Simon Pilgrim	448222d8ba	Strip trailing whitespace llvm-svn: 299438	2017-04-04 14:40:53 +00:00
Michael Zuckerman	88fb171015	[X86][LLVM] Converting __mm{|256|512}_movm_epi{8|16|32|64} LLVMIR call into generic intrinsics. This patch is a part one of two reviews, one for the clang and the other for LLVM. The patch deletes the back-end intrinsics and adds support for them in the auto upgrade. Differential Revision: https://reviews.llvm.org/D31393 llvm-svn: 299432	2017-04-04 13:32:14 +00:00
Daniel Sanders	bee5739a7c	[tablegen][globalisel] Add support for nested instruction matching. Summary: Lift the restrictions that prevented the tree walking introduced in the previous change and add support for patterns like: (G_ADD (G_MUL (G_SEXT $src1), (G_SEXT $src2)), $src3) -> SMADDWrrr $dst, $src1, $src2, $src3 Also adds support for G_SEXT and G_ZEXT to support these cases. One particular aspect of this that I should draw attention to is that I've tried to be overly conservative in determining the safety of matches that involve non-adjacent instructions and multiple basic blocks. This is intended to be used as a cheap initial check and we may add a more expensive check in the future. The current rules are: * Reject if any instruction may load/store (we'd need to check for intervening memory operations). * Reject if any instruction has implicit operands. * Reject if any instruction has unmodelled side-effects. See isObviouslySafeToFold(). Reviewers: t.p.northover, javed.absar, qcolombet, aditya_nandakumar, ab, rovka Reviewed By: ab Subscribers: igorb, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30539 llvm-svn: 299430	2017-04-04 13:25:23 +00:00
Simon Dardis	0a47edb153	[mips] Deal with empty blocks in the mips hazard scheduler This patch teaches the hazard scheduler how to handle empty blocks when searching for the next real instruction when dealing with forbidden slots. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D31293 llvm-svn: 299427	2017-04-04 11:28:53 +00:00
Oren Ben Simhon	568fb197da	[X86] Add 64 bit pattern matching for PSADBW PSADBW pattern currently supports the 32 bit IR pattern and only GLT (greater than) comparison. The patch extends the pattern to catch also 64 bit IR pattern and includes all other comparison types (not only GLT). Differential Revision: https://reviews.llvm.org/D31577 llvm-svn: 299425	2017-04-04 10:23:18 +00:00
Weiming Zhao	74a7fa0594	Reland r298901 with modifications (reverted in r298932) Dont emit Mapping symbols for sections that contain only data. Summary: Dont emit mapping symbols for sections that contain only data. Reviewers: rengolin, weimingz, kparzysz, t.p.northover, peter.smith Reviewed By: t.p.northover Patched by Shankar Easwaran <shankare@codeaurora.org> Subscribers: alekseyshl, t.p.northover, llvm-commits Differential Revision: https://reviews.llvm.org/D30724 llvm-svn: 299392	2017-04-03 21:50:04 +00:00
Matt Arsenault	b600e138cc	AMDGPU: Remove llvm.SI.vs.load.input llvm-svn: 299391	2017-04-03 21:45:13 +00:00
Simon Pilgrim	af33757b5d	[X86][SSE] Lower BUILD_VECTOR with repeated elts as BUILD_VECTOR + VECTOR_SHUFFLE It can be costly to transfer from the gprs to the xmm registers and can prevent loads merging. This patch splits vXi16/vXi32/vXi64 BUILD_VECTORS that use the same operand in multiple elements into a BUILD_VECTOR with only a single insertion of each of those elements and then performs a unary shuffle to duplicate the values. There are a couple of minor regressions this patch unearths due to some missing MOVDDUP/BROADCAST folds that I will address in a future patch. Note: Now that vector shuffle lowering and combining is pretty good we should be reusing that instead of duplicating so much in LowerBUILD_VECTOR - this is the first of several patches to address this. Differential Revision: https://reviews.llvm.org/D31373 llvm-svn: 299387	2017-04-03 21:06:51 +00:00
Amjad Aboud	0389f62879	x86 interrupt calling convention: re-align stack pointer on 64-bit if an error code was pushed The x86_64 ABI requires that the stack is 16 byte aligned on function calls. Thus, the 8-byte error code, which is pushed by the CPU for certain exceptions, leads to a misaligned stack. This results in bugs such as Bug 26413, where misaligned movaps instructions are generated. This commit fixes the misalignment by adjusting the stack pointer in these cases. The adjustment is done at the beginning of the prologue generation by subtracting another 8 bytes from the stack pointer. These additional bytes are popped again in the function epilogue. Fixes Bug 26413 Patch by Philipp Oppermann. Differential Revision: https://reviews.llvm.org/D30049 llvm-svn: 299383	2017-04-03 20:28:45 +00:00
Jun Bum Lim	dee5565869	[CodeGenPrep] move aarch64-type-promotion to CGP Summary: Move the aarch64-type-promotion pass within the existing type promotion framework in CGP. This change also supports forking sexts when a new sext is required for promotion. Note that this change is based on D27853 and I am submitting this out early to provide a better idea on D27853. Reviewers: jmolloy, mcrosier, javed.absar, qcolombet Reviewed By: qcolombet Subscribers: llvm-commits, aemerson, rengolin, mcrosier Differential Revision: https://reviews.llvm.org/D28680 llvm-svn: 299379	2017-04-03 19:20:07 +00:00
Matt Arsenault	754dd3eaef	AMDGPU: Remove legacy bfe intrinsics llvm-svn: 299372	2017-04-03 18:08:08 +00:00
Krzysztof Parzyszek	44173f7d02	[Hexagon] Factor out some common code in HexagonEarlyIfConv.cpp, NFC llvm-svn: 299367	2017-04-03 17:26:40 +00:00
Craig Topper	d33ee1b960	[APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt class. Implement them without memory allocation for multiword This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temporary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation. Differential Revision: https://reviews.llvm.org/D31565 llvm-svn: 299362	2017-04-03 16:34:59 +00:00
Sjoerd Meijer	1179470ff8	ARMAsmParser: clean up of isImmediate functions - we are now using immediate AsmOperands so that the range check functions are tablegen'ed. - Big bonus is that error messages become much more accurate, i.e. instead of a useless "invalid operand" error message it will now say that the immediate operand must be in range [x,y], which is why regression tests needed updating. More tablegen operand descriptions could probably benefit from using immediateAsmOperand, but this is a first good step to get rid of most of the nearly identical range check functions. I will address the remaining immediate operands in next clean ups. Differential Revision: https://reviews.llvm.org/D31333 llvm-svn: 299358	2017-04-03 14:50:04 +00:00
Simon Pilgrim	0e2f8cd875	[X86][MMX] Improve support for folding fptosi from XMM to MMX llvm-svn: 299338	2017-04-02 17:45:41 +00:00
Simon Pilgrim	ba28263b03	[X86][MMX] Simplify tablegen patterns by always combining MOVDQ2Q from v2i64 llvm-svn: 299336	2017-04-02 16:20:34 +00:00
Simon Pilgrim	e56a2d7b4c	[X86][MMX] Added support for subvector extraction to MMX register llvm-svn: 299335	2017-04-02 15:52:28 +00:00
Davide Italiano	c88169e61b	[AMDGPU] Garbage collect now unused dead code. NFCI. llvm-svn: 299310	2017-04-01 19:30:17 +00:00
Quentin Colombet	49d70d0529	Revert "Feature generic option to setup start/stop-after/before" This reverts commit r299282. Didn't intend to commit this :( llvm-svn: 299288	2017-04-01 01:26:24 +00:00
Quentin Colombet	35a47010b1	Revert "Instrument SDISel C++ patterns" This reverts commit r299284. Didn't intend to commit this :( llvm-svn: 299286	2017-04-01 01:26:17 +00:00
Quentin Colombet	b43da15602	Instrument SDISel C++ patterns llvm-svn: 299284	2017-04-01 01:21:32 +00:00
Quentin Colombet	ffe3053a66	Feature generic option to setup start/stop-after/before This patch refactors the code used in llc such that all the users of the addPassesToEmitFile API have access to a homogeneous way of handling start/stop-after/before options right out of the box. Previously each user would have needed to duplicate this logic and set up its own options. NFC llvm-svn: 299282	2017-04-01 01:21:24 +00:00
Eric Christopher	60a245e0ff	Reduce the number of times we query the subtarget for the same information. llvm-svn: 299278	2017-03-31 23:12:27 +00:00
Eric Christopher	cf965f2f03	Small cleanup to remove extraneous cast. llvm-svn: 299277	2017-03-31 23:12:24 +00:00
Krzysztof Parzyszek	d04c9b999c	[Hexagon] Remove unused variables Found by PVS-Studio. Fixes llvm.org/PR31676. llvm-svn: 299262	2017-03-31 21:03:59 +00:00
Krzysztof Parzyszek	b326411fdc	[Hexagon] Fix typo in HexagonEarlyIfCConv.cpp Found by PVS-Studio. Fixes llvm.org/PR32480. llvm-svn: 299258	2017-03-31 20:36:00 +00:00
Stanislav Mekhanoshin	12aa5b733e	[AMDGPU] Remove assumption that vector and scalar types do not alias Differential Revision: https://reviews.llvm.org/D31547 llvm-svn: 299250	2017-03-31 20:16:54 +00:00
Matt Arsenault	8edfaee7be	AMDGPU: Remove unnecessary ands when f16 is legal Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zero the high 16-bits of the result. Alternatively could try making v2f16 legal and canonicalizing on build_vectors. llvm-svn: 299246	2017-03-31 19:53:03 +00:00
Jan Vesely	3c99441ef4	AMDGPU/R600: Fix amdgpu alias analysis pass. R600 uses higher AS number to access kernel parameters Fixes: r298846 Differential Revision: https://reviews.llvm.org/D31520 llvm-svn: 299245	2017-03-31 19:26:23 +00:00
Balaram Makam	2aba753e84	[AArch64] Add new subtarget feature to fold LSL into address mode. Summary: This feature enables folding of logical shift operations of up to 3 places into addressing mode on Kryo and Falkor that have a fastpath LSL. Reviewers: mcrosier, rengolin, t.p.northover Subscribers: junbuml, gberry, llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D31113 llvm-svn: 299240	2017-03-31 18:16:53 +00:00
Craig Topper	9601168670	[AVX-512] Update lowering for gather/scatter prefetch intrinsics to match the immediate encodings the frontend uses based on the _MM_HINT_T0/T1 constant values in clang's headers. Our _MM_HINT_T0/T1 constant values are 3/2 which matches gcc, but not icc or Intel documentation. Interestingly gcc had this same bug on their implementation of the gather/scatter builtins at one point too. Fixes PR32411. llvm-svn: 299234	2017-03-31 17:24:29 +00:00
Petar Jovanovic	9bff3b7818	[mips][msa] Prevent output operand from commuting for dpadd_[su].df ins Implementation of TargetInstrInfo::findCommutedOpIndices for MIPS target, restricting commutativity to second and third operand only for dpadd_[su].df instructions therein. Prior to this change, there were cases where the vector that is to be added to the dot product of the other two could take a position other than the first one in the instruction, generating false output in the destination vector. Such behavior has been noticed in the two functions generating v2i64 output values so far. Other ones may exhibit such behavior as well, just not for the vector operands which are present in the test at the moment. Tests altered so that the function's first operand is a constant splat so that it can be loaded with a ldi instruction, since that is the case in which the erroneous instruction operand placement has occurred. We check that the register which is present in the ldi instruction is placed as the first operand in the corresponding dpadd instruction. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D30827 llvm-svn: 299223	2017-03-31 14:31:55 +00:00
Jonas Paulsson	c7bb22e75f	[SystemZ] Make sure of correct regclasses in insertSelect() Since LOCR only accepts GR32 virtual registers, its operands must be copied into this regclass in insertSelect(), when an LOCR is built. Otherwise, the case where the source operand was GRX32 will produce invalid IR. Review: Ulrich Weigand llvm-svn: 299220	2017-03-31 14:06:59 +00:00
Simon Pilgrim	3c81c34d8d	[DAGCombiner] Add vector demanded elements support to ComputeNumSignBits Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course. Followup to D25691. Differential Revision: https://reviews.llvm.org/D31311 llvm-svn: 299219	2017-03-31 13:54:09 +00:00
Jonas Paulsson	56bb0857e9	[SystemZ] Skip DAGCombining of vector node for older subtargets. Even on older subtargets that lack vector support, there may be vector values with just one element in the input program. These are converted during DAG legalization to scalar values. The pre-legalize SystemZ DAGCombiner methods should in this circumstance not touch these nodes. This patch adds a check for this in SystemZTargetLowering::combineEXTRACT_VECTOR_ELT(). Review: Ulrich Weigand llvm-svn: 299213	2017-03-31 13:22:59 +00:00
Sam Kolton	27e0f8bc72	[AMDGPU] SDWA Peephole: improve search for immediates in SDWA patterns Previously compiler often extracted common immediates into specific register, e.g.: ``` %vreg0 = S_MOV_B32 0xff; %vreg2 = V_AND_B32_e32 %vreg0, %vreg1 %vreg4 = V_AND_B32_e32 %vreg0, %vreg3 ``` Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands: ``` SDWA src: %vreg2 src_sel:BYTE_0 SDWA src: %vreg4 src_sel:BYTE_0 ``` With this change peephole check if operand is either immediate or register that is copy of immediate. llvm-svn: 299202	2017-03-31 11:42:43 +00:00
Simon Pilgrim	37b536e4b3	[DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes. Differential Revision: https://reviews.llvm.org/D31249 llvm-svn: 299201	2017-03-31 11:24:16 +00:00
Eric Christopher	9fd267c221	Temporarily revert "[PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64" as it's causing test failures, I've given Carrot a testcase offline. This reverts commit r298955. llvm-svn: 299153	2017-03-31 02:16:54 +00:00
Dan Gohman	970d02c42d	[WebAssembly] Initial linking metadata support Add support for the new relocations and linking metadata section support in https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md. In particular, this allows LLVM to indicate which variable is the stack pointer, so that it can be linked with other objects. This also adds support for emitting type relocations for call_indirect instructions. Right now, this is mainly tested by using wabt and hexdump to examine the output on selected testcases. We'll add more tests as the design stabilizes and more of the pieces are in place. llvm-svn: 299141	2017-03-30 23:58:19 +00:00
Matt Arsenault	1074cb5420	AMDGPU: Rename isKernel What we really want to do is distinguish functions that may be called by other functions, and graphics shaders are not called kernels. llvm-svn: 299140	2017-03-30 23:58:04 +00:00
Matt Arsenault	79f837c254	AMDGPU: Add all atomicrmw fields to atomic.inc/dec Add scope, order, isVolatile llvm-svn: 299122	2017-03-30 22:21:40 +00:00
Craig Topper	3001b35189	[AVX-512] Fix bad comment from r299112. NFC llvm-svn: 299114	2017-03-30 21:05:33 +00:00
Craig Topper	533b1bde1b	[AVX-512] Fix another case where fastisel was generating a GR8 to VK1 copy. This time after calls returning i1. Fixes PR32472. llvm-svn: 299112	2017-03-30 21:02:52 +00:00
Stanislav Mekhanoshin	89653dfd2a	[AMDGPU] Add GlobalOpt parameter to Always Inliner pass If set to false it does not remove global aliases. With this parameter set to false it should be safe to run the pass before link. Differential Revision: https://reviews.llvm.org/D31489 llvm-svn: 299108	2017-03-30 20:16:02 +00:00
Davide Italiano	a0bd28c4d9	[AArch64ISelLowering] Remove `else` after `return` in LowerGlobalTLSAddress. llvm-svn: 299103	2017-03-30 19:52:31 +00:00
Davide Italiano	de05686ec6	[AArch64] Simplify isSignExtended()/isZeroExtended(). NFCI. llvm-svn: 299102	2017-03-30 19:46:18 +00:00
Simon Pilgrim	68168d17b9	Spelling mistakes in comments. NFCI. Based on corrections mentioned in patch for clang for PR27635 llvm-svn: 299072	2017-03-30 12:59:53 +00:00
Simon Pilgrim	ef4509b36e	Spelling mistakes in comments. NFCI. llvm-svn: 299069	2017-03-30 12:30:15 +00:00
Davide Italiano	e13920f407	[X86IselLowering] Remove extraneous semicolon. NFCI. Unbreaks the build with GCC -Werror. llvm-svn: 299030	2017-03-29 21:34:58 +00:00
Simon Pilgrim	8362c95257	[X86] Tidied up comment - we don't custom lower add/sub i64 on i686 anymore. NFCI. llvm-svn: 299004	2017-03-29 15:41:58 +00:00
Simon Pilgrim	fc97d5049f	Spelling mistakes in comments. NFCI. llvm-svn: 299000	2017-03-29 15:27:24 +00:00
Simon Pilgrim	2845189bd1	[X86][AVX2] Prevent unary interleaving patterns from calling lowerVectorShuffleAsSplitOrBlend (PR32453) llvm-svn: 298993	2017-03-29 13:00:00 +00:00
Simon Pilgrim	b670ba4e87	[AMDGPU] Tidy up computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI. Based on comment in D31249. llvm-svn: 298991	2017-03-29 12:09:25 +00:00
Simon Pilgrim	ebd433d9fc	[X86] Removed old comment. NFCI. No longer makes sense as the previous opcode mnemonic it was referring to is long gone. llvm-svn: 298988	2017-03-29 10:44:51 +00:00
Eric Christopher	5829741c46	Move the x86 cpu feature rtm from Haswell to Skylake matching clang commit r298956. llvm-svn: 298986	2017-03-29 07:40:44 +00:00
Craig Topper	d9f51350b8	[AVX-512] Remove explicit KMOVWrk from isel patterns. COPY_TO_REGCLASS to GR32 is enough. llvm-svn: 298985	2017-03-29 07:31:56 +00:00
Craig Topper	d284606327	[AVX-512] Remove explicit KMOVWrk/KMOVWKr instructions from patterns where we can just use COPY_TO_REGCLASS instead. This will result in a KMOVW or KMOVD being emitted during register allocation. And in at least some cases this might allow the register coalescer to remove the copy all together. llvm-svn: 298984	2017-03-29 06:55:28 +00:00
Craig Topper	331297c62e	[AVX-512] Punt on fast-isel of truncates to i1 when AVX512 is enabled. We should be masking the value and emitting a register copy like we do in non-fast isel. Instead we were just updating the value map and emitting nothing. After r298928 we started seeing cases where we would create a copy from GR8 to GR32 because the source register in a VK1 to GR32 copy was replaced by the GR8 going into a truncate. This fixes PR32451. llvm-svn: 298957	2017-03-28 23:20:37 +00:00
Guozhi Wei	f8d40181c9	[PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64 In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64. This patch fixed PR32442. Differential Revision: https://reviews.llvm.org/D31407 llvm-svn: 298955	2017-03-28 22:55:01 +00:00
Stanislav Mekhanoshin	baf31ac7c8	[AMDGPU] Boost unroll threshold for loops reading local memory This is less important than increase threshold for private memory, but still brings performance improvements in a wide range of tests. Unrolling more for local memory serves three purposes: it allows to combine ds operations if offset becomes static, saves registers used for offsets in case of static offsets, and allows better lds latency hiding. Differential Revision: https://reviews.llvm.org/D31412 llvm-svn: 298948	2017-03-28 22:13:51 +00:00
Stanislav Mekhanoshin	b933c3f554	[AMDGPU] Fix recorded region boundaries in max-occupancy scheduler It is incorrect to record region boundaries before scheduling, since they may change after scheduling. As a result, the second pass may see fewer instructions to schedule than it should. Differential Revision: https://reviews.llvm.org/D31434 llvm-svn: 298945	2017-03-28 21:48:54 +00:00
Simon Pilgrim	c7c5aa47cf	[X86][MMX] Match MMX fp_to_sint conversions from XMM registers We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack). This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc. Differential Revision: https://reviews.llvm.org/D30868 llvm-svn: 298943	2017-03-28 21:32:11 +00:00
Stanislav Mekhanoshin	9053f22eeb	[AMDGPU] Split -amdgpu-early-inline-all option Previously it was covered by the internalization. It turns out we cannot run the internalizer in FE, as it breaks separate compilation tests. Thus the early inliner gets its own option. Differential Revision: https://reviews.llvm.org/D31429 llvm-svn: 298935	2017-03-28 18:23:24 +00:00
Sanjay Patel	f01a1dad7f	[x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equality Follow-up to: https://reviews.llvm.org/rL298775 llvm-svn: 298933	2017-03-28 17:23:49 +00:00
Weiming Zhao	da4d12a8e5	Revert "Dont emit Mapping symbols for sections that contain only data." It breaks some lld tests. This reverts commit 3a50eea6d9732ab40e9a7aebe6be777b53a8b35c. llvm-svn: 298932	2017-03-28 17:15:11 +00:00
Simon Pilgrim	3e2aa7f40e	[X86][AVX2] Add support for combining v16i16 shuffles to VPBLENDW llvm-svn: 298929	2017-03-28 16:40:38 +00:00
Craig Topper	058f2f6d72	[AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers We've had several bugs (PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register. This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, it's possible for the GR8 register to be a high register and we end up doing an accidental extract or insert from bits 15:8. I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa. Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repetition. This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI. Differential Revision: https://reviews.llvm.org/D30968 llvm-svn: 298928	2017-03-28 16:35:29 +00:00
Simon Pilgrim	6b30172372	[X86][SSE] Refactored shuffle BLEND combining to make future 16i16 support easier. NFCI. Call the matchVectorShuffleAsBlend test as early as possible. llvm-svn: 298925	2017-03-28 15:50:23 +00:00
Simon Pilgrim	aa675ca77d	Fix signed/unsigned comparison warning llvm-svn: 298917	2017-03-28 13:40:09 +00:00
Simon Pilgrim	d48f47e25c	[X86][SSE] Begin merging vector shuffle to BLEND for lowering and combining. Split off matchVectorShuffleAsBlend from lowerVectorShuffleAsBlend for reuse in combining. llvm-svn: 298914	2017-03-28 13:05:48 +00:00
Simon Pilgrim	61437ebaf4	Wdocumentation fix llvm-svn: 298911	2017-03-28 12:29:09 +00:00
Simon Pilgrim	6afe0e2833	[X86][SSE] Set second operand to undef instead of first operand in unary shuffle combines. Copy isn't necessary after the matchVectorShuffleWithUNPCK refactor and undef value will make some future undef/zero handling easier. llvm-svn: 298910	2017-03-28 12:16:42 +00:00
Simon Pilgrim	defee5683c	Strip trailing whitespace llvm-svn: 298909	2017-03-28 11:15:17 +00:00
Sanne Wouda	d4658ee634	[AArch64] [Assembler] option to disable negative immediate conversions Summary: Similar to the ARM target in https://reviews.llvm.org/rL298380, this patch adds identical infrastructure for disabling negative immediate conversions, and converts the existing aliases to the new infrastructure. Reviewers: rengolin, javed.absar, olista01, SjoerdMeijer, samparker Reviewed By: samparker Subscribers: samparker, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D31243 llvm-svn: 298908	2017-03-28 10:02:56 +00:00
Igor Breger	f580fce2c3	[GlobalISel][X86] support G_FRAME_INDEX instruction selection. Summary: G_LOAD/G_STORE, add alternative RegisterBank mapping. For G_LOAD, Fast and Greedy mode choose the same RegisterBank mapping (GprRegBank) for the G_LOAD + G_FADD, can't get rid of cross register bank copy GprRegBank->VecRegBank. Reviewers: zvi, rovka, qcolombet, ab Reviewed By: zvi Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank Differential Revision: https://reviews.llvm.org/D30979 llvm-svn: 298907	2017-03-28 09:35:06 +00:00
Valery Pykhtin	9f3eca96eb	[AMDGPU] Update SI scheduler colorHighLatenciesGroups Depends on rL298896: MachineScheduler/ScheduleDAG: Add support for GetSubGraph Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30152 llvm-svn: 298902	2017-03-28 07:19:48 +00:00
Weiming Zhao	320848458b	Dont emit Mapping symbols for sections that contain only data. Summary: Dont emit mapping symbols for sections that contain only data. Patched by Shankar Easwaran <shankare@codeaurora.org> Reviewers: rengolin, peter.smith, weimingz, kparzysz, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, llvm-commits Differential Revision: https://reviews.llvm.org/D30724 llvm-svn: 298901	2017-03-28 05:40:36 +00:00
Eric Christopher	f48ef3355f	Remove an oddly unnecessary temporary. llvm-svn: 298888	2017-03-27 22:40:51 +00:00
Javed Absar	3d59437093	Improve machine schedulers for in-order processors This patch enables schedulers to specify instructions that cannot be issued with any other instructions. It also fixes BeginGroup/EndGroup. Reviewed by: Andrew Trick Differential Revision: https://reviews.llvm.org/D30744 llvm-svn: 298885	2017-03-27 20:46:37 +00:00
Valery Pykhtin	fb9905545c	[AMDGPU] SISched: Detect dependency types between blocks Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30153 llvm-svn: 298872	2017-03-27 18:22:39 +00:00
Ahmed Bougacha	f0b22c471b	[GlobalISel][AArch64] Extract a variable out of an NDEBUG block. NFC. r298863 used PtrReg, but that's never defined in release builds. Fix it. llvm-svn: 298869	2017-03-27 18:14:20 +00:00
Ahmed Bougacha	f75782f9dc	[GlobalISel][AArch64] Fold FI into LDR/STR ui addressing mode. A majority of loads and stores at O0 access an alloca. It's trivial to fold the G_FRAME_INDEX into the instruction; do it. llvm-svn: 298864	2017-03-27 17:31:56 +00:00
Ahmed Bougacha	8a654085d0	[GlobalISel][AArch64] Fold G_GEP into LDR/STR ui addressing mode. We're not to the point of supporting the load/store patterns yet (because they extensively use PatFrags). But in the meantime, we can implement some of the simplest addressing modes. llvm-svn: 298863	2017-03-27 17:31:52 +00:00
Ahmed Bougacha	85a66a6d9f	[GlobalISel][AArch64] Select store of zero to WZR/XZR. These occur very frequently, and are quite trivial to catch. llvm-svn: 298862	2017-03-27 17:31:48 +00:00
Valery Pykhtin	ba3a4def29	[AMDGPU] SISched: Update colorEndsAccordingToDependencies Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30150 llvm-svn: 298861	2017-03-27 17:26:40 +00:00
Valery Pykhtin	f70f683670	[AMDGPU] Fix SI scheduler LiveOut Refcount issue Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30145 llvm-svn: 298857	2017-03-27 17:06:36 +00:00
Ahmed Bougacha	641cb203b6	[GlobalISel][AArch64] Select CBZ. CBZ/CBNZ represent a substantial portion of all conditional branches. Look through G_ICMP to select them. We can't use tablegen yet because the existing patterns match an AArch64ISD node. llvm-svn: 298856	2017-03-27 16:35:31 +00:00
Dmitry Preobrazhensky	c512d44845	[AMDGPU][MC] Fix for Bug 28207 + LIT tests Enabled clamp and omod for v_cvt_* opcodes which have src0 of an integer type Reviewers: vpykhtin, arsenm Differential Revision: https://reviews.llvm.org/D31327 llvm-svn: 298852	2017-03-27 15:57:17 +00:00
Chad Rosier	862a41270f	[AArch64] Mark mrs of TPIDR_EL0 (thread pointer) as not having side effects. Among other things, this allows Machine LICM to hoist a costly 'mrs' instruction from within a loop. Differential Revision: http://reviews.llvm.org/D31151 llvm-svn: 298851	2017-03-27 15:52:38 +00:00
Yaxun Liu	1a14bfa022	[AMDGPU] Get address space mapping by target triple environment As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which do not depend on the target triple, use static const members, so that they don't occupy extra memory space and are equivalent to compile time constants. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as a member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846	2017-03-27 14:04:01 +00:00
Gadi Haber	89d5f9391a	[X86][AVX2] bugzilla bug 21281 Performance regression in vector interleave in AVX2 This is a patch for an on-going bugzilla bug 21281 on the generated X86 code for a matrix transpose8x8 subroutine which requires vector interleaving. The generated code in AVX2 is currently non-optimal and requires 60 instructions as opposed to only 40 instructions generated for AVX1. The patch includes a fix for the AVX2 case where vector unpack instructions use fewer operations than the vector blend operations available in AVX2. In this case using vector unpack instructions is more efficient. Reviewers: zvi delena igorb craig.topper guyblank eladcohen m_zuckerman aymanmus RKSimon llvm-svn: 298840	2017-03-27 12:13:37 +00:00
Davide Italiano	a2c4e4b929	[Target] Remove some code probably copy/pasted from another backend. llvm-svn: 298825	2017-03-26 21:45:04 +00:00
Davide Italiano	5c2aa5d3e4	[MachineScheduler] Reference the correct header. llvm-svn: 298823	2017-03-26 21:27:21 +00:00
Simon Pilgrim	92925ea701	[X86][SSE] Add computeKnownBitsForTargetNode support for (V)PSLL/(V)PSRL instructions llvm-svn: 298806	2017-03-26 13:17:55 +00:00
Simon Pilgrim	049d9c921f	[X86][AVX512F] Fix reg class for VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk Fixed -verify-machineinstrs errors in fast-isel-select-sse.ll (one of many in PR27481) The VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk instructions were assuming both source registers were V128X when the second is actually supposed to be FR32X/FR64X Differential Revision: https://reviews.llvm.org/D31200 llvm-svn: 298805	2017-03-26 12:52:28 +00:00
Igor Breger	531a203a06	[GlobalISel][X86] support G_FRAME_INDEX instruction selection. Summary: Support G_FRAME_INDEX instruction selection. Reviewers: zvi, rovka, ab, qcolombet Reviewed By: ab Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank Differential Revision: https://reviews.llvm.org/D30980 llvm-svn: 298800	2017-03-26 08:11:12 +00:00
Simon Pilgrim	bec234c970	[X86] Pull out repeated ScalarValueSizeInBits code. NFCI. llvm-svn: 298783	2017-03-25 21:22:12 +00:00
Simon Pilgrim	c0720a4052	[X86][SSE] Combine (VSRLI (VSRAI X, Y), (NumSignBits-1)) -> (VSRLI X, (NumSignBits-1)) Part 3 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298782	2017-03-25 20:43:01 +00:00
Simon Pilgrim	6397963c81	[X86][SSE] Added ComputeNumSignBitsForTargetNode support for (V)PSRAI Part 2 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298780	2017-03-25 19:58:36 +00:00
Simon Pilgrim	5400a4d0af	[X86][SSE] Generalised CMP+AND1 combine to ZERO/ALLBITS+MASK Patch to generalize combinePCMPAnd1 (for handling SETCC + ZEXT cases) to work for any input that has zero/all bits set masked with an 'all low bits' mask. Replaced the implicit assumption of shift availability with a call to SupportedVectorShiftWithImm. Part 1 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298779	2017-03-25 19:50:14 +00:00
Sanjay Patel	9ebb68843e	[x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality, we can replace memcmp calls with inline code that is both smaller and faster. Differential Revision: https://reviews.llvm.org/D31290 llvm-svn: 298775	2017-03-25 16:05:33 +00:00
Balaram Makam	cf0e5e1c62	[AArch64] Refine Falkor Machine Model - Part1 llvm-svn: 298768	2017-03-25 04:02:39 +00:00
Yaxun Liu	14834c3e3d	[AMDGPU] Switch data layout by triple environment amdgiz Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0. amdgiz is for non-OpenCL environment where generic address space is 0. amdgizcl is for OpenCL environment where generic address space is 0. Differential Revision: https://reviews.llvm.org/D31211 llvm-svn: 298758	2017-03-25 02:05:44 +00:00
Eli Friedman	95ddd18703	[ARM] Fix mixup between Lo and Hi in SMLALBB formation. llvm-svn: 298752	2017-03-25 00:13:24 +00:00
Jessica Paquette	eac8633d6d	[Outliner] Revert r298734. When I tested r298734, I thought that red zones were enabled by default like in X86. Since red zones are behind a flag on AArch64 the testing wasn't true. llvm-svn: 298747	2017-03-24 23:00:21 +00:00
Matt Arsenault	0607a4427b	AMDGPU: Fix annotating loops with nested loop conditions If the branch condition for a loop was a phi which itself was fed from a phi from a loop, it isn't safe to try to delete the phi until after the loop is handled. llvm-svn: 298737	2017-03-24 20:57:10 +00:00
Jessica Paquette	167af85ec7	[Outliner] Remove no red zone requirement for AArch64 AArch64 doesn't require -mno-red-zone; stack fixups are sufficient here. This was unnecessarily copied over from the X86 target. (You can now outline with red zones! Yay!) Removing the requirement passes all Single/MultiSource tests. llvm-svn: 298734	2017-03-24 20:47:59 +00:00
Matt Arsenault	b5d23271e2	AMDGPU: Implement f16 fround llvm-svn: 298730	2017-03-24 20:04:18 +00:00
Matt Arsenault	b8f8dbc227	AMDGPU: Unify divergent function exits. StructurizeCFG can't handle cases with multiple returns creating regions with multiple exits. Create a copy of UnifyFunctionExitNodes that only unifies exit nodes that skips exit nodes with uniform branch sources. llvm-svn: 298729	2017-03-24 19:52:05 +00:00
Matt Arsenault	18bb24a1be	TTI: Split IsSimple in MemIntrinsicInfo All this did before was assert in EarlyCSE. llvm-svn: 298724	2017-03-24 18:56:43 +00:00
Stanislav Mekhanoshin	70603dcef2	[AMDGPU] Fold V_CNDMASK with identical source operands Such instructions sometimes appear after lowering and folding. Differential Revision: https://reviews.llvm.org/D31318 llvm-svn: 298723	2017-03-24 18:55:20 +00:00
Konstantin Zhuravlyov	4986d9fb45	[AMDGPU] Rename Kind to ValueKind in metadata to be consistent llvm-svn: 298722	2017-03-24 18:43:15 +00:00
Stanislav Mekhanoshin	a27b2cac03	[AMDGPU] Add AMDGPUAliasAnalysis to opt pipeline Previously it was added only to the BE. Differential Revision: https://reviews.llvm.org/D31323 llvm-svn: 298721	2017-03-24 18:01:14 +00:00
Benjamin Kramer	80e3d5bb24	[AMDGPU] Don't enforce constexpr, there are still old standard libraries around that don't have a constexpr std::pair. llvm-svn: 298719	2017-03-24 17:53:06 +00:00
Valery Pykhtin	e2419dc907	[AMDGPU] Remove double map lookups in SI scheduler Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30382 llvm-svn: 298718	2017-03-24 17:49:05 +00:00
Valery Pykhtin	f7d1023a73	[AMDGPU] Fix SGPR usage count in SI scheduler Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30149 llvm-svn: 298710	2017-03-24 16:45:50 +00:00
Valery Pykhtin	57ab699933	[AMDGPU] Add a new line after a debug message Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30146 llvm-svn: 298708	2017-03-24 16:37:48 +00:00
Simon Pilgrim	6aac646308	[X86][SSE] Generalised lowerTruncate by PACKSS to work with any 'zero/all bits' result, not just comparisons. Added vector compare opcodes to X86TargetLowering::ComputeNumSignBitsForTargetNode Covered by existing tests added for D22814. llvm-svn: 298704	2017-03-24 16:12:31 +00:00
Benjamin Kramer	c06d672a7a	Don't build up std::vectors with constant sizes when an array suffices. NFC. llvm-svn: 298701	2017-03-24 14:11:47 +00:00
Meador Inge	5d3c599e82	[AVR] Fix build after r298178 r298178 capitalized the fields in `ArgListEntry`. All the official targets were updated accordingly, but as an experimental target AVR was missed. llvm-svn: 298677	2017-03-24 01:57:29 +00:00
Krzysztof Parzyszek	10fbac009d	[Hexagon] Avoid infinite loops in HexagonLoopIdiomRecognition - Avoid explosive growth of the simplification queue by not queuing expressions that are already in it. - Add an iteration counter and abort after a sufficiently large number of iterations (assuming that it's a symptom of an infinite loop). llvm-svn: 298655	2017-03-23 23:01:22 +00:00
Eric Christopher	c78be4d3be	Kill some trailing whitespace to make some new changes a bit easier. llvm-svn: 298637	2017-03-23 19:41:10 +00:00
Nirav Dave	9ebefeb9b1	[X86] Fix Stale SDNode use in X86ISelDAGtoDAG Summary: Fixes pr32329. Reviewers: spatel, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31286 llvm-svn: 298633	2017-03-23 18:25:17 +00:00
Eric Christopher	cff8492492	Remove the subtarget argument from LowerFP_TO_INT since there's one stored on X86TargetLowering. llvm-svn: 298628	2017-03-23 17:35:08 +00:00
Eric Christopher	a19a14b42f	Remove unused X86Subtarget argument from getOnesVector. llvm-svn: 298627	2017-03-23 17:35:06 +00:00
Pirama Arumuga Nainar	bc26482717	[ARM] Fix computeKnownBits for ARMISD::CMOV Summary: The true and false operands for the CMOV are operands 0 and 1. ARMISelLowering.cpp::computeKnownBits was looking at operands 1 and 2 instead. This can cause CMOV instructions to be incorrectly folded into BFI if value set by the CMOV is another CMOV, whose known bits are computed incorrectly. This patch fixes the issue and adds a test case. Reviewers: kristof.beyls, jmolloy Subscribers: llvm-commits, aemerson, srhines, rengolin Differential Revision: https://reviews.llvm.org/D31265 llvm-svn: 298624	2017-03-23 16:47:47 +00:00
Simon Pilgrim	1c048ab6ba	[X86][SSE] Extract elements from narrower shuffle masks. Add support for widening narrow shuffle masks so we can directly extract from the relevant input vector of the shuffle. llvm-svn: 298616	2017-03-23 16:09:34 +00:00
Igor Breger	a8ba572dcf	[GlobalISel][X86] Support G_STORE/G_LOAD operation Summary: 1. Support pointer type as function argument and return value 2. G_STORE/G_LOAD - set legal action for i8/i16/i32/i64/f32/f64/vec128 3. RegisterBank - support typeless operations like G_STORE/G_LOAD, for scalar use GPR bank. 4. Support instruction selection for G_LOAD/G_STORE Reviewers: zvi, rovka, ab, qcolombet Reviewed By: rovka Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank Differential Revision: https://reviews.llvm.org/D30973 llvm-svn: 298609	2017-03-23 15:25:57 +00:00
Zvi Rackover	db4b032205	X86FixupBWInsts: Minor cleanup. NFC Summary: Cleanup some remnants of code from when the X86FixupBWInsts pass did both forward liveness analysis and backward liveness analysis. Reviewers: MatzeB, myatsina, DavidKreitzer Reviewed By: MatzeB Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31264 llvm-svn: 298599	2017-03-23 14:08:26 +00:00
Strahinja Petrovic	f9fa62e576	[Mips] Emit the correct DINS variant This patch fixes emitting of correct variant of DINS instruction. Differential Revision: https://reviews.llvm.org/D30988 llvm-svn: 298596	2017-03-23 13:40:07 +00:00
Simon Pilgrim	8a18299f20	[X86][SSE] Tidyup canWidenShuffleElements. NFCI. Pull out mask elements at the start, allowing us to make the widening pattern matching more readable. llvm-svn: 298594	2017-03-23 13:33:03 +00:00
Strahinja Petrovic	cac14b5334	[Mips] Fix for decoding DINS instruction - disassembler This patch fixes decoding of size and position for DINSM and DINSU instructions. Differential Revision: https://reviews.llvm.org/D31072 llvm-svn: 298593	2017-03-23 13:19:04 +00:00
Igor Breger	8a924bea78	[GlobalISel][X86] clang-format. NFC llvm-svn: 298590	2017-03-23 12:13:29 +00:00
Michael Zuckerman	85436ece89	[X86][TD][vpmovm2] New TD pattern for the vpmovm2 instruction Up until now, vpmovm2 instruction described its destination operand size by the source operand size. This patch adds new pattern for the vpmovm2 instruction. The node describes new expansion of the destination (from {128|256} to 512). Differential Revision: https://reviews.llvm.org/D30654 llvm-svn: 298586	2017-03-23 09:57:01 +00:00
Davide Italiano	6974dd6412	[ARM] Reduce code duplication by factoring out in a lambda. NFCI. llvm-svn: 298572	2017-03-23 01:34:45 +00:00
Davide Italiano	0145e751c4	[AArch64] Drive-by cleanup, make this code shorter. NFCI. llvm-svn: 298563	2017-03-22 23:37:58 +00:00
Artyom Skrobov	92c0653095	Reapply r298417 "[ARM] Recommit the glueless lowering of addc/adde in Thumb1" The UB in t2_so_imm_neg conversion has been addressed under D31242 / r298512 This reverts commit r298482. llvm-svn: 298562	2017-03-22 23:35:51 +00:00
Konstantin Zhuravlyov	4cbb68959b	[AMDGPU] Do not emit isa info as code object metadata - It was decided to expose this information through other means (rocr) Differential Revision: https://reviews.llvm.org/D30970 llvm-svn: 298560	2017-03-22 23:27:09 +00:00
Artyom Skrobov	ee66d6e18e	[ARM] simplifying t2_so_imm_neg as suggested by Eli Friedman in D31242 (NFC) llvm-svn: 298559	2017-03-22 23:12:59 +00:00
Konstantin Zhuravlyov	a780ffaac2	[AMDGPU] Emit kernel debug properties as code object metadata Differential Revision: https://reviews.llvm.org/D30969 llvm-svn: 298558	2017-03-22 23:10:46 +00:00
Konstantin Zhuravlyov	ca0e7f6472	[AMDGPU] Emit kernel code properties as code object metadata - These are not required for low level runtime Differential Revision: https://reviews.llvm.org/D29949 llvm-svn: 298556	2017-03-22 22:54:39 +00:00
Eric Christopher	fd8510cfec	Clean up some Subtarget uses and casts in the X86 backend, removing unnecessary work or calls. llvm-svn: 298555	2017-03-22 22:44:52 +00:00
Konstantin Zhuravlyov	7498cd61fb	[AMDGPU] Restructure code object metadata creation - Rename runtime metadata -> code object metadata - Make metadata not flow - Switch enums to use ScalarEnumerationTraits - Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc - Introduce in-memory representation for attributes - Code object metadata streamer - Create metadata for isa and printf during EmitStartOfAsmFile - Create metadata for kernel during EmitFunctionBodyStart - Finalize and emit metadata to .note during EmitEndOfAsmFile - Other minor improvements/bug fixes Differential Revision: https://reviews.llvm.org/D29948 llvm-svn: 298552	2017-03-22 22:32:22 +00:00
Konstantin Zhuravlyov	eb685e5f27	[AMDGPU] Fix bug 31610 Differential Revision: https://reviews.llvm.org/D31258 llvm-svn: 298551	2017-03-22 21:48:18 +00:00
Artyom Skrobov	50a066b313	[ARM] t2_so_imm_neg had a subtle bug in the conversion, and could trigger UB by negating (int)-2147483648. By pure luck, none of the pre-existing tests triggered this; so I'm adding one. Summary: Thanks to Vitaly Buka for helping catch this. Reviewers: rengolin, jmolloy, efriedma, vitalybuka Subscribers: llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D31242 llvm-svn: 298512	2017-03-22 15:09:30 +00:00
Dmitry Preobrazhensky	895d377dc7	[AMDGPU][MC] Fix for Bug 28204 + LIT tests Fixed v_mad_i64_i32/u64_u32 encoding Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D30828 llvm-svn: 298502	2017-03-22 13:31:01 +00:00
Simon Pilgrim	b19a507a88	[X86] Remove unnecessary duplicate code (PR30649). NFCI. llvm-svn: 298495	2017-03-22 11:23:49 +00:00
Craig Topper	3eb6ff9d09	[X86] Remove an unused function from release builds. Reported by gcc's unused function warning. llvm-svn: 298485	2017-03-22 06:07:58 +00:00
Jonas Paulsson	808c89f467	[SystemZ] Don't drop any operands in expandZExtPseudo() Make sure that any operands, e.g. an implicit def of a super reg, are transferred to the new instruction. Review: Ulrich Weigand llvm-svn: 298484	2017-03-22 06:03:32 +00:00
Vitaly Buka	e69c137f90	Revert "[ARM] Recommit the glueless lowering of addc/adde in Thumb1, including the amended (no UB anymore) fix for adding/subtracting -2147483648." Fails check-llvm with ubsan This reverts commit r298417. llvm-svn: 298482	2017-03-22 05:07:44 +00:00
Matt Arsenault	513cb7a87d	AMDGPU: Remove hasSideEffects from SI_RETURN_TO_EPILOG llvm-svn: 298454	2017-03-21 22:28:48 +00:00
Matt Arsenault	5b20fbb748	AMDGPU: Rename SI_RETURN This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return. llvm-svn: 298452	2017-03-21 22:18:10 +00:00
George Burgess IV	56c7e88c2c	Let llvm.objectsize be conservative with null pointers This adds a parameter to @llvm.objectsize that makes it return conservative values if it's given null. This fixes PR23277. Differential Revision: https://reviews.llvm.org/D28494 llvm-svn: 298430	2017-03-21 20:08:59 +00:00
Coby Tayree	07a8974c48	[X86][MS-compatability][llvm] allow MS TYPE/SIZE/LENGTH operators as a part of a compound expression This patch introduces X86AsmParser with the ability to handle the aforementioned ops within compound "MS" arithmetical expressions. Currently - only supported as a stand alone Operand, e.g.: "TYPE X" now allowed : "4 + TYPE X * 128" Clang side: https://reviews.llvm.org/D31174 Differential Revision: https://reviews.llvm.org/D31173 llvm-svn: 298425	2017-03-21 19:31:55 +00:00
Davide Italiano	200e5e184a	[X86] Remove extra semicolon to placate GCC. NFCI. llvm-svn: 298423	2017-03-21 19:17:23 +00:00
Artyom Skrobov	40a4f40679	[ARM] Recommit the glueless lowering of addc/adde in Thumb1, including the amended (no UB anymore) fix for adding/subtracting -2147483648. This reverts r298328 "[ARM] Revert r297443 and r297820." and partially reverts r297842 "Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648"" llvm-svn: 298417	2017-03-21 18:39:41 +00:00
Krzysztof Parzyszek	d033d1fd82	Recommit r298282 with fixes for memory allocation/deallocation [Hexagon] Recognize polynomial-modulo loop idiom again Regain the ability to recognize loops calculating polynomial modulo operation. This ability has been lost due to some changes in the preceding optimizations. Add code to preprocess the IR to a form that the pattern matching code can recognize. llvm-svn: 298400	2017-03-21 17:09:27 +00:00
Marek Olsak	5c7a61d221	AMDGPU: Buffer descriptor changes for GFX9 Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31158 llvm-svn: 298397	2017-03-21 17:00:39 +00:00
Marek Olsak	e22fdb9cac	AMDGPU: Always use VGPR indexing on GFX9 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31157 llvm-svn: 298396	2017-03-21 17:00:32 +00:00
Reid Kleckner	b518054b87	Rename AttributeSet to AttributeList Summary: This class is a list of AttributeSetNodes corresponding the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393	2017-03-21 16:57:19 +00:00
Matt Arsenault	5af82a7ae1	AMDGPU: Fix not including v2i16/v2f16 in register class llvm-svn: 298390	2017-03-21 16:42:50 +00:00
Matt Arsenault	f8fb605a68	AMDGPU: Fix asserting on 0 dmask for image intrinsics Fold these to undef during lowering so users get eliminated. llvm-svn: 298387	2017-03-21 16:32:17 +00:00
Sanne Wouda	2409c6403d	[ARM] [Assembler] Support negative immediates for A32, T32 and T16 Summary: To support negative immediates for certain arithmetic instructions, the instruction is converted to the inverse instruction with a negated (or inverted) immediate. For example, "ADD r0, r1, #FFFFFFFF" cannot be encoded as an ADD instruction. However, "SUB r0, r1, #1" is equivalent. These conversions are different from instruction aliases. An alias maps several assembler instructions onto one encoding. A conversion, however, maps an invalid instruction--e.g. with an immediate that cannot be represented in the encoding--to a different (but equivalent) instruction. Several instructions with negative immediates were being converted already, but this was not systematically tested, nor did it cover all instructions. This patch implements all possible substitutions for ARM, Thumb1 and Thumb2 assembler and adds tests. It also adds a feature flag (-mattr=+no-neg-immediates) to turn these substitutions off. This is helpful for users who want their code to assemble to exactly what they wrote. Reviewers: t.p.northover, rovka, samparker, javed.absar, peter.smith, rengolin Reviewed By: javed.absar Subscribers: aadg, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D30571 llvm-svn: 298380	2017-03-21 14:59:17 +00:00
Sanjay Patel	79379cae15	[x86] use PMOVMSK for vector-sized equality comparisons We could do better by splitting any oversized type into whatever vector size the target supports, but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte structs, so I think we can wire that up with a TLI hook that feeds into this. Differential Revision: https://reviews.llvm.org/D31156 llvm-svn: 298376	2017-03-21 13:50:33 +00:00
Valery Pykhtin	fd4c410f4d	[AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler Differential revision: https://reviews.llvm.org/D31046 llvm-svn: 298368	2017-03-21 13:15:46 +00:00
Sam Kolton	f60ad58dad	[AMDGPU] SDWA peephole optimization pass. Summary: First iteration of SDWA peephole. This pass tries to combine several instructions into one SDWA instruction. E.g. it converts: ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1 V_ADD_I32_e32 %vreg2, %vreg0, %vreg3 V_LSHLREV_B32_e32 %vreg4, 16, %vreg2 ''' Into: ''' V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD ''' Pass structure: 1. Iterate over machine instructions in a basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match a machine instruction into either a source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''. 2. Iterate over found SDWA operands and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand, the potential instructions are all instructions in this basic block that use '''%vreg0''' 3. Iterate over all potential instructions and check if they can be converted into SDWA. 4. Convert instructions to SDWA. This review contains a basic implementation of the SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing done). There are several ways this pass can be improved: 1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass. 2. Introduce more SDWA patterns 3. Introduce mnemonics to limit when SDWA patterns should apply Reviewers: vpykhtin, alex-t, arsenm, rampitec Subscribers: wdng, nhaehnle, mgorny Differential Revision: https://reviews.llvm.org/D30038 llvm-svn: 298365	2017-03-21 12:51:34 +00:00
Andrea Di Biagio	7937be7dd3	[DebugInfo][X86] Teach Optimize LEAs pass to handle debug values This patch fixes an issue in the Optimize LEAs pass where redundant LEAs were not removed because they were being used by debug values. The debug values are now ignored when determining whether LEAs are redundant. For now the debug values for the redundant LEAs are marked as undefined, effectively lost. The intention is for a follow up patch which will attempt to preserve the debug values where possible. Patch by Andrew Ng. Differential Revision: https://reviews.llvm.org/D30835 llvm-svn: 298360	2017-03-21 11:36:21 +00:00
Jonas Paulsson	bd65421f08	[SystemZ] Don't drop MO flags in foldMemoryOperandImpl() The def operand of the new LG/LD should have the old def operands flags and subreg index. New test: test/CodeGen/SystemZ/fold-memory-op-impl.ll Review: Ulrich Weigand llvm-svn: 298341	2017-03-21 05:49:40 +00:00
Vitaly Buka	c12716e742	Revert "[Hexagon] Recognize polynomial-modulo loop idiom again" Fix memory leaks on check-llvm tests detected by Asan. This reverts commit r298282. llvm-svn: 298329	2017-03-21 00:59:51 +00:00
Eli Friedman	76732acc23	[ARM] Revert r297443 and r297820. The glueless lowering of addc/adde in Thumb1 has known serious miscompiles (see https://reviews.llvm.org/D31081), and r297820 causes an infinite loop for certain constructs. It's not clear when they will be fixed, so let's just take them out of the tree for now. (I resolved a small conflict with r297453.) llvm-svn: 298328	2017-03-21 00:26:39 +00:00
Vadzim Dambrouski	ba789cbd3d	[ARM] Fix PR32130: Handle promotion of zero sized constants. The special case of zero sized values was previously not handled correctly. This patch handles this by not promoting if the size is zero. Patch by Tim Neumann. Differential Revision: https://reviews.llvm.org/D31116 llvm-svn: 298320	2017-03-20 22:59:57 +00:00
Evgeniy Stepanov	e829eecc05	[Fuchsia] Use %gs for ABI slots under -mcmodel=kernel Make x86_64-fuchsia targets under -mcmodel=kernel use %gs rather than %fs to access ABI slots for stack-protector and safe-stack Patch by Roland McGrath. Differential Revision: https://reviews.llvm.org/D30870 llvm-svn: 298302	2017-03-20 20:35:37 +00:00
Krzysztof Parzyszek	8490251de3	[Hexagon] Recognize polynomial-modulo loop idiom again Regain the ability to recognize loops calculating polynomial modulo operation. This ability has been lost due to some changes in the preceding optimizations. Add code to preprocess the IR to a form that the pattern matching code can recognize. llvm-svn: 298282	2017-03-20 18:12:58 +00:00
Konstantin Zhuravlyov	2534bc07f4	[AMDGPU] Run always inliner early in opt Differential Revision: https://reviews.llvm.org/D31141 llvm-svn: 298281	2017-03-20 18:06:45 +00:00
Dmitry Preobrazhensky	1e124e1825	[AMDGPU][MC] Fix for Bugs 28201, 28199, 28170 + LIT tests This fix enables sp3 abs modifier with constants Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D30825 llvm-svn: 298265	2017-03-20 16:33:20 +00:00
Jessica Paquette	02cbfb2926	[Outliner] ACTUALLY remove the errs output I don't know how to type. This fixes the last commit which would have made all of the overflows legal, and kept the screaming. llvm-svn: 298263	2017-03-20 16:25:04 +00:00
Jessica Paquette	5d59a4ee19	[Outliner] Remove output for offset range check Forgot to remove some output before committing last time. (Instruction fixups don't actually overflow anywhere in the test suite so far, so I missed it). To prevent the outliner from screaming "Overflow!" in the event that that does happen, this commit removes that output. llvm-svn: 298260	2017-03-20 15:51:45 +00:00
Dmitry Preobrazhensky	40af9c35d3	[AMDGPU][MC] Fix for Bugs 28200, 28202 + LIT tests Fixed several related issues with VOP3 fp modifiers. Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D30821 llvm-svn: 298255	2017-03-20 14:50:35 +00:00
Diana Picus	d79253a9f7	[GlobalISel] Use the correct calling conv for calls This commit adds a parameter that lets us pass in the calling convention of the call to CallLowering::lowerCall. This allows us to handle situations where the calling convention of the callee is different from that of the caller. Differential Revision: https://reviews.llvm.org/D31039 llvm-svn: 298254	2017-03-20 14:40:18 +00:00
Konstantin Zhuravlyov	8a67eb144f	Revert "[AMDGPU] Run always inliner early in opt" This reverts commit r297958, it breaks device-libs build. llvm-svn: 298239	2017-03-20 09:26:08 +00:00
Craig Topper	5992c8d1dc	[AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time instead of isel Summary: Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coalescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine which seems more robust. This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain so the kor-sequence test breaks. But this patch should dodge the problem entirely. Reviewers: zvi, delena, RKSimon, igorb Reviewed By: igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31056 llvm-svn: 298228	2017-03-19 17:11:09 +00:00
Simon Pilgrim	5fa1b9a12f	Fix MSVC warning: "switch statement contains 'default' but no 'case' labels". NFCI. llvm-svn: 298225	2017-03-19 16:39:04 +00:00
Oren Ben Simhon	0ef61ec32a	[MIR] Support Customed Register Mask and CSRs The MIR printer dumps a string that describe the register mask of a function. A static predefined list of register masks matches a static list of strings. However when the register mask is not from the static predefined list, there is no descriptor string and the printer fails. This patch adds support to custom register mask printing and dumping. Also the list of callee saved registers (describing the registers that must be preserved for the caller) might be dynamic. As such this data needs to be dumped and parsed back to the Machine Register Info. Differential Revision: https://reviews.llvm.org/D30971 llvm-svn: 298207	2017-03-19 08:14:18 +00:00
Matthias Braun	e6ff30b696	ExecutionDepsFix: Let targets specialize the pass; NFC Let targets specialize the pass with the register class so we can get a parameterless default constructor and can put the pass into the pass registry to enable testing with -run-pass=. llvm-svn: 298184	2017-03-18 05:08:58 +00:00
Matthias Braun	e9f8209e87	ExecutionDepsFix: Normalize names; NFC Normalize ExeDepsFix, execution-fix, ExecutionDependencyFix and ExecutionDepsFix to the last one. llvm-svn: 298183	2017-03-18 05:05:40 +00:00
Nirav Dave	ac6081cb67	Make library calls sensitive to regparm module flag (Fixes PR3997). Reviewers: mkuper, rnk Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D27050 llvm-svn: 298179	2017-03-18 00:44:07 +00:00
Nirav Dave	6de2c77944	Capitalize ArgListEntry fields. NFC. llvm-svn: 298178	2017-03-18 00:43:57 +00:00
Stanislav Mekhanoshin	8e45acfc38	[AMDGPU] Add address space based alias analysis pass This is direct port of HSAILAliasAnalysis pass, just cleaned for style and renamed. Differential Revision: https://reviews.llvm.org/D31103 llvm-svn: 298172	2017-03-17 23:56:58 +00:00
Jessica Paquette	ea8cc09be0	[Outliner] Add outliner for AArch64 This commit adds the necessary target hooks for outlining in AArch64. It also refactors the switch statement used in `getMemOpBaseRegImmOfsWidth` into a more general function, `getMemOpInfo`. This allows the outliner to share that code without copying and pasting it. The AArch64 outliner can be run using -mllvm -enable-machine-outliner, as with the X86-64 outliner. The test for this pass verifies that the outliner does, in fact outline functions, fixes up the stack accesses properly, and can correctly generate a tail call. In the future, this test should be replaced with a MIR test, so that we can properly test immediate offset overflows in fixed-up instructions. llvm-svn: 298162	2017-03-17 22:26:55 +00:00
Matt Arsenault	59ece95f6c	AMDGPU: Fix broken condition in hazard recognizer Fixes bug 32248. llvm-svn: 298125	2017-03-17 21:36:28 +00:00
Matt Arsenault	e70d5dcf3e	AMDGPU: Fix handling of constant phi input loop conditions If the loop condition was an i1 phi with a constantexpr input, this would add a loop intrinsic fed by a phi dependent on a call to if.break in the same block. Insert the call in the loop header. llvm-svn: 298121	2017-03-17 20:52:21 +00:00
Matt Arsenault	c5b641ac02	AMDGPU: Cleanup control flow intrinsics Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list. It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those. llvm-svn: 298119	2017-03-17 20:41:45 +00:00
Sanjay Patel	455703a0c6	[x86] clean up setcc with negated operand transform and add missing test; NFCI llvm-svn: 298118	2017-03-17 20:29:40 +00:00
Reid Kleckner	edf1cbb580	[X86] Emit fewer instructions to allocate >16GB stack frames Summary: Use this code pattern when RAX is live, instead of emitting up to 2 billion adjustments: pushq %rax movabsq +-$Offset+-8, %rax addq %rsp, %rax xchg %rax, (%rsp) movq (%rsp), %rsp Try to clean this code up a bit while I'm here. In particular, hoist the logic that handles the entire adjustment with `movabsq $imm, %rax` out of the loop. This negates the offset in the prologue and uses ADD because X86 only has a two operand subtract which always subtracts from the destination register, which can no longer be RSP. Fixes PR31962 Reviewers: majnemer, sdardis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30052 llvm-svn: 298116	2017-03-17 20:25:49 +00:00
Sanjay Patel	25bd713d33	[x86] avoid adc/sbb assert when both sides of add are zexted (PR32316) As noted in the comment, we might want to account for this case, but I didn't look at what that would mean for the asm. I'm also not sure why this only reproduces with avx512, but I'm putting a conservative fix in for now to avoid the crash. Also, if both sides of an add are zexted, shouldn't we shrink that add? https://bugs.llvm.org/show_bug.cgi?id=32316 llvm-svn: 298107	2017-03-17 17:27:31 +00:00
Reid Kleckner	98e56430b9	Fix wasm build after arg_begin iterator type change llvm-svn: 298106	2017-03-17 17:24:03 +00:00
Stanislav Mekhanoshin	ee2dd785f6	Only unswitch loops with uniform conditions Loop unswitching can be extremely harmful for a SIMT target. If the hoisted condition is not uniform, a SIMT machine will execute both clones of a loop sequentially. Therefore LoopUnswitch checks if the condition is non-divergent. Since DivergenceAnalysis adds an expensive PostDominatorTree analysis not needed for non-SIMT targets, a new option is added to avoid unneeded analysis initialization. The method getAnalysisUsage is called when TargetTransformInfo is not yet available and we cannot use it here. For that reason a new field DivergentTarget is added to PassManagerBuilder to control the behavior and set this field from a target. Differential Revision: https://reviews.llvm.org/D30796 llvm-svn: 298104	2017-03-17 17:13:41 +00:00
Chad Rosier	a69dcb6b66	[AArch64] Use alias analysis in the load/store optimization pass. This allows the optimization to rearrange loads and stores more aggressively. Differential Revision: http://reviews.llvm.org/D30903 llvm-svn: 298092	2017-03-17 14:19:55 +00:00
Andre Vieira	913ffeb5ba	[ARM] Fix triple format in test branch disassemble test Fixing triple format in the tests added for the branch label fix for Thumb Targets. Also recommitting previously approved patch, see https://reviews.llvm.org/D30943. Reviewed by: samparker Differential Revision: https://reviews.llvm.org/D30987 llvm-svn: 298056	2017-03-17 09:37:10 +00:00
Craig Topper	a8d4097445	[AVX-512] Make VEX encoded FMA instructions available when AVX512 is enabled regardless of whether +fma was added on the command line. We weren't able to handle isel of the 128/256-bit FMA instructions when AVX512F was enabled but VLX and FMA weren't. I didn't make FeatureAVX512 imply FeatureFMA as I wasn't sure I wanted disabling FMA to also disable AVX512. Instead we just can't prevent FMA instructions if AVX512 is enabled. Another option would be to promote 128/256-bit to 512-bit, do the operation and extract it. But that requires a lot of extra isel patterns. Since no CPUs exist that support AVX512 but not FMA, just using the VEX instructions seems better. llvm-svn: 298051	2017-03-17 07:37:31 +00:00
Craig Topper	02cd0bfa46	[X86] Remove unused predicate. NFC llvm-svn: 298050	2017-03-17 07:37:27 +00:00
Jonas Paulsson	8a7bd24c82	[SystemZ] Add use of super-reg in splitMove() If one of the subregs of the 128 bit reg is undefined when splitMove() splits a store into two instructions, a use of an undefined physical register results. To remedy this, an implicit use of the super register is added onto both new instructions, along with propagated kill and undef flags. This was discovered with llvm-stress, and that test case is attached as test/CodeGen/SystemZ/splitMove_undefReg_mverifier.ll Thanks to Matthias Braun for helping with a nice explanation. Review: Ulrich Weigand llvm-svn: 298047	2017-03-17 06:47:08 +00:00
Craig Topper	6a1290a0fd	[AVX-512] Give priority to EVEX encoded scalar FMA instructions when we have FMA, AVX512 and no VLX. We were giving priority if VLX was enabled. llvm-svn: 298046	2017-03-17 06:10:37 +00:00
Craig Topper	e4d5aa7efc	[X86] Cleanup the AddedComplexity values on move immediate instructions. NFC This makes the values a little more consistent between similar instructions and reduces the values some. This results in better grouping in the isel table saving a few bytes. llvm-svn: 298043	2017-03-17 05:59:54 +00:00
Eric Christopher	53da761570	Remove LessPreciseFPMADOption from TargetOptions along with all of the associated command line options and functions - it's currently unused in all of llvm and clang other than being set and reset. llvm-svn: 298023	2017-03-17 00:38:03 +00:00
Eli Friedman	da228fee0c	[ARM] Use alias analysis in ARMPreAllocLoadStoreOpt. This allows the optimization to rearrange loads and stores more aggressively. This doesn't really affect performance, but it helps codesize. Differential Revision: https://reviews.llvm.org/D30839 llvm-svn: 298021	2017-03-17 00:34:26 +00:00
Jacques Pienaar	da9352c173	clean Lanai namespace Summary: This patch cleans the namespace of the Lanai target. Reviewers: jpienaar Reviewed By: jpienaar Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30955 llvm-svn: 298015	2017-03-16 23:22:10 +00:00
Reid Kleckner	45707d4d5a	Remove getArgumentList() in favor of arg_begin(), args(), etc Users often call getArgumentList().size(), which is a linear way to get the number of function arguments. arg_size(), on the other hand, is constant time. In general, the fact that arguments are stored in an iplist is an implementation detail, so I've removed it from the Function interface and moved all other users to the argument container APIs (arg_begin(), arg_end(), args(), arg_size()). Reviewed By: chandlerc Differential Revision: https://reviews.llvm.org/D31052 llvm-svn: 298010	2017-03-16 22:59:15 +00:00
Derek Schuff	b879539aac	[WebAssembly] Fix some broken type encodings in wasm binary A recent change switched the in-memory wasm value types to be signed integers, but I missed a few cases where these were being written to the binary. Differential Revision: https://reviews.llvm.org/D31014 Patch by Sam Clegg llvm-svn: 297991	2017-03-16 20:49:48 +00:00
Matthias Braun	e959544517	TargetInstrInfo: Provide default implementation of isTailCall(). In fact this default implementation should be the only implementation, keep it virtual for now to accommodate targets that don't model flags correctly. Differential Revision: https://reviews.llvm.org/D30747 llvm-svn: 297980	2017-03-16 20:02:30 +00:00
Daniel Sanders	0e64202871	[globalisel] Correct G_CONSTANT path of selectArithImmed() Earlier stages of GlobalISel always use ConstantInt in G_CONSTANT so that's what we should check for. This fixes a crash introduced in r297782. llvm-svn: 297968	2017-03-16 18:04:50 +00:00
Hiroshi Inoue	138a3faa3e	Test commit. llvm-svn: 297959	2017-03-16 16:30:06 +00:00
Stanislav Mekhanoshin	f80507979d	[AMDGPU] Run always inliner early in opt We can mark functions to always inline early in the opt. Since we do not have call support this early inlining creates opportunities for inter-procedural optimizations which would not occur otherwise. Differential Revision: https://reviews.llvm.org/D31016 llvm-svn: 297958	2017-03-16 16:11:46 +00:00
Colin LeMahieu	ddebad956e	[Hexagon] Updating inline saturate lanes for v62 version. llvm-svn: 297920	2017-03-16 00:35:28 +00:00
Simon Pilgrim	cee3fc61cb	Remove redundant condition (PR32263). NFCI. llvm-svn: 297915	2017-03-15 23:27:43 +00:00
Matt Arsenault	7dc01c96ae	AMDGPU: Allow sinking of addressing modes for atomic_inc/dec llvm-svn: 297913	2017-03-15 23:15:12 +00:00
Simon Pilgrim	06c70adcf0	[X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalars Prep work for PR31810 llvm-svn: 297876	2017-03-15 19:34:55 +00:00
Ahmed Bougacha	62cd73d989	[GlobalISel][AArch64] Select ADDXri. We're now able to select ADDWri thanks to the new complex pattern support. Extend that to ADDXri. llvm-svn: 297874	2017-03-15 19:20:59 +00:00
Matt Arsenault	86e02ce2dc	AMDGPU: Fix unnecessary ands when packing f16 vectors computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0. llvm-svn: 297873	2017-03-15 19:04:26 +00:00
Tim Northover	0d98b03b9f	ARM: avoid clobbering register in v6 jump-table expansion. If we got unlucky with register allocation and actual constpool placement, we could end up producing a tTBB_JT with an index that's already been clobbered. Technically, we might be able to fix this situation up with a MOV, but I think the constant islands pass is complex enough without having to deal with more weird edge-cases. llvm-svn: 297871	2017-03-15 18:38:13 +00:00
Matt Arsenault	0e6e018054	AMDGPU: Minor SIAnnotateControlFlow cleanups Newline fixes, early return, range loops. llvm-svn: 297865	2017-03-15 18:00:12 +00:00
Nemanja Ivanovic	ffcf0fb1cc	[PowerPC][Altivec] Add mfvrd and mffprd extended mnemonic mfvrd and mffprd are both alias to mfvrsd. This patch enables correct parsing of the aliases, but we still emit a mfvrsd. Committing on behalf of brunoalr (Bruno Rosa). Differential Revision: https://reviews.llvm.org/D29177 llvm-svn: 297849	2017-03-15 16:04:53 +00:00
Sanjay Patel	fa929a2134	Cyle -> Cycle; NFCI llvm-svn: 297846	2017-03-15 15:37:42 +00:00
Artyom Skrobov	e72e1ba434	Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648" This reverts r297820 which apparently fails on A15 hosts. llvm-svn: 297842	2017-03-15 14:50:43 +00:00
Simon Pilgrim	6778b8f715	Reverted unintended commit llvm-svn: 297841	2017-03-15 14:47:30 +00:00
Simon Pilgrim	3804a12fc3	Fix Wint-in-bool-context warning (PR32248) llvm-svn: 297840	2017-03-15 14:38:19 +00:00
Sam Parker	db20d48336	Reverting r297821 due to breaking lld test. llvm-svn: 297838	2017-03-15 14:06:42 +00:00
Simon Pilgrim	493f4462bf	[X86][SSE] Fixed shuffle MOVSS/MOVSD combining of all zeroable inputs Turns out it can happen, so the assertion was too harsh Found during fuzz testing llvm-svn: 297833	2017-03-15 13:16:46 +00:00
Petar Jovanovic	b71386a4a4	[Mips] Add support to match more patterns for DEXT and CINS This patch adds support for recognizing more patterns to match to DEXT and CINS instructions. It finds cases where multiple instructions could be replaced with a single DEXT or CINS instruction. For example, for the following: define i64 @dext_and32(i64 zeroext %a) { entry: %and = and i64 %a, 4294967295 ret i64 %and } instead of generating: 0000000000000088 <dext_and32>: 88: 64010001 daddiu at,zero,1 8c: 0001083c dsll32 at,at,0x0 90: 6421ffff daddiu at,at,-1 94: 03e00008 jr ra 98: 00811024 and v0,a0,at 9c: 00000000 nop the following gets generated: 0000000000000068 <dext_and32>: 68: 03e00008 jr ra 6c: 7c82f803 dext v0,a0,0x0,0x20 Cases that are covered: DEXT: 1. and $src, mask where mask > 0xffff 2. zext $src zero extend from i32 to i64 CINS: 1. and (shl $src, pos), mask 2. shl (and $src, mask), pos 3. zext (shl $src, pos) zero extend from i32 to i64 Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D30464 llvm-svn: 297832	2017-03-15 13:10:08 +00:00
Simon Pilgrim	a0b0b74b9a	Align cost model columns. NFCI. llvm-svn: 297824	2017-03-15 11:57:42 +00:00
Sam Parker	274472f7c5	[ARM] Fix for branch label disassembly for Thumb Different MCInstrAnalysis classes for arm and thumb mode, each with their own evaluateBranch implementation. I added a test case and fixed the coff-relocations test to use '<label>:' rather than '<label>' in the CHECK-LABEL entries, since the ones without the colon would match branch targets. Might be worth noting that llvm-objdump does not look up the relocation and thus assigns it a target depending on the encoded immediate, which is #0, so it thinks it branches to the next instruction. Committed on behalf of Andre Vieira (avieira). Differential Revision: https://reviews.llvm.org/D30943 llvm-svn: 297821	2017-03-15 10:21:23 +00:00
Artyom Skrobov	3fa5fd1dd2	[Thumb1] Fix the bug when adding/subtracting -2147483648 Differential Revision: https://reviews.llvm.org/D30829 llvm-svn: 297820	2017-03-15 10:19:16 +00:00
Sam Parker	654cb8263a	[ARM] Enable SMLAL[B|T] isel Enable the selection of the 64-bit signed multiply accumulate instructions which operate on 16-bit operands. These are enabled for ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb architectures. Differential Revision: https://reviews.llvm.org/D30044 llvm-svn: 297809	2017-03-15 08:27:11 +00:00
Daniel Sanders	a228df75c0	[globalisel] LLVM_BUILD_GLOBAL_ISEL=OFF should prevent GlobalISel instruction selector from being declared. llvm-svn: 297786	2017-03-14 22:09:29 +00:00
Daniel Sanders	8a4bae9993	[globalisel][tblgen] Add support for ComplexPatterns Summary: Adds a new kind of MachineOperand: MO_Placeholder. This operand must not appear in the MIR and only exists as a way of creating an 'uninitialized' operand until a matcher function overwrites it. Depends on D30046, D29712 Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet Reviewed By: qcolombet Subscribers: dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D30089 llvm-svn: 297782	2017-03-14 21:32:08 +00:00
Simon Pilgrim	cf2da96c82	[SelectionDAG] Add a signed integer absolute ISD node Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization as we only create an ABS node if it's legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780	2017-03-14 21:26:58 +00:00
Derek Schuff	e2688c432f	[WebAssembly] Use LEB encoding for value types Previously we were using the encoded LEB hex values for the value types. This change uses the decoded negative value and the LEB encoder to write them out. Differential Revision: https://reviews.llvm.org/D30847 Patch by Sam Clegg llvm-svn: 297777	2017-03-14 20:23:22 +00:00
Evgeniy Stepanov	43dcf4d330	Fix asm printing of associated sections. Make MCSectionELF::AssociatedSection be a link to a symbol, because that's how it works in the assembly, and use it in the asm printer. llvm-svn: 297769	2017-03-14 19:28:51 +00:00
Eli Friedman	caea769f11	[ARM] Replace some C++ selection code with TableGen patterns. NFC. Differential Revision: https://reviews.llvm.org/D30794 llvm-svn: 297768	2017-03-14 18:43:37 +00:00
Krzysztof Parzyszek	9416abbc4a	[Hexagon] Fix a condition in HexagonEarlyIfConv.cpp This fixes llvm.org/PR32265. llvm-svn: 297745	2017-03-14 15:21:33 +00:00
Artyom Skrobov	2de256642b	Fix typo in comment llvm-svn: 297742	2017-03-14 14:13:19 +00:00
Oliver Stannard	6ee22c41f8	[ARM] Diagnose ARM MOVT without :lower16: or :upper16: expression This instruction was missing from the list of opcodes that we check, so we were hitting an llvm_unreachable in ARMMCCodeEmitter.cpp for the ARM MOVT instruction, rather than the diagnostic that is emitted for the other MOVW/MOVT instructions. Differential revision: https://reviews.llvm.org/D30936 llvm-svn: 297739	2017-03-14 13:50:10 +00:00
Artyom Skrobov	283316b5c0	De-duplicate the two implementations of ARMBaseInstrInfo::isProfitableToIfCvt() [NFC] Reviewers: congh, rengolin Subscribers: aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D30934 llvm-svn: 297738	2017-03-14 13:38:45 +00:00
Sam Parker	916b1ba617	[ARM] Move SMULW[B|T] isel to DAG Combine Create nodes for smulwb and smulwt and move their selection from DAGToDAG to DAG combine. smlawb and smlawt can then be selected using tablegen. Added some helper functions to detect shift patterns as well as a wrapper around SimplifyDemandedBits. Added a couple of extra tests. Differential Revision: https://reviews.llvm.org/D30708 llvm-svn: 297716	2017-03-14 09:13:22 +00:00
Oren Ben Simhon	fe34c5e429	Disable Callee Saved Registers Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller. Some CCs use an additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list. The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee. The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee. Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span). The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments. The patch will also help implement the future no_caller_saved_registers attribute intended for interrupt handler CC. Differential Revision: https://reviews.llvm.org/D28566 llvm-svn: 297715	2017-03-14 09:09:26 +00:00
Craig Topper	7a5ee1c5ed	[AVX-512] Use iPTR instead of i64 in patterns for extract_subvector/insert_subvector index. llvm-svn: 297707	2017-03-14 06:40:04 +00:00
Jonas Paulsson	a48ea231c0	[TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705	2017-03-14 06:35:36 +00:00
Craig Topper	9d50e187cd	[AVX-512] Pre-emptively fix more places in fastisel where we might copy a VK1 register into an AH/BH/CH/DH register. llvm-svn: 297704	2017-03-14 04:18:25 +00:00
Nirav Dave	54e22f33d9	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting with compiler time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 297695	2017-03-14 00:34:14 +00:00
Artyom Skrobov	bf19d4bc29	[Thumb1] combine ADDC/SUBC with a negative immediate Summary: This simple optimization has been split out of https://reviews.llvm.org/D30400 Reviewers: efriedma, jmolloy Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30829 llvm-svn: 297682	2017-03-13 22:36:14 +00:00
Craig Topper	784f241b59	[AVX-512] Fix another case where we are copying from a mask register using AH/BH/CH/DH with fastisel. Fixes PR32256. Still planning to do an audit for other possible cases. llvm-svn: 297678	2017-03-13 21:58:54 +00:00
Simon Pilgrim	9df7d08cb2	[X86][MMX] Fix folding of shift value loads to cover whole 64-bits rL230225 made the assumption that only the lower 32-bits of an MMX register load is used as a shift value, when in fact the whole 64-bits are reloaded and treated as a i64 to determine the shift value. This patch reverts rL230225 to ensure that the whole 64-bits of memory are folded and ensures that the upper 32-bit are zero'd for cases where the shift value has come from a scalar source. Found during fuzz testing. Differential Revision: https://reviews.llvm.org/D30833 llvm-svn: 297667	2017-03-13 21:23:29 +00:00
Andrew Kaylor	a11d020699	Revert r295004 (Add MXCSR) due to errors reported by MachineVerifier I am leaving the code in clang which filters mxcsr from the clobber list because that is still technically correct and will be useful again when the MXCSR register is reintroduced. llvm-svn: 297664	2017-03-13 20:35:10 +00:00
Matt Arsenault	747bf8afa8	AMDGPU: Re-use TM.getNullPointerValue llvm-svn: 297662	2017-03-13 20:18:14 +00:00
Matt Arsenault	971c85ebb4	AMDGPU: Treat 0 as private null pointer in addrspacecast lowering llvm-svn: 297658	2017-03-13 19:47:31 +00:00
Jessica Paquette	c984e21394	[Outliner] Add tail call support This commit adds tail call support to the MachineOutliner pass. This allows the outliner to insert jumps rather than calls in areas where tail calling is possible. Outlined tail calls include the return or terminator of the basic block being outlined from. Tail call support allows the outliner to take returns and terminators into consideration while finding candidates to outline. It also allows the outliner to save more instructions. For example, in the X86-64 outliner, a tail called outlined function saves one instruction since no return has to be inserted. llvm-svn: 297653	2017-03-13 18:39:33 +00:00
Craig Topper	616641632e	[X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input source optimizations to break execution dependencies. For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2. llvm-svn: 297652	2017-03-13 18:34:46 +00:00
Craig Topper	eb7ea28bdd	[AVX-512] If gather mask is all ones, force the input to a zero vector. We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too. Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today. llvm-svn: 297651	2017-03-13 18:17:46 +00:00
Diana Picus	94db2e288b	[ARM] GlobalISel: Support SP in regbankselect We used to hit an unreachable in getRegBankFromRegClass when dealing with the stack pointer. This commit adds support for the GPRsp reg class. llvm-svn: 297621	2017-03-13 14:28:34 +00:00
Balaram Makam	cacc08bb46	[AArch64] Map Sched Read/Write resources for Falkor. llvm-svn: 297611	2017-03-13 10:42:17 +00:00
Sjoerd Meijer	aea3a990a2	ARMDisassembler: loop over ARM decode tables Loop over the ARM decode tables; this is a clean-up to reduce some code duplication. Differential Revision: https://reviews.llvm.org/D30814 llvm-svn: 297608	2017-03-13 09:41:10 +00:00
Craig Topper	48ba1e2d66	[AVX-512] Add VEX_WIG to VEX vcvtsd2ss/vcvtss2sd intrinsic instructions so they can be correctly matched by EVEX2VEX table generation. llvm-svn: 297601	2017-03-13 05:14:47 +00:00
Craig Topper	08b413acf2	[AVX-512] Use sse_loadf32/f64 for vcvtss2sd and vcvtsd2ss intrinsic patterns. llvm-svn: 297600	2017-03-13 05:14:44 +00:00
Craig Topper	5a63ca2ad2	[AVX-512] Use sse_load_f64/f32 in VCVTSS2SI/VCVTSD2SI patterns. llvm-svn: 297599	2017-03-13 03:59:06 +00:00
Craig Topper	111b2d6997	[X86] Remove unused SDTypeProfile. NFC llvm-svn: 297594	2017-03-12 23:05:03 +00:00
Craig Topper	2b92542908	[X86] Lower SSE/AVX cmpps/pd intrinsics directly to X86ISD::CMPP SDNodes. This allows us to remove a duplicate set of patterns. llvm-svn: 297593	2017-03-12 23:05:00 +00:00
Craig Topper	7d56c8315b	[AVX-512] Fix the valid immediates for the scatter/gather prefetch intrinsics. The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly. llvm-svn: 297591	2017-03-12 22:29:12 +00:00
Sanjay Patel	f06b963a2b	[x86] don't blindly transform SETB into SBB I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. This happens because we were transforming any 'setb' - even when we only wanted a single-bit result. This patch moves those transforms under visitAdd/visitSub, so we're only creating sbb/adc when it is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that existing behavior in this patch. Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files where this transform still fires. The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D30611 llvm-svn: 297586	2017-03-12 18:28:48 +00:00
Azharuddin Mohammed	473b75c3d5	Remove CRC32 instructions from AArch64InstrInfo::hasShiftedReg Summary: A53 scheduler causes an assertion failure on all CRC instructions: include/llvm/CodeGen/MachineInstr.h:280: const llvm::MachineOperand &llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i < getNumOperands() && "getOperand() out of range!"' failed. The case statements corresponding to CRC instructions are incorrect and should be removed. Also adding a testcase while on this. Reviewers: t.p.northover, javed.absar, apazos, rengolin Reviewed By: rengolin Subscribers: evandro, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30274 llvm-svn: 297582	2017-03-12 14:02:32 +00:00
Craig Topper	58647b16e5	[AVX-512] Fix a bad use of a high GR8 register after copying from a mask register during fast isel. This ends up extracting from bits 15:8 instead of the lower bits of the mask. I'm pretty sure there are more problems lurking here. But I think this fixes PR32241. I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again. llvm-svn: 297574	2017-03-12 03:37:37 +00:00
Craig Topper	6ab5edfa73	[AVX-512] Remove unused field in X86VectorVTInfo tablegen class. llvm-svn: 297572	2017-03-12 03:37:32 +00:00
Simon Pilgrim	18debfa5b4	[X86][SSE] Improve extraction of elements from v16i8 (pre-SSE41) Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte. This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte. Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate? Differential Revision: https://reviews.llvm.org/D29841 llvm-svn: 297568	2017-03-11 20:42:31 +00:00
Simon Pilgrim	9ff5732c92	Remove unnecessary whitespace. llvm-svn: 297567	2017-03-11 20:23:59 +00:00
Craig Topper	02b463270c	[X86] Remove unnecessary commented out code. NFC llvm-svn: 297563	2017-03-11 18:25:56 +00:00
Matt Arsenault	dd905b0e9b	AMDGPU: Remove packf16 intrinsic llvm-svn: 297557	2017-03-11 05:51:16 +00:00
Matt Arsenault	3cb9ff8863	AMDGPU: Keep track of modifiers when converting v_mac to v_mad Since v_max_f32_e64/v_max_f16_e64 can be folded if the target instruction supports the clamp bit, we also need to maintain modifiers when converting v_mac to v_mad. This fixes a rendering issue with Dirt Rally because a v_mac instruction with the clamp bit set was converted to a v_mad but that bit was lost during the conversion. Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit") Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 297556	2017-03-11 05:40:40 +00:00
Stanislav Mekhanoshin	79da2a7698	[AMDGPU] Remove getBidirectionalReasonRank This method inverts the Reason field of a scheduling candidate. It does right comparison between RegCritical and RegExcess, but everything else is broken. In fact it can prefer less strong reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove artificial ranking. Differential Revision: https://reviews.llvm.org/D30557 llvm-svn: 297536	2017-03-11 00:29:27 +00:00
Krzysztof Parzyszek	0e7b1f83b7	[RDF] Remove the map of reaching defs from copy propagation Use Liveness::getNearestAliasedRef to find the reaching def instead. llvm-svn: 297526	2017-03-10 22:44:24 +00:00
Krzysztof Parzyszek	0b8f184d12	[RDF] Implement Liveness::getNearestAliasedRef(Reg, Inst) This function will find the closest ref node aliased to Reg that is in an instruction preceding Inst. This could be used to identify the hypothetical reaching def of Reg, if Reg was a member of Inst. llvm-svn: 297524	2017-03-10 22:42:17 +00:00
Simon Pilgrim	128a10a41d	[X86][SSE] Fix load folding for (V)CVTDQ2PD This only requires a 64-bit memory source, not the whole 128-bits. But the 128-bit case is still supported via X86InstrInfo::foldMemoryOperandImpl llvm-svn: 297523	2017-03-10 22:35:07 +00:00
Simon Pilgrim	bfe263352a	[X86] Fix Wunused-lambda-capture warning llvm-svn: 297521	2017-03-10 22:10:34 +00:00
Eric Christopher	f025a89b3c	Sink accessing TII to fix release Werror builds. llvm-svn: 297507	2017-03-10 21:20:17 +00:00
Evandro Menezes	8f70e249a7	[AArch64, X86] Additional debug information for MacroFusion In order to make it easier to parse information about the performance of MacroFusion, this patch adds the function and the instruction names to the debug output of this pass. llvm-svn: 297504	2017-03-10 20:20:04 +00:00
Konstantin Zhuravlyov	ffdb00eda9	[AMDGPU] Split R600/SI getFrameIndexReference and emit stack object offsets for SI Differential Revision: https://reviews.llvm.org/D29674 llvm-svn: 297499	2017-03-10 19:39:07 +00:00
Yaxun Liu	874d26a89d	Rename PT_NOTE namespace name used in AMDGPUPTNote.h Patch by Guansong Zhang. Differential Revision: https://reviews.llvm.org/D30750 llvm-svn: 297498	2017-03-10 19:35:43 +00:00
Simon Pilgrim	b02667c469	[APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64). This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target. Differential Revision: https://reviews.llvm.org/D30780 llvm-svn: 297458	2017-03-10 13:44:32 +00:00
Simon Dardis	7090d145e8	[mips][msa] Accept more values for constant splats This patch teaches the MIPS backend to accept more values for constant splats. Previously, only 10 bit signed immediates or values that could be loaded using an ldi.[bhwd] instruction would be accepted. This patch relaxes that constraint so that any constant value that can be splatted is accepted. As a result, the constant pool is used less for vector operations, and the suite of bit manipulation instructions b(clr|set|neg)i can now be used with the full range of their immediate operand. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D30640 llvm-svn: 297457	2017-03-10 13:27:14 +00:00
Artyom Skrobov	94fb0bb65f	imm_comp_XFORM (defined in ARMInstrThumb.td) duplicates imm_not_XFORM (defined in ARMInstrInfo.td) Reviewers: grosbach, rengolin, jmolloy Reviewed By: jmolloy Subscribers: aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D30782 llvm-svn: 297456	2017-03-10 13:21:12 +00:00
Artyom Skrobov	0cc80c1f5a	Refactor the multiply-accumulate combines to act on ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE]. Summary: This allows for some simplification because the combines are no longer limited to just one go at the node before it gets legalized into an ARM target-specific one. Reviewers: jmolloy, rogfer01 Subscribers: aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30401 llvm-svn: 297453	2017-03-10 12:41:33 +00:00
Artyom Skrobov	0c93ceb5d8	For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes, same as already done for ARM and Thumb2. Reviewers: jmolloy, rogfer01, efriedma Subscribers: aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30400 llvm-svn: 297443	2017-03-10 07:40:27 +00:00
Dan Gohman	3a74cfec20	[WebAssembly] Fix the opcode numbers for floating-point le and gt. llvm-svn: 297420	2017-03-09 23:08:21 +00:00
Krzysztof Parzyszek	544210304f	[Hexagon] Fixes to the bitsplit generation - Fix the insertion point, which occasionally could have been incorrect. - Avoid creating multiple bitsplits with the same operands, if an old one could be reused. llvm-svn: 297414	2017-03-09 22:02:14 +00:00
Krzysztof Parzyszek	fe267a37f4	[Hexagon] Refactor the DAG preprocessing code, NFC Extract individual transformations into their own functions. llvm-svn: 297401	2017-03-09 19:14:23 +00:00
Krzysztof Parzyszek	7a0981aa38	[Hexagon] Add -mhvx option to the Hexagon backend llvm-svn: 297393	2017-03-09 17:05:11 +00:00
Krzysztof Parzyszek	78c4fcf12e	[Hexagon] Propagate zext of i1 into arithmetic code in selection DAG (op ... (zext i1 c) ...) -> (select c (op ... 1 ...), (op ... 0 ...)) llvm-svn: 297391	2017-03-09 16:29:30 +00:00
Simon Pilgrim	e86b7e2256	[X86][SSE] Speed up constant pool shuffle mask decoding with direct copy (PR32037). If the constants are already the correct size, we can copy them directly into the shuffle mask. llvm-svn: 297381	2017-03-09 14:06:39 +00:00
Simon Dardis	7577ce2140	[mips] Revert fixes for PR32020. The fix introduces segfaults and clobbers the value to be stored when the atomic sequence loops. Revert "[Target/MIPS] Kill dead code, no functional change intended." This reverts commit r296153. Revert "Recommit "[mips] Fix atomic compare and swap at O0."" This reverts commit r296134. llvm-svn: 297380	2017-03-09 14:03:26 +00:00
Sjoerd Meijer	7f1a982d3d	[ARM] remove FIXMEs and add vcmp MC test Minor cleanup in ARMInstrVFP.td: removed some FIXMEs and added a MC test for vcmp that was actually missing. Differential Revision: https://reviews.llvm.org/D30745 llvm-svn: 297376	2017-03-09 13:28:37 +00:00
Simon Dardis	158956c6cc	[mips] Fix return lowering Fix a machine verifier issue where an instruction was using an invalid register. The return pseudo is expanded and has the return address register added to it. The return register may have been spuriously marked as killed earlier. This partially resolves PR/27458 Thanks to Quentin Colombet for reporting the issue! llvm-svn: 297372	2017-03-09 11:19:48 +00:00
Changpeng Fang	1be9b9f816	AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not vectorized. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D30719 llvm-svn: 297328	2017-03-09 00:07:00 +00:00
Krzysztof Parzyszek	1b7197e690	[Hexagon] Use correct offset when extracting from the high word When extracting a bitfield from the high register in a register pair, the final offset should be relative to the high register (for 32-bit extracts). llvm-svn: 297288	2017-03-08 15:46:28 +00:00
Daniel Cederman	9db582a656	[Sparc] Check register use with isPhysRegUsed() instead of reg_nodbg_empty() Summary: By using reg_nodbg_empty() to determine if a function can be treated as a leaf function or not, we miss the case when the register pair L0_L1 is used but not L0 by itself. This has the effect that use_all_i32_regs(), a test in reserved-regs.ll which tries to use all registers, gets treated as a leaf function. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: davide, RKSimon, sepavloff, llvm-commits Differential Revision: https://reviews.llvm.org/D27089 llvm-svn: 297285	2017-03-08 15:23:10 +00:00
Simon Pilgrim	836bcc689f	[X86][SSE] combineX86ShufflesRecursively can handle shuffle masks up to 64 elements wide By defining the mask types as SmallVector<int, 16> we were causing a lot of unnecessary heap usage. llvm-svn: 297267	2017-03-08 09:36:39 +00:00
Tim Shen	c7472d912b	Revert "Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size"" After inspection, it's an UB in our code base. Someone cast a var-arg function pointer to a non-var-arg one. :/ Re-commit r296771 to continue testing on the patch. Sorry for the trouble! llvm-svn: 297256	2017-03-08 02:41:35 +00:00
Justin Lebar	1d1cf7ba5d	[NVPTX] Remove unnecessary isImageReadOnly(), isImageWriteOnly(), & isImageReadWrite() calls This is a repetition of the isImage() function in NVPTXUtilities.cpp. Patch by Briana Grace! Differential Revision: https://reviews.llvm.org/D30706 llvm-svn: 297252	2017-03-08 01:14:15 +00:00
Matt Arsenault	52d1b62a28	AMDGPU: Don't wait at end of block with a trivial successor If there is only one successor, and that successor only has one predecessor the wait can obviously be delayed until uses or the end of the next block. This avoids code quality regressions when there are trivial fallthrough blocks inserted for structurization. llvm-svn: 297251	2017-03-08 01:06:58 +00:00
Matt Arsenault	d8ed207a20	AMDGPU: Constant fold rcp node When doing arcp optimization with a constant denominator, this was leaving behind rcps with constant inputs. llvm-svn: 297248	2017-03-08 00:48:46 +00:00
Changpeng Fang	6b49fa4ca7	AMDGPU/SI: Do not insert EndCf in an unreachable block Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D22025 llvm-svn: 297243	2017-03-07 23:29:36 +00:00
Daniel Sanders	52b4ce727a	Recommit: [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 297241	2017-03-07 23:20:35 +00:00
Krzysztof Parzyszek	434d50a796	[Hexagon] Check for presence before looking registers up in bit tracker llvm-svn: 297240	2017-03-07 23:12:04 +00:00
Krzysztof Parzyszek	8e4d2e0512	[Hexagon] Generate bitsplit instruction llvm-svn: 297239	2017-03-07 23:08:35 +00:00
Artem Belevich	2524a22562	[NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors. Differential Revision: https://reviews.llvm.org/D30672 llvm-svn: 297198	2017-03-07 20:33:38 +00:00
Joel Jones	2852088126	[AArch64] Vulcan is now ThunderX2T99 Broadcom Vulcan is now Cavium ThunderX2T99. LLVM Bugzilla: http://bugs.llvm.org/show_bug.cgi?id=32113 Minor fixes for the alignments of loops and functions for ThunderX T81/T83/T88 (better performance). Patch was tested with SpecCPU2006. Patch by Stefan Teleman Differential Revision: https://reviews.llvm.org/D30510 llvm-svn: 297190	2017-03-07 19:42:40 +00:00
Daniel Sanders	8ebec37d26	Revert r297177: Change LLT constructor string into an LLT-based object ... More module problems. This time it only showed up in the stage 2 compile of clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile. Somehow, this change causes the build to need Attributes.gen before it's been generated. llvm-svn: 297188	2017-03-07 19:21:23 +00:00
Sanjoy Das	c08a79fbf2	[X86] Add option to specify preferable loop alignment Summary: Loop alignment can cause a significant change in the performance of short loops. To be able to evaluate the impact of loop alignment this change introduces the new option x86-experimental-pref-loop-alignment. The alignment will be 2^Value bytes, the default value is 4. Patch by Serguei Katkov! Reviewers: craig.topper Reviewed By: craig.topper Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D30391 llvm-svn: 297178	2017-03-07 18:47:22 +00:00
Daniel Sanders	8612326a08	[globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 297177	2017-03-07 18:32:25 +00:00
John Brawn	eba9fdac7e	[ARM] Correct handling of LSL #0 in an IT block The check for LSL #0 in an IT block was checking if operand 4 was zero, but operand 4 is the condition code operand so it was actually checking for LSLEQ. Fix this by checking operand 3, which really is the immediate operand, and add some tests. Differential Revision: https://reviews.llvm.org/D30692 llvm-svn: 297142	2017-03-07 14:42:03 +00:00
Krzysztof Parzyszek	3cceffb752	[Hexagon] Do not insert instructions before PHI nodes llvm-svn: 297141	2017-03-07 14:20:19 +00:00
Ranjeet Singh	3d0af578cc	[ARM] Reapply r296865 "[ARM] fpscr read/write intrinsics not aware of each other" The original patch r296865 was reverted as it broke the chromium builds for Android (https://bugs.llvm.org/show_bug.cgi?id=32134). This patch reapplies r296865 with a fix to make sure it doesn't cause the build regression. The problem was that intrinsic selection on int_arm_get_fpscr was failing in ISel. The code to manually select this intrinsic still treated it as the version with no side-effects (INTRINSIC_WO_CHAIN), which is wrong because it doesn't match the definition in the tablegen code, which says it does have side-effects. I've fixed this by updating the intrinsic type to INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on Hans's original reproducer. Differential Revision: https://reviews.llvm.org/D30645 llvm-svn: 297137	2017-03-07 11:17:53 +00:00
Jonas Paulsson	1d33cd3988	[SystemZ] Add check VT.isSimple() in canTreatAsByteVector() Since BB-vectorizer can produce vectors of, for example, 3 elements, this check is needed. Review: Ulrich Weigand llvm-svn: 297136	2017-03-07 09:49:31 +00:00
Artyom Skrobov	1388e2f792	In Thumb1, materialize a move between low registers as a `movs`, if CPSR isn't live. Summary: Previously, it had always been materialized as a push/pop sequence. Reviewers: labrinea, jroelofs Reviewed By: jroelofs Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30648 llvm-svn: 297134	2017-03-07 09:38:16 +00:00
Ayman Musa	850fc977c8	[X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX compressing tables. X86EvexToVex machine instruction pass compresses EVEX encoded instructions by replacing them with their identical VEX encoded instructions when possible. It uses two large, manually maintained tables that map the EVEX instructions to their VEX identicals. This TableGen backend replaces the tables by automatically generating them. Differential Revision: https://reviews.llvm.org/D30451 llvm-svn: 297127	2017-03-07 08:11:19 +00:00
Ayman Musa	ac5a2c43af	[X86][AVX512] Add missing entries to EVEX2VEX tables evex2vex pass defines 2 tables which map EVEX instructions to their VEX identicals when possible. Adding all missing entries. Differential Revision: https://reviews.llvm.org/D30501 llvm-svn: 297126	2017-03-07 08:05:53 +00:00
Tim Shen	70054bb827	Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size" This reverts commit r296771. We found some widespread test failures internally. I'm working on a testcase. Politely revert the patch in the meantime. :) llvm-svn: 297124	2017-03-07 07:40:10 +00:00
Konstantin Zhuravlyov	e8aaab8abe	Revert "AMDGPU: Set MCAsmInfo::PointerSize" It breaks line tables because the patch is not complete, working on a complete one at the moment This reverts commit r294031. llvm-svn: 297118	2017-03-07 04:44:33 +00:00
Tim Northover	c2c545b8f7	GlobalISel: restrict G_EXTRACT instruction to just one operand. A bit more painful than G_INSERT because it was more widely used, but this should simplify the handling of extract operations in most locations. llvm-svn: 297100	2017-03-06 23:50:28 +00:00
Jessica Paquette	596f483a5e	[Outliner] Fixed Asan bot failure in r296418 Fixed the asan bot failure which led to the last commit of the outliner being reverted. The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector is no longer initialized using reserve but just a standard constructor. llvm-svn: 297081	2017-03-06 21:31:18 +00:00
Chad Rosier	9a70c7c02a	[AArch64][Redundant Copy Elim] Add support for CMN and shifted imm. This patch extends the current functionality of the AArch64 redundant copy elimination pass to handle CMN instructions as well as shifted immediates. Differential Revision: https://reviews.llvm.org/D30576. llvm-svn: 297078	2017-03-06 21:20:00 +00:00
Jan Vesely	3ea1704434	AMDGPU/R600: Fix ALU clause markers use detection also exit early on kill instead of redefinition. Differential Revision: https://reviews.llvm.org/D30230 llvm-svn: 297060	2017-03-06 20:10:05 +00:00
Reid Kleckner	812191584f	[X86] Fix arg copy elision for illegal types Use the store size of the argument type, which will be a byte-sized quantity, rather than dividing the size in bits by 8. Fixes PR32136 and re-enables copy elision from i64 arguments. Reverts the workaround from r296950. llvm-svn: 297045	2017-03-06 18:39:39 +00:00
Krzysztof Parzyszek	8a4c601abc	[Hexagon] Early-if-convert branches that may exit the loop Merge the tail block into the loop in cases where the main loop body exits early, subject to profitability constraints. This will coalesce the loop body into fewer blocks. For example, before: loop: // loop body; if (...) jump exit; more: // more body; jump loop. After: loop: // loop body; // more body; if (...) jump exit; jump loop. llvm-svn: 297033	2017-03-06 17:24:04 +00:00
Krzysztof Parzyszek	e16ce15687	[Hexagon] Mark dead defs as <dead> in expand-condsets The code in updateDeadFlags removed unnecessary <dead> flags, but there can be cases where such a flag is not set, and yet a register has become dead. For example, if a mux with identical inputs is replaced with a COPY, the predicate register may no longer be used after that. llvm-svn: 297032	2017-03-06 17:09:06 +00:00
Krzysztof Parzyszek	143158b72e	[Hexagon] Pick a dot-old instruction that matches the architecture llvm-svn: 297031	2017-03-06 17:03:16 +00:00
Nemanja Ivanovic	12e67d868a	[PowerPC] Fix failure with STBRX when store is narrower than the bswap Fixes a crash caused by r296811 by truncating the input of the STBRX node when the bswap is wider than i32. Fixes https://bugs.llvm.org/show_bug.cgi?id=32140 Differential Revision: https://reviews.llvm.org/D30615 llvm-svn: 297001	2017-03-06 07:32:13 +00:00
Benjamin Kramer	bb635e034c	[X86] Silence GCC enum compare warning. X86ISelLowering.cpp:26506:36: error: enumeral mismatch in conditional expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType' [-Werror=enum-compare] llvm-svn: 296986	2017-03-05 12:53:20 +00:00
Simon Pilgrim	9f5c251d57	[X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT. We're missing a couple of shuffle combines that will be added in a future patch for review. Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases. Differential Revision: https://reviews.llvm.org/D30549 llvm-svn: 296985	2017-03-05 09:57:20 +00:00
Sanjay Patel	b974be5ef4	[x86] don't require a zext when forming ADC/SBB The larger goal is to move the ADC/SBB transforms currently in combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're creating ADC/SBB in lots of places where we shouldn't. This was intended to be an NFC change, but avx-512 has something strange going on. It doesn't seem like any of the affected tests should really be using SET+TEST or ADC; a simple ADD could replace several instructions. But that's another bug... llvm-svn: 296978	2017-03-04 20:35:19 +00:00
Sanjay Patel	066f3208bf	[DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C) select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 llvm-svn: 296977	2017-03-04 19:18:09 +00:00
Simon Pilgrim	40a0e66b37	[X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targets Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks. Differential Revision: https://reviews.llvm.org/D30213 llvm-svn: 296966	2017-03-04 12:50:47 +00:00
Matthias Braun	21f340fd25	X86ISelLowering: Only perform copy elision on legal types. This fixes cases where i1 types were not properly legalized yet and led to the creation of 0-sized stack slots. This fixes http://llvm.org/PR32136 llvm-svn: 296950	2017-03-04 01:40:40 +00:00
Sanjay Patel	a84fd041c6	[x86] check for commuted add pattern to find ADC/SBB llvm-svn: 296933	2017-03-04 00:18:31 +00:00
Tim Northover	3e6a7afd81	GlobalISel: constrain G_INSERT to inserting just one value per instruction. It's much easier to reason about single-value inserts and no-one was actually using the variadic variants before. llvm-svn: 296923	2017-03-03 23:05:47 +00:00
Sanjay Patel	7ee83b41e0	[x86] refactor combineAddOrSubToADCOrSBB(); NFCI The comments were wrong, and this is not an obvious transform. This hopefully makes it clearer that we're missing the commuted patterns for adds. It's less clear that this is actually a good transform for all micro-arch. This is prep work for trying to clean up the current adc/sbb codegen because it's definitely not happening optimally. llvm-svn: 296918	2017-03-03 22:35:11 +00:00
Krzysztof Parzyszek	cc31871dc4	Make TargetInstrInfo::isPredicable take a const reference, NFC llvm-svn: 296901	2017-03-03 18:30:54 +00:00
Sanjay Patel	58e241896d	[x86] clean up materializeSBB(); NFCI This is producing SBB where it is obviously not necessary, so it needs to be limited. llvm-svn: 296894	2017-03-03 17:58:39 +00:00
Sanjay Patel	e8674825fe	[x86] fix formatting; NFC llvm-svn: 296875	2017-03-03 15:17:41 +00:00
Simon Pilgrim	c37a32d2b9	Use APInt::getHighBitsSet instead of APInt::getBitsSet for upper bit mask creation llvm-svn: 296874	2017-03-03 14:37:57 +00:00
Dmitry Preobrazhensky	03880f8d24	[AMDGPU][MC] Fix for Bug 30829 + LIT tests Added code to check constant bus restrictions for VOP formats (only one SGPR value or literal-constant may be used by the instruction). Note that the same checks are performed by SIInstrInfo::verifyInstruction (used by lowering code). Added LIT tests. llvm-svn: 296873	2017-03-03 14:31:06 +00:00
Chandler Carruth	ce52b80744	[SDAG] Revert r296476 (and r296486, r296668, r296690). This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. llvm-svn: 296862	2017-03-03 10:02:25 +00:00
Amjad Aboud	4f97751798	[X86] Generate VZEROUPPER for Skylake-avx512. VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859	2017-03-03 09:03:24 +00:00
Sjoerd Meijer	69bccf96bd	[AArch64AsmParser] rewrite of function parseSysAlias This is a cleanup/rewrite of the parseSysAlias function. It was not using the tablegen instruction descriptions, but was “manually” matching the mnemonics and recreating the operands whereas all this information is already in tablegen; all this code has been replaced with calls to lookupXYZByName tablegen calls. Differential Revision: https://reviews.llvm.org/D30491 llvm-svn: 296857	2017-03-03 08:12:47 +00:00
Igor Breger	321cf3c650	[GlobalISel][X86] Support float/double and vector types. Summary: [GlobalISel][X86] Add support for f32/f64 and vector types in RegisterBank and InstructionSelector. Reviewers: delena, zvi Reviewed By: zvi Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30533 llvm-svn: 296856	2017-03-03 08:06:46 +00:00
Matt Arsenault	31a58c6ac0	AMDGPU: Fix missing dominator tree dependency llvm-svn: 296842	2017-03-02 23:50:51 +00:00
Krzysztof Parzyszek	e720feb1c6	[Hexagon] Pick the right branch opcode depending on branch probabilities Specifically, pick the opcode with the correct branch prediction, i.e. jump:t or jump:nt. llvm-svn: 296821	2017-03-02 21:49:49 +00:00
Eli Friedman	bb821276d0	[ARM] Fix insert point for store rescheduling. In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would move stores to the wrong insert point. Re-commit with a fix to increment NumMove in the right place. Differential Revision: https://reviews.llvm.org/D30124 llvm-svn: 296815	2017-03-02 21:39:39 +00:00
Guozhi Wei	ed28e742ee	[PPC] Fix code generation for bswap(int32) followed by store16 This patch fixes pr32063. Current code in PPCTargetLowering::PerformDAGCombine can transform bswap store into a single PPCISD::STBRX instruction, but it doesn't consider the case where the operand size of bswap may be larger than the store size. When that occurs, we need 2 modifications: (1) For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()); instead we should use cast<StoreSDNode>(N)->getMemoryVT(). (2) Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side. Differential Revision: https://reviews.llvm.org/D30362 llvm-svn: 296811	2017-03-02 21:07:59 +00:00
Chad Rosier	ea25eca04a	[AArch64] Extend redundant copy elimination pass to handle non-zero stores. This patch extends the current functionality of the AArch64 redundant copy elimination pass to handle non-zero cases such as: BB#0: cmp x0, #1 b.eq .LBB0_1 .LBB0_1: orr x0, xzr, #0x1 ; <-- redundant copy; x0 known to hold #1. Differential Revision: https://reviews.llvm.org/D29344 llvm-svn: 296809	2017-03-02 20:48:11 +00:00
Vadzim Dambrouski	eafb805506	[MSP430] Add SRet support to MSP430 target This patch adds support for struct return values to the MSP430 target backend. It also reverses the order of argument and return registers in the calling convention to bring it into closer alignment with the published EABI from TI. Patch by Andrew Wygle (awygle). Differential Revision: https://reviews.llvm.org/D29069 llvm-svn: 296807	2017-03-02 20:25:10 +00:00
Artem Belevich	ee7dd12ff4	[NVPTX] Reduce amount of boilerplate code used to select load instruction opcode. Make opcode selection code for the load instruction a bit easier to read and maintain. This patch also catches a number of f16 load/store variants that were not handled before. Differential Revision: https://reviews.llvm.org/D30513 llvm-svn: 296785	2017-03-02 19:14:14 +00:00
Artem Belevich	5920babc4f	[NVPTX] Added missing LDU/LDG intrinsics for f16. Differential Revision: https://reviews.llvm.org/D30512 llvm-svn: 296784	2017-03-02 19:14:10 +00:00
Simon Pilgrim	b3067dc374	[X86][MMX] Fixed i32 extraction on 32-bit targets MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal llvm-svn: 296782	2017-03-02 18:56:06 +00:00
Krzysztof Parzyszek	056c945a5d	[Hexagon] Skip blocks that define vector predicate registers in early-if llvm-svn: 296777	2017-03-02 18:10:59 +00:00
Krzysztof Parzyszek	fcbb7d10fe	[Hexagon] Properly handle 'q' constraint in 128-byte vector mode llvm-svn: 296772	2017-03-02 17:50:24 +00:00
Nemanja Ivanovic	db8425eff0	[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size This patch reduces the stack frame size by not allocating the parameter area if it is not required. In the current implementation LowerFormalArguments_64SVR4 already handles the parameter area, but LowerCall_64SVR4 does not (when calculating the stack frame size). What this patch does is make LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29881 llvm-svn: 296771	2017-03-02 17:38:59 +00:00
Tim Northover	e80d6d1360	GlobalISel: record correct stack usage for signext parameters. The CallingConv.td rules allocate 8 bytes for these kinds of arguments on AAPCS targets, but we were only recording the smaller amount. The difference is theoretical on AArch64 because we don't actually store more than the smaller amount, but it's still much better to have these two components in agreement. Based on Diana Picus's ARM equivalent patch (where it matters a lot more). llvm-svn: 296754	2017-03-02 15:34:18 +00:00
Matthew Simpson	aee9771ae2	[ARM/AArch64] Update costs for interleaved accesses with wide types After r296750, we're able to match interleaved accesses having types wider than 128 bits. This patch updates the associated TTI costs. Differential Revision: https://reviews.llvm.org/D29675 llvm-svn: 296751	2017-03-02 15:15:35 +00:00
Matthew Simpson	1bfa159db9	[ARM/AArch64] Support wide interleaved accesses This patch teaches (ARM|AArch64)ISelLowering.cpp to match illegal vector types to interleaved access intrinsics as long as the types are multiples of the vector register width. A "wide" access will now be mapped to multiple interleave intrinsics similar to the way in which non-interleaved accesses with illegal types are legalized into multiple accesses. I'll update the associated TTI costs (in getInterleavedMemoryOpCost) as a follow-on. Differential Revision: https://reviews.llvm.org/D29466 llvm-svn: 296750	2017-03-02 15:11:20 +00:00
Eli Friedman	933863ce61	Revert r296708; causing test failures on ARM hosts. Original commit message: [ARM] Fix insert point for store rescheduling. In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would sink stores for no reason. llvm-svn: 296718	2017-03-02 00:08:50 +00:00
Ahmed Bougacha	120ae22d70	[GlobalISel] Add a way for targets to enable GISel. Until now, we've had to use -global-isel to enable GISel. But using that on other targets that don't support it will result in an abort, as we can't build a full pipeline. Additionally, we want to experiment with enabling GISel by default for some targets: we can't just enable GISel by default, even among those targets that do have some support, because the level of support varies. This first step adds an override for the target to explicitly define its level of support. For AArch64, do that using a new command-line option (I know..): -aarch64-enable-global-isel-at-O=<N> Where N is the opt-level below which GISel should be used. Default that to -1, so that we still don't enable GISel anywhere. We're not there yet! While there, remove a couple LLVM_UNLIKELYs. Building the pipeline is such a cold path that in practice that shouldn't matter at all. llvm-svn: 296710	2017-03-01 23:33:08 +00:00
Eli Friedman	1c9216b003	[ARM] Fix insert point for store rescheduling. In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would sink stores for no reason. Differential Revision: https://reviews.llvm.org/D30124 llvm-svn: 296708	2017-03-01 23:20:29 +00:00
Eli Friedman	28c2c0e311	[ARM] Check correct instructions for load/store rescheduling. This code starts from the high end of the sorted vector of offsets, and works backwards: it tries to find contiguous offsets, process them, then pops them from the end of the vector. Most of the code agrees with this order of processing, but one loop doesn't: it instead processes elements from the low end of the vector (which are nodes with unrelated offsets). Fix that loop to process the correct elements. This has a few implications. One, we don't incorrectly return early when processing multiple groups of offsets in the same block (which allows rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert point for loads, so they're correctly sorted (which affects the scheduling of vldm-liveness.ll). I think it might also impact some of the heuristics slightly. Differential Revision: https://reviews.llvm.org/D30368 llvm-svn: 296701	2017-03-01 22:56:20 +00:00
Reid Kleckner	f7c0980c10	Elide argument copies during instruction selection Summary: Avoids tons of prologue boilerplate when arguments are passed in memory and left in memory. This can happen in a debug build or in a release build when an argument alloca is escaped. This will dramatically affect the code size of x86 debug builds, because X86 fast isel doesn't handle arguments passed in memory at all. It only handles the x86_64 case of up to 6 basic register parameters. This is implemented by analyzing the entry block before ISel to identify copy elision candidates. A copy elision candidate is an argument that is used to fully initialize an alloca before any other possibly escaping uses of that alloca. If an argument is a copy elision candidate, we set a flag on the InputArg. If the target generates loads from a fixed stack object that matches the size and alignment requirements of the alloca, the SelectionDAG builder will delete the stack object created for the alloca and replace it with the fixed stack object. The load is left behind to satisfy any remaining uses of the argument value. The store is now dead and is therefore elided. The fixed stack object is also marked as mutable, as it may now be modified by the user, and it would be invalid to rematerialize the initial load from it. Supersedes D28388 Fixes PR26328 Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29668 llvm-svn: 296683	2017-03-01 21:42:00 +00:00
Krzysztof Parzyszek	8144f37dd8	[RDF] Replace {} with explicit constructor, since not all compilers like it llvm-svn: 296666	2017-03-01 19:59:28 +00:00
Krzysztof Parzyszek	ebabd99adb	[RDF] Add recursion limit to getAllReachingDefsRec For large programs this function can take significant amounts of time. Let it abort gracefully when the program is too complex. llvm-svn: 296662	2017-03-01 19:30:42 +00:00
Krzysztof Parzyszek	8f23dd6d68	[Hexagon] Fix lowering of formal arguments of type i1 On Hexagon, values of type i1 are passed in registers of type i32, even though i1 is not a legal value for these registers. This is a special case and needs special handling to maintain consistency of the lowering information. This fixes PR32089. llvm-svn: 296645	2017-03-01 17:30:10 +00:00
Diana Picus	3841522259	clang-format r296631 Apparently I forgot to run it after fixing up some things... llvm-svn: 296634	2017-03-01 15:54:21 +00:00
Diana Picus	9c52309b37	[ARM] GlobalISel: Lower call params that need extensions Lower i1, i8 and i16 call parameters by extending them before storing them on the stack. Also make sure we encode the correct, extended size in the corresponding memory operand, and that we compute the correct stack size in the end. The latter is a bit more complicated because we used to compute the stack size in the getStackAddress method, based on the Size and Offset of the parameters. However, if the last parameter is sign extended, we'd be using the wrong, non-extended size, and we'd end up with a smaller stack than we need to hold the extended value. Instead of hacking this up based on the value of Size in getStackAddress, we move our stack size handling logic to assignArg, where we have access to the CCState which knows everything we could possibly want to know about the stack. This way we don't need to duplicate any knowledge or resort to any ugly hacks. On this same occasion, update the IRTranslator test to check the sizes of the stores everywhere, not just for sign extended parameters. llvm-svn: 296631	2017-03-01 15:35:14 +00:00
Oliver Stannard	5d35b9e56c	[ARM] Fix parsing of special register masks This parsing code was incorrectly checking for invalid characters, so an invalid instruction like: msr spsr_w, r0 would be emitted as: msr spsr_cxsf, r0 Differential revision: https://reviews.llvm.org/D30462 llvm-svn: 296607	2017-03-01 10:51:04 +00:00
Ayman Musa	9b802e4650	[X86] Fix creating vreg def after use. llvm-svn: 296601	2017-03-01 10:20:48 +00:00
Dan Gohman	7d7409e553	[WebAssembly] Convert the remaining unit tests to the new wasm-object-file target. To facilitate this, add a new hidden command-line option to disable the explicit-locals pass. That causes llc to emit invalid code that doesn't have all locals converted to get_local/set_local, however it simplifies testwriting in many cases. llvm-svn: 296540	2017-02-28 23:37:04 +00:00
Eli Friedman	36795239f5	[ARM] Don't generate deprecated T1 STM. This prevents generating stm r1!, {r0, r1} on Thumb1, where the value stored for r1 is UNKNOWN. Patch by Zhaoshi Zheng. Differential Revision: https://reviews.llvm.org/D27910 llvm-svn: 296538	2017-02-28 23:32:55 +00:00
Krzysztof Parzyszek	33fd0bbbe8	[Hexagon] Generate extract instructions more aggressively llvm-svn: 296537	2017-02-28 23:27:33 +00:00
Krzysztof Parzyszek	f208681731	[Hexagon] Fix instruction selection for sign-extending i1 to i64 llvm-svn: 296532	2017-02-28 22:37:01 +00:00
Matt Arsenault	8f016df1ed	AMDGPU: Fix types for VOP_I16_I16_I16 llvm-svn: 296523	2017-02-28 21:31:45 +00:00
Matt Arsenault	4d263f6f18	AMDGPU: Add definition for v_swap_b32 This is somewhat tricky because there are two pairs of tied operands, and it isn't allowed to be VOP3 encoded. llvm-svn: 296519	2017-02-28 21:09:04 +00:00
Matt Arsenault	03612631cb	AMDGPU: Add definition for v_xad_u32 llvm-svn: 296515	2017-02-28 20:27:30 +00:00
Matt Arsenault	781249833b	AMDGPU: Add ds_nop to assembler llvm-svn: 296513	2017-02-28 20:15:46 +00:00
Matt Arsenault	dedc544ac7	AMDGPU: Add definitions for ds_{read\|write}_b{96\|128} It's not clear to me if this is always better than doing ds_write2_b64 This adds the constraint of a 128-bit register input instead of a pair of 64-bit. llvm-svn: 296512	2017-02-28 20:15:43 +00:00
Stanislav Mekhanoshin	357d3db0a4	[AMDGPU] Add second pass of the scheduler If during scheduling we have identified that we cannot keep the optimistic occupancy, increase the critical register pressure limit and try scheduling the whole function again. In this case blocks with smaller pressure will have a chance for better scheduling. Differential Revision: https://reviews.llvm.org/D30442 llvm-svn: 296506	2017-02-28 19:20:33 +00:00
Stanislav Mekhanoshin	282e8e4a72	[AMDGPU] New method to estimate register pressure This change introduces a new method to estimate register pressure in GCNScheduler. The standard RPTracker gives a huge error for the following reasons: 1. It does not account for live-ins or live-outs if a value is not used in the region itself. That creates a huge error in the very common case where there are a lot of live-through registers. 2. It does not properly count subregs. 3. It assumes a register used as an input operand can be reused as an output. This is not always possible by itself, this is not what RA will finally do in many cases for various reasons not limited to RA's inability to do so, and this is not so if the value is actually a live-through. In addition we can now see a clear separation between live-in pressure, which we cannot change with the scheduling, and tentative pressure, which we can change. Differential Revision: https://reviews.llvm.org/D30439 llvm-svn: 296491	2017-02-28 17:22:39 +00:00
Konstantin Zhuravlyov	182e9cc6d5	[AMDGPU] Change amd_kernel_code_t's minor version to 1 - We do emit amd_kernel_code_t v1.1 Differential Revision: https://reviews.llvm.org/D30433 llvm-svn: 296489	2017-02-28 17:17:52 +00:00
Stanislav Mekhanoshin	080889cad7	[AMDGPU] Fix read-undef flags when schedule is reverted If two subregs of the same register are defined and we need to revert schedule changing def order, we will end up with both instructions having def,read-undef flags because adjustLaneLiveness() will only set this flag but will not remove it. Fix this by removing read-undef flags before calling adjustLaneLiveness. Differential Revision: https://reviews.llvm.org/D30428 llvm-svn: 296484	2017-02-28 16:26:27 +00:00
Simon Dardis	e3cceed3b4	[mips] Fix 64bit slt/sltu/nor with immediates Patch By: Alexander Richardson Reviewers: atanasyan, theraven, sdardis Differential Revision: https://reviews.llvm.org/D30330 llvm-svn: 296482	2017-02-28 15:55:23 +00:00
Daniel Sanders	983c9b98e9	Revert r296474 - [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it. There's a circular dependency that's only revealed when LLVM_ENABLE_MODULES=1. llvm-svn: 296478	2017-02-28 15:00:27 +00:00
Nirav Dave	f830dec3f2	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. 
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296476	2017-02-28 14:24:15 +00:00
Daniel Sanders	a5afdefec6	[globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 296474	2017-02-28 14:21:31 +00:00
Diana Picus	1ffca2aeaf	[ARM] GlobalISel: Lower i32 and fp call parameters on the stack Lower i32, float and double parameters that need to live on the stack. This boils down to creating some G_GEPs starting from the stack pointer and storing the values there. During the process we also keep track of the stack size and use the final value in the ADJCALLSTACKDOWN/UP instructions. We currently assert for smaller types, since they usually require extensions. They will be handled in a separate patch. llvm-svn: 296473	2017-02-28 14:17:53 +00:00
Diana Picus	5a7203a0af	[ARM] GlobalISel: Select 32-bit G_CONSTANT Put it into a register by means of a MOVi. llvm-svn: 296471	2017-02-28 13:05:42 +00:00
Diana Picus	5b8514559e	[ARM] GlobalISel: Add mapping for G_CONSTANT Like G_FRAME_INDEX, G_CONSTANT has one register operand and one non-register operand. llvm-svn: 296469	2017-02-28 12:13:58 +00:00
Diana Picus	e6beac6742	[ARM] GlobalISel: Legalize 32-bit constants llvm-svn: 296468	2017-02-28 11:33:46 +00:00
Diana Picus	9d07094913	[ARM] GlobalISel: Select G_GEP At this point, G_GEP is just an add, so we treat it exactly like a G_ADD. llvm-svn: 296462	2017-02-28 10:14:38 +00:00
Oliver Stannard	85d4d5b493	[ARM] Diagnose PC-writing instructions in IT blocks In Thumb2, instructions which write to the PC are UNPREDICTABLE if they are in an IT block but not the last instruction in the block. Previously, we only diagnosed this for LDM instructions, this patch extends the diagnostic to cover all of the relevant instructions. Differential Revision: https://reviews.llvm.org/D30398 llvm-svn: 296459	2017-02-28 10:04:36 +00:00
Diana Picus	566a15d749	[ARM] GlobalISel: Add reg bank mapping for G_GEP This should be the same as the mapping for G_ADD etc. llvm-svn: 296455	2017-02-28 09:35:10 +00:00
Diana Picus	8598b17076	[ARM] GlobalISel: Legalize G_GEP with 32-bit offsets At the moment we're only interested in GEPs for putting call parameters on the stack, so we'll stick to 32-bit offsets. llvm-svn: 296452	2017-02-28 09:02:42 +00:00
Vadzim Dambrouski	cc33fc8722	Test commit, fix typo, NFC. llvm-svn: 296447	2017-02-28 08:27:43 +00:00
Matthias Braun	81f68ec3a9	Revert "Add MIR-level outlining pass" Revert Machine Outliner for now, as it breaks the asan bot. This reverts commit r296418. llvm-svn: 296426	2017-02-28 02:24:30 +00:00
Matthias Braun	d36410945f	Add MIR-level outlining pass This is a patch for the outliner described in the RFC at: http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html The outliner is a code-size reduction pass which works by finding repeated sequences of instructions in a program, and replacing them with calls to functions. This is useful to people working in low-memory environments, where sacrificing performance for space is acceptable. This adds an interprocedural outliner directly before printing assembly. For reference on how this would work, this patch also includes X86 target hooks and an X86 test. The outliner is run like so: clang -mno-red-zone -mllvm -enable-machine-outliner file.c Patch by Jessica Paquette<jpaquette@apple.com>! rdar://29166825 Differential Revision: https://reviews.llvm.org/D26872 llvm-svn: 296418	2017-02-28 00:33:32 +00:00
Dan Gohman	d37dc2f773	[WebAssembly] Add some comments and tidy up whitespace. llvm-svn: 296402	2017-02-27 22:41:39 +00:00
Matt Arsenault	10268f93e8	AMDGPU: Use v_med3_{f16\|i16\|u16} llvm-svn: 296401	2017-02-27 22:40:39 +00:00
Dan Gohman	f52ee17a09	[WebAssembly] Split CFG-sorting into its own pass. NFC. CFG sorting was already an independent algorithm from block/loop insertion; this change makes it more convenient to debug. llvm-svn: 296399	2017-02-27 22:38:58 +00:00
Matt Arsenault	eb522e68bc	AMDGPU: Support v2i16/v2f16 packed operations llvm-svn: 296396	2017-02-27 22:15:25 +00:00
Sanjay Patel	ae7873fe55	[ARM] don't transform an add(ext Cond), C to select unless there's a setcc of the condition The transform in question claims to be doing: // fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c)) ...starting in PerformADDCombineWithOperands(), but it wasn't actually checking for a setcc node for the sext/zext patterns. This is exactly the opposite of a transform I'd like to add to DAGCombiner's foldSelectOfConstants(), so I was seeing infinite loops with my draft of a patch applied. The changes in select_const.ll look positive (less instructions). The change in arm-and-tst-peephole.ll is unrelated. We're changing the input IR in that test to preserve the intent of the test, but that's not affected by this code change. Differential Revision: https://reviews.llvm.org/D30355 llvm-svn: 296389	2017-02-27 21:30:54 +00:00
Matt Arsenault	c9f2517e96	AMDGPU: Add some of the new gfx9 VOP3 instructions llvm-svn: 296382	2017-02-27 21:04:41 +00:00
Simon Pilgrim	5c4efcdddf	[X86][SSE] Attempt to extract vector elements through target shuffles DAGCombiner already supports peeking through shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered. This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB; there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where it's worth it. Differential Revision: https://reviews.llvm.org/D30176 llvm-svn: 296381	2017-02-27 21:01:57 +00:00
Matt Arsenault	7596f13d15	AMDGPU: Support inlineasm for packed instructions Add packed types as legal so they may be used with inlineasm. Keep all operations expanded for now. llvm-svn: 296379	2017-02-27 20:52:10 +00:00
Matt Arsenault	2ed2193218	AMDGPU: Don't fold immediate if clamp/omod are set Doesn't fix any practical problems because clamp/omod are currently folded after peephole optimizer. llvm-svn: 296375	2017-02-27 20:21:31 +00:00
Matt Arsenault	3cb390498e	AMDGPU: Fold omod into instructions llvm-svn: 296372	2017-02-27 19:35:42 +00:00
Matt Arsenault	e2d1d3a940	AMDGPU: Add f16 to shader calling conventions Mostly useful for writing tests for f16 features. llvm-svn: 296370	2017-02-27 19:24:47 +00:00
Matt Arsenault	9be7b0d485	AMDGPU: Add VOP3P instruction format Add a few instructions that are not VOP3P but are related to packed operations. Includes a hack with dummy operands for the benefit of the assembler llvm-svn: 296368	2017-02-27 18:49:11 +00:00
Krzysztof Parzyszek	e9be35596e	[Hexagon] Defs and clobbers can overlap llvm-svn: 296365	2017-02-27 18:03:35 +00:00
Craig Topper	7502119ce8	[X86] Use APInt instead of SmallBitVector tracking undef elements from getTargetConstantBitsFromNode and getConstVector. Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc. APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30392 llvm-svn: 296355	2017-02-27 16:15:32 +00:00
Craig Topper	3917ca2af4	[X86] Use APInt instead of SmallBitVector for tracking Zeroable elements in shuffle lowering Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc. APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30390 llvm-svn: 296354	2017-02-27 16:15:30 +00:00
Craig Topper	e1be95c3d0	[X86] Fix SmallVector sizes in constant pool shuffle decoding to avoid heap allocation Some of the vectors are under sized to avoid heap allocation. In one case the vector was oversized. Differential Revision: https://reviews.llvm.org/D30387 llvm-svn: 296353	2017-02-27 16:15:27 +00:00
Craig Topper	53e5a38da9	[X86] Use APInt instead of SmallBitVector for tracking undef elements in constant pool shuffle decoding Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc. APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. This will incur a minor increase in stack usage due to APInt storing the bit count separately from the data bits unlike SmallBitVector, but that should be ok. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30386 llvm-svn: 296352	2017-02-27 16:15:25 +00:00
Sjoerd Meijer	32ecac7ac8	AArch64InstPrinter: rewrite of printSysAlias This is a cleanup/rewrite of the printSysAlias function. This was not using the tablegen instruction descriptions, but was "manually" decoding the instructions. This has been replaced with calls to lookup_XYZ_ByEncoding tablegen calls. This revealed several problems. First, instruction IVAU had the wrong encoding. This was cancelled out by the parser that incorrectly matched the wrong encoding. Second, instruction CVAP was missing from the SystemOperands tablegen descriptions, so this has been added. And third, the required target features were not captured in the tablegen descriptions, so support for this has also been added. Differential Revision: https://reviews.llvm.org/D30329 llvm-svn: 296343	2017-02-27 14:45:34 +00:00
John Brawn	c97b714ffb	[ARM] LSL #0 is an alias of MOV Currently we handle this correctly in arm, but in thumb we don't which leads to an unpredictable instruction being emitted for LSL #0 in an IT block and SP not being permitted in some cases when it should be. For the thumb2 LSL we can handle this by making LSL #0 an alias of MOV in the .td file, but for thumb1 we need to handle it in checkTargetMatchPredicate to get the IT handling right. We also need to adjust the handling of MOV rd, rn, LSL #0 to avoid generating the 16-bit encoding in an IT block. We should also adjust it to allow SP in the same way that it is allowed in MOV rd, rn, but I haven't done that here because it looks like it would take quite a lot of work to get right. Additionally correct the selection of the 16-bit shift instructions in processInstruction, where it was checking if the two registers were equal when it should have been checking if they were low. It appears that previously this code was never executed and the 16-bit encoding was selected by default, but the other changes I've done here have somehow made it start being used. Differential Revision: https://reviews.llvm.org/D30294 llvm-svn: 296342	2017-02-27 14:40:51 +00:00
Sjoerd Meijer	6d171006f4	AArch64AsmParser: don't try to parse “[1]” for non-vector register operands There are no instructions that have "[1]" as part of the assembly string; FMOVXDhighr is out of date. This removes dead code. Differential Revision: https://reviews.llvm.org/D30165 llvm-svn: 296327	2017-02-27 10:51:11 +00:00
Konstantin Zhuravlyov	972948b36e	[AMDGPU] Runtime metadata fixes: - Verify that runtime metadata is actually valid runtime metadata when assembling, otherwise we could accept the following when assembling, but ocl runtime will reject it: .amdgpu_runtime_metadata { amd.MDVersion: [ 2, 1 ], amd.RandomUnknownKey, amd.IsaInfo: ... - Make IsaInfo optional, and always emit it. Differential Revision: https://reviews.llvm.org/D30349 llvm-svn: 296324	2017-02-27 07:55:17 +00:00
Craig Topper	ed0101a0b9	[X86] Check for less than 0 rather than explicit compare with -1. NFC llvm-svn: 296321	2017-02-27 06:05:30 +00:00
Craig Topper	6028584d8c	[X86] Fix execution domain for cmpss/sd instructions. llvm-svn: 296293	2017-02-26 06:45:59 +00:00
Craig Topper	036693302b	[AVX-512] Fix execution domain for scalar commutable min/max instructions. llvm-svn: 296292	2017-02-26 06:45:56 +00:00
Craig Topper	e70231be51	[AVX-512] Fix execution domain for vmovhpd/lpd/hps/lps. llvm-svn: 296291	2017-02-26 06:45:54 +00:00
Craig Topper	fe25988c68	[AVX-512] Fix the execution domain for AVX-512 integer broadcasts. llvm-svn: 296290	2017-02-26 06:45:51 +00:00
Craig Topper	49ba3f5406	[AVX-512] Disable the redundant patterns in the VPBROADCASTBr_Alt and VPBROADCASTWr_Alt instructions. NFC llvm-svn: 296289	2017-02-26 06:45:48 +00:00
Craig Topper	6bf9b809ce	[AVX-512] Fix execution domain for VPMADD52 instructions. llvm-svn: 296288	2017-02-26 06:45:45 +00:00
Craig Topper	aa8e903150	[AVX-512] Fix the execution domain for VSCALEF instructions. llvm-svn: 296286	2017-02-26 06:45:40 +00:00
Craig Topper	cac5d698df	[AVX-512] Fix execution domain of scalar VRANGE/REDUCE/GETMANT with sae. llvm-svn: 296285	2017-02-26 06:45:37 +00:00
Craig Topper	ed64904c74	[X86] Fix the execution domain for scalar SQRT intrinsic instruction. llvm-svn: 296284	2017-02-26 06:45:35 +00:00
Nirav Dave	73cd0194cf	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. llvm-svn: 296279	2017-02-26 01:27:32 +00:00
Eric Christopher	4a8208c266	vec perm can go down either pipeline on P8. No observable changes, spotted while looking at the scheduling description. llvm-svn: 296277	2017-02-26 00:11:58 +00:00
Simon Pilgrim	0f5fb5f549	[APInt] Add APInt::extractBits() method to extract APInt subrange (reapplied) The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296272	2017-02-25 20:01:58 +00:00
Craig Topper	2caa97c891	[AVX-512] Fix the execution domain for scalar FMA instructions. llvm-svn: 296271	2017-02-25 19:36:28 +00:00
Craig Topper	176f3310b6	[AVX-512] Fix the execution domain on some instructions. llvm-svn: 296270	2017-02-25 19:18:11 +00:00
Craig Topper	d2011e3612	[AVX-512] Remove unnecessary masked versions of VCVTSS2SD and VCVTSD2SS using the scalar register class. We only have patterns for the masked intrinsics. llvm-svn: 296264	2017-02-25 18:43:42 +00:00
Nirav Dave	beabf456df	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. 
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296252	2017-02-25 11:43:58 +00:00
Junmo Park	7ff4c045eb	Minor code cleanup. NFC. llvm-svn: 296207	2017-02-25 00:08:53 +00:00
Dan Gohman	82607f56bd	[WebAssembly] Add support for using a wasm global for the stack pointer. This replaces the __stack_pointer variable which was allocated in linear memory. llvm-svn: 296201	2017-02-24 23:46:05 +00:00
Krzysztof Parzyszek	0d67b10a3c	[Hexagon] Undo shift folding where it could simplify addressing mode For example, avoid (single shift): r0 = and(##536870908,lsr(r0,#3)) r0 = memw(r1+r0<<#0) in favor of (two shifts): r0 = lsr(r0,#5) r0 = memw(r1+r0<<#2) llvm-svn: 296196	2017-02-24 23:34:24 +00:00
Dan Gohman	d934cb8806	[WebAssembly] Basic support for Wasm object file encoding. With the "wasm32-unknown-unknown-wasm" triple, this allows writing out simple wasm object files, and is another step in a larger series toward migrating from ELF to general wasm object support. Note that this code and the binary format itself is still experimental. llvm-svn: 296190	2017-02-24 23:18:00 +00:00
Krzysztof Parzyszek	be5028aed3	[Hexagon] Prettify code in HexagonDAGToDAGISel::Select llvm-svn: 296187	2017-02-24 23:00:40 +00:00
Wei Ding	4d3d4ca1b3	AMDGPU : Replace FMAD with FMA when denormals are enabled. Differential Revision: http://reviews.llvm.org/D29958 llvm-svn: 296186	2017-02-24 23:00:29 +00:00
Stanislav Mekhanoshin	42259cf35e	Revert "Correct register pressure calculation in presence of subregs" This reverts commit r296009. It broke one out of tree target and also does not account for all partial lines added or removed when calculating PressureDiff. llvm-svn: 296182	2017-02-24 21:56:16 +00:00
Dan Gohman	6999c4fd28	[WebAssembly] Handle f16 in fast-isel. llvm-svn: 296172	2017-02-24 21:05:35 +00:00
Davide Italiano	74f27b80d4	[Target/MIPS] Kill dead code, no functional change intended. Hopefully placates gcc with -Werror. llvm-svn: 296153	2017-02-24 18:48:10 +00:00
Simon Pilgrim	cdf2bd656a	Revert: r296141 [APInt] Add APInt::extractBits() method to extract APInt subrange The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296147	2017-02-24 18:31:04 +00:00
Nemanja Ivanovic	195c5452d3	[PowerPC] Use subfic instruction for subtract from immediate Provide a 64-bit pattern to use SUBFIC for subtracting from a 16-bit immediate. The corresponding pattern already exists for 32-bit integers. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29387 llvm-svn: 296144	2017-02-24 18:16:06 +00:00
Nemanja Ivanovic	82d53ed492	[PowerPC] Use rldicr instruction for AND with an immediate if possible Emit clrrdi (extended mnemonic for rldicr) for AND-ing with masks that clear bits from the right-hand side. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29388 llvm-svn: 296143	2017-02-24 18:03:16 +00:00
Simon Pilgrim	bd9fb2ae95	[APInt] Add APInt::extractBits() method to extract APInt subrange The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296141	2017-02-24 17:46:18 +00:00
Simon Dardis	ae6f2bcb25	Recommit "[mips] Fix atomic compare and swap at O0." This time with the missing files. Similar to PR/25526, fast-regalloc introduces spills at the end of basic blocks. When this occurs in between an ll and sc, the store can cause the atomic sequence to fail. This patch fixes the issue by introducing more pseudos to represent atomic operations and moving their lowering to after the expansion of postRA pseudos. This resolves PR/32020. Thanks to James Cowgill for reporting the issue! Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D30257 llvm-svn: 296134	2017-02-24 16:32:18 +00:00
Simon Dardis	3c58c18ff0	Revert "[mips] Fix atomic compare and swap at O0." This reverts r296132. I forgot to include the tests. llvm-svn: 296133	2017-02-24 16:30:27 +00:00
Simon Dardis	cf0e06d375	[mips] Fix atomic compare and swap at O0. Similar to PR/25526, fast-regalloc introduces spills at the end of basic blocks. When this occurs in between an ll and sc, the store can cause the atomic sequence to fail. This patch fixes the issue by introducing more pseudos to represent atomic operations and moving their lowering to after the expansion of postRA pseudos. This resolves PR/32020. Thanks to James Cowgill for reporting the issue! Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D30257 llvm-svn: 296132	2017-02-24 16:27:45 +00:00
Daniel Sanders	066ebbfd46	[globalisel] Decouple src pattern operands from dst pattern operands. Summary: This isn't testable for AArch64 by itself so this patch also adds support for constant immediates in the pattern and physical register uses in the result. The new IntOperandMatcher matches the constant in patterns such as '(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold immediates into an instruction so this is the first rule that will match across multiple BB's. The Renderer hierarchy is responsible for adding operands to the result instruction. Renderers can copy operands (CopyRenderer) or add physical registers (in particular %wzr and %xzr) to the result instruction in any order (OperandMatchers now import the operand names from SelectionDAG to allow renderers to access any operand). This allows us to emit the result instruction for: %1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0 %1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0 although the latter is untested since the matcher/importer has not been taught about commutativity yet. Added BuildMIAction which can build new instructions and mutate them where possible. W.r.t the mutation aspect, MatchActions are now told the name of an instruction they can recycle and BuildMIAction will emit mutation code when the renderers are appropriate. They are appropriate when all operands are rendered using CopyRenderer and the indices are the same as the matcher. This currently assumes that all operands have at least one matcher. Finally, this change also fixes a crash in AArch64InstructionSelector::select() caused by an immediate operand passing isImm() rather than isCImm(). This was uncovered by the other changes and was detected by existing tests. 
Depends on D29711 Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar Reviewed By: rovka Subscribers: aemerson, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29712 llvm-svn: 296131	2017-02-24 15:43:30 +00:00
Simon Pilgrim	7f6a7c97a7	[X86][SSE] Target shuffle combine can try to combine up to 16 vectors Noticed while profiling PR32037, the target shuffle ops were being stored in SmallVector<*,8> types but the combiner could store as many as 16 ops at maximum depth (2 per depth). llvm-svn: 296130	2017-02-24 15:35:52 +00:00
Sanjay Patel	9f0fa52aa2	[x86] use DAG.getAllOnesConstant(); NFCI llvm-svn: 296128	2017-02-24 15:09:59 +00:00
Simon Dardis	aa20881749	[mips] Handle 64 bit immediate in and/or/xor pseudo instructions on mips64 Previously LLVM was assuming 32-bit signed immediates which results in and with a bitmask that has bit 31 set to incorrectly include bits 63-32 in the result. After applying this patch I can now compile all of the FreeBSD mips assembly code with clang. This issue also affects the nor, slt and sltu macros and I will fix those in a separate review. Patch By: Alexander Richardson Commit message reformatted by sdardis. Reviewers: atanasyan, theraven, sdardis Differential Revision: https://reviews.llvm.org/D30298 llvm-svn: 296125	2017-02-24 14:34:32 +00:00
Diana Picus	3b99c64ba1	[ARM] GlobalISel: Select G_STORE Same as selecting G_LOAD. llvm-svn: 296122	2017-02-24 14:01:27 +00:00
Diana Picus	1f432f995a	[ARM] GlobalISel: Add reg bank mappings for stores Same as the ones for loads. llvm-svn: 296115	2017-02-24 13:07:25 +00:00
Diana Picus	a2b632a353	[ARM] GlobalISel: Legalize stores Allow the same types that we allow for loads. llvm-svn: 296108	2017-02-24 11:28:24 +00:00
Simon Dardis	a5f52dc00d	[mips][mc] Fix a crash when disassembling odd sized sections Make the MIPS disassembler consistent with the other targets in returning a Size of zero when the input buffer cannot contain an instruction due to its size. Previously it reported the minimum instruction size when it failed due to the buffer not being big enough for an instruction, causing llvm-objdump to crash when disassembling all sections. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D29984 llvm-svn: 296105	2017-02-24 10:50:27 +00:00
Diana Picus	c21d1e5d94	Revert "[ARM] GlobalISel: Legalize stores" This reverts commit r296103 because the test broke on one of the bots. Sorry! llvm-svn: 296104	2017-02-24 10:35:39 +00:00
Diana Picus	a5f1cfd1a7	[ARM] GlobalISel: Legalize stores Allow the same types that we allow for loads. llvm-svn: 296103	2017-02-24 10:19:23 +00:00
Simon Pilgrim	aed352273e	[APInt] Add APInt::setBits() method to set all bits in range The current pattern for setting bits in range is typically: Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is one of the key compile time issues identified in PR32037. This patch adds the APInt::setBits() helper method which avoids the temporary memory allocation completely. This first implementation uses setBit() internally instead but already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible. I investigated whether there is need for APInt::clearBits() and APInt::flipBits() equivalents but haven't seen these patterns to be particularly common, but reusing the code would be trivial. Differential Revision: https://reviews.llvm.org/D30265 llvm-svn: 296102	2017-02-24 10:15:29 +00:00
Dan Gohman	411dc07aba	[WebAssembly] Add a README.txt entry for mergeable sections. llvm-svn: 296095	2017-02-24 07:33:55 +00:00
Craig Topper	8783bbb598	[AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC llvm-svn: 296094	2017-02-24 07:21:10 +00:00
Craig Topper	f2529c188b	[AVX-512] Remove lzcnt intrinsics and autoupgrade them to generic ctlz intrinsics with select. Clang has been emitting ctlz intrinsics for a while now. llvm-svn: 296091	2017-02-24 05:35:04 +00:00
Petr Hosek	a7d5916308	[Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack The Fuchsia ABI defines slots from the thread pointer where the stack-guard value for stack-protector, and the unsafe stack pointer for safe-stack, are stored. This parallels the Android ABI support. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D30237 llvm-svn: 296081	2017-02-24 03:10:10 +00:00
Artem Belevich	620db1f3dd	[NVPTX] Added support for .f16x2 instructions. This patch enables support for .f16x2 operations. Added new register type Float16x2. Added support for .f16x2 instructions. Added handling of vectorized loads/stores of v2f16 values. Differential Revision: https://reviews.llvm.org/D30057 Differential Revision: https://reviews.llvm.org/D30310 llvm-svn: 296032	2017-02-23 22:38:24 +00:00
Tim Northover	063a56e81c	ARM: make sure FastISel bails on f64 operations for Cortex-M4. FastISel wasn't checking the isFPOnlySP subtarget feature before emitting double-precision operations, so it got completely invalid CodeGen for doubles on Cortex-M4F. The normal ISel testing wasn't spectacular either so I added a second RUN line to improve that while I was in the area. llvm-svn: 296031	2017-02-23 22:35:00 +00:00
Krzysztof Parzyszek	128e191eac	[Hexagon] Handle saturations in Hexagon bit tracker llvm-svn: 296026	2017-02-23 22:11:52 +00:00
Krzysztof Parzyszek	998e49e5c8	[Hexagon] Allow setting register in BitVal without storing into map In the bit tracker, references to other bit values in which the register is 0 are prohibited. This means that generating self-referential register cells like { w:32 [0-15]:s[0-15] [16-31]:s[15] } is impossible. In order to get a self-referential cell, it had to be stored into a map and then reloaded from it. To avoid this step, add a function that will set the register to a given value without going through the map. llvm-svn: 296025	2017-02-23 22:08:50 +00:00
Stanislav Mekhanoshin	78468e48cf	[AMDGPU] Shut the warning "getRegUnitWeight hides overload...". NFC. Clang issues warning about hidden overload. That was intended, so add "using AMDGPUGenRegisterInfo::getRegUnitWeight;" to mute it. llvm-svn: 296021	2017-02-23 21:51:28 +00:00
Evgeniy Stepanov	ee2d77f6d6	Disable TLS for stack protector on Android API<17. The TLS slot did not exist back then. llvm-svn: 296014	2017-02-23 21:06:35 +00:00
Stanislav Mekhanoshin	ce3ddd2de4	Correct register pressure calculation in presence of subregs If a subreg is used in an instruction it counts as a whole superreg for the purpose of register pressure calculation. This patch corrects improper register pressure calculation by examining operand's lane mask. Differential Revision: https://reviews.llvm.org/D29835 llvm-svn: 296009	2017-02-23 20:19:44 +00:00
Krzysztof Parzyszek	2cfc7a48de	[Hexagon] Avoid IMPLICIT_DEFs as new-value producers llvm-svn: 295997	2017-02-23 17:47:34 +00:00
Jan Vesely	70293a045b	AMDGPU/SI: Fix trunc i16 pattern Hit on ASICs that support 16bit instructions. Differential Revision: https://reviews.llvm.org/D30281 llvm-svn: 295990	2017-02-23 16:12:21 +00:00
Krzysztof Parzyszek	af5ff65d67	[Hexagon] Patterns for CTPOP, BSWAP and BITREVERSE llvm-svn: 295981	2017-02-23 15:02:09 +00:00
Diana Picus	a8cb0cd8f2	[ARM] GlobalISel: Lower call returns Introduce a common ValueHandler for call returns and formal arguments, and inherit two different versions for handling the differences (at the moment the only difference is the way physical registers are marked as used). llvm-svn: 295973	2017-02-23 14:18:41 +00:00
Diana Picus	a606713c33	[ARM] GlobalISel: Lower call parameters in regs Add support for lowering calls with parameters that can fit into regs. Use the same ValueHandler that we used for function returns, but rename it to match its new, extended purpose. llvm-svn: 295971	2017-02-23 13:25:43 +00:00
Ayman Musa	4b2c968c43	[X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel. (Quick fix to buildbot failure after rL295940 commit). llvm-svn: 295970	2017-02-23 13:15:44 +00:00
Simon Dardis	d410fc8f28	[mips][ias] Further relax operands of certain assembly instructions This patch adjusts the most relaxed predicate of immediate operands to accept immediate forms such as ~(0xf0000000|0x000f00000). Previously these forms would be accepted by GAS and rejected by IAS. This partially resolves PR/30383. Thanks to Sean Bruno for reporting the issue! Reviewers: slthakur, seanbruno Differential Revision: https://reviews.llvm.org/D29218 llvm-svn: 295965	2017-02-23 12:40:58 +00:00
Kristof Beyls	5ac6adbb6d	Fix assertion failure in ARMConstantIslandPass. The ARMConstantIslandPass didn't have support for handling accesses to constant island objects through ARM::t2LDRBpci instructions. This adds support for that. This fixes PR31997. llvm-svn: 295964	2017-02-23 12:24:55 +00:00
Ayman Musa	524dbdaa2b	[X86][AVX512] Remove VCVTSS2SDZ & VCVTSD2SSZ from memory folding tables as they introduce new read dependency when folding. (Quick fix to buildbot fail). llvm-svn: 295946	2017-02-23 08:13:36 +00:00
Ayman Musa	6e670cf44f	[X86][AVX512] Change VCVTSS2SD and VCVTSD2SS node types to keep consistency between VEX/EVEX versions. AVX versions of the converts work on f32/f64 types, while AVX512 versions work on vectors. Differential Revision: https://reviews.llvm.org/D29988 llvm-svn: 295940	2017-02-23 07:24:21 +00:00
Matt Arsenault	f0a88dbaab	LoadStoreVectorizer: Split even sized illegal chains properly Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address space accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32> store i32 store i32 rather than store <2 x i32> store <2 x i32> when legal. llvm-svn: 295933	2017-02-23 03:58:53 +00:00
Matt Arsenault	a9e16e6597	AMDGPU: Add another BFE pattern This is the pattern that falls out of the instruction's definition if offset == 0. llvm-svn: 295912	2017-02-23 00:23:43 +00:00
Matt Arsenault	79a45db7f5	AMDGPU: Use clamp with f64 llvm-svn: 295908	2017-02-22 23:53:37 +00:00
Matt Arsenault	d5c6515b68	AMDGPU: Fold FP clamp as modifier bit The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled. llvm-svn: 295905	2017-02-22 23:27:53 +00:00
Wei Ding	f2cce02eb2	AMDGPU : Update TrapCode based on Trap Handler ABI. Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295904	2017-02-22 23:22:19 +00:00
Matt Arsenault	f5262256a1	AMDGPU: Add replacement bfe intrinsics llvm-svn: 295899	2017-02-22 23:04:58 +00:00
Krzysztof Parzyszek	ab57c2bad3	[Hexagon] Implement @llvm.readcyclecounter() llvm-svn: 295892	2017-02-22 22:28:47 +00:00
Matt Arsenault	7b6c5d28f5	AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPR This should avoid reporting any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs. llvm-svn: 295891	2017-02-22 22:23:32 +00:00
Krzysztof Parzyszek	3596a81c69	[RDF] Support for partial structural aliases in RegisterAggr llvm-svn: 295883	2017-02-22 21:42:15 +00:00
Krzysztof Parzyszek	65971d97b0	[Hexagon] Add intrinsics for masked vector stores Patch by Harsha Jagasia. llvm-svn: 295879	2017-02-22 21:23:09 +00:00
Matt Arsenault	93e65ea733	AMDGPU: Don't look at chain users when adjusting writemask Fixes not adjusting using new intrinsics with chains. llvm-svn: 295878	2017-02-22 21:16:41 +00:00
Matt Arsenault	707780b420	AMDGPU: Always allocate emergency stack slot at offset 0 This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877	2017-02-22 21:05:25 +00:00
Matt Arsenault	61ec6a03ca	AMDGPU: Change exp with compr bit printing llvm-svn: 295873	2017-02-22 20:37:12 +00:00
Wei Ding	6ade56e0a0	Revert "AMDGPU : Update TrapCode based on Trap Handler ABI." This reverts commit r295867. llvm-svn: 295871	2017-02-22 20:29:22 +00:00
Wei Ding	4991d3570f	AMDGPU : Update TrapCode based on Trap Handler ABI. Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867	2017-02-22 20:05:06 +00:00
Geoff Berry	6bb79157dd	[AArch64] Extend AArch64RedundantCopyElimination to do simple copy propagation. Summary: Extend AArch64RedundantCopyElimination to catch cases where the register that is known to be zero is COPY'd in the predecessor block. Before this change, this pass would catch cases like: CBZW %W0, <BB#1> BB#1: %W0 = COPY %WZR // removed After this change, cases like the one below are also caught: %W0 = COPY %W1 CBZW %W1, <BB#1> BB#1: %W0 = COPY %WZR // removed This change results in a 4% increase in static copies removed by this pass when compiling the llvm test-suite. It also fixes regressions caused by doing post-RA copy propagation (a separate change to be put up for review shortly). Reviewers: junbuml, mcrosier, t.p.northover, qcolombet, MatzeB Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D30113 llvm-svn: 295863	2017-02-22 19:10:45 +00:00
Dan Gohman	38b42b4a95	[WebAssembly] Define a table of function signatures for runtime library calls. LLVM CodeGen emits references to external symbols that are never declared in LLVM IR level, so they have no declared signature. However, WebAssembly requires all functions be declared with signatures. This patch adds a table for providing signatures for known runtime libcalls that will be used in subsequent patches to emit declarations for such functions. llvm-svn: 295857	2017-02-22 18:34:16 +00:00
Krzysztof Parzyszek	ace1b89060	[RDF] Skip undef uses when calculating kill flags llvm-svn: 295856	2017-02-22 18:29:16 +00:00
Krzysztof Parzyszek	ba36b92bef	[RDF] Only access block live-ins when tracking liveness llvm-svn: 295855	2017-02-22 18:27:36 +00:00
Dan Gohman	a63e8eb138	[WebAssembly] Configure codegen to legalize f16 values. llvm-svn: 295850	2017-02-22 16:28:00 +00:00
Simon Pilgrim	13cdd57964	[X86][SSE] getTargetConstantBitsFromNode - insert constant bits directly into masks. Minor optimization, don't create temporary mask APInts that are just going to be OR'd into the accumulate masks - insert directly instead. llvm-svn: 295848	2017-02-22 15:38:13 +00:00
Simon Pilgrim	3a895c4873	[X86][SSE] Use APInt::getBitsSet() instead of APInt::getLowBitsSet().shl() separately. NFCI. llvm-svn: 295845	2017-02-22 15:04:55 +00:00
Simon Pilgrim	3b97067ae8	Fix -Wunused-but-set-variable warning by removing unused 'aggregateIsPacked' checking llvm-svn: 295830	2017-02-22 13:37:31 +00:00
Benjamin Kramer	5a7e0f8357	[GlobalISel] Fix compiler warnings and make assert assert something. llvm-svn: 295827	2017-02-22 12:59:47 +00:00
Igor Breger	f7359d893a	[X86][GlobalISel] Initial implementation, select G_ADD gpr, gpr Summary: Initial implementation for X86InstructionSelector. Handle selection of COPY and G_ADD/G_SUB gpr, gpr. Reviewers: qcolombet, rovka, zvi, ab Reviewed By: rovka Subscribers: mgorny, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29816 llvm-svn: 295824	2017-02-22 12:25:09 +00:00
Roger Ferrer Ibanez	56db97d4de	[ARM] Fix constant islands pass. The pass tries to fix a spill of LR that turns out to be unnecessary. So it removes the tPOP but forgets to remove tPUSH. This causes the stack to be misaligned upon returning from the function. Thus, remove the tPUSH as well in this case. Differential Revision: https://reviews.llvm.org/D30207 llvm-svn: 295816	2017-02-22 09:06:21 +00:00
Ayman Musa	ceea56c705	[X86] Fix memory operands definition for some instructions. Change integer memory operands to FP memory operands for some FP instructions. Differential Revision: https://reviews.llvm.org/D30201 llvm-svn: 295813	2017-02-22 08:06:29 +00:00
Javed Absar	b672722810	[ARM] Classification Improvements to ARM Sched-Models. NFCI. This patch adds missing sched classes for Thumb2 instructions. This has been missing so far, and as a consequence, machine scheduler models for individual sub-targets have tended to be larger than they needed to be. These patches should help write schedulers better and faster in the future for ARM sub-targets. Reviewer: Diana Picus Differential Revision: https://reviews.llvm.org/D29953 llvm-svn: 295811	2017-02-22 07:22:57 +00:00
Craig Topper	56d4022997	[AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION. I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others. Differential revision: https://reviews.llvm.org/D30186 llvm-svn: 295810	2017-02-22 06:54:18 +00:00
Dan Gohman	18eafb6c68	[WebAssembly] Add skeleton MC support for the Wasm container format This just adds the basic skeleton for supporting a new object file format. All of the actual encoding will be implemented in followup patches. Differential Revision: https://reviews.llvm.org/D26722 llvm-svn: 295803	2017-02-22 01:23:18 +00:00
Matt Arsenault	1f17c66890	AMDGPU: Add cvt.pkrtz intrinsic Convert llvm.SI.packf16 test uses llvm-svn: 295797	2017-02-22 00:27:34 +00:00
Matt Arsenault	9417505f7d	AMDGPU: Remove llvm.AMDGPU.clamp intrinsic llvm-svn: 295789	2017-02-21 23:46:04 +00:00
Matt Arsenault	2fdf2a1a18	AMDGPU: Redefine clamp node as clamp 0.0-1.0 Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788	2017-02-21 23:35:48 +00:00
Artem Belevich	29bbdc1c32	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. Original code only used vector loads/stores for explicit vector arguments. It could also do more loads/stores than necessary (e.g v5f32 would touch 8 f32 values). Aggregate types were loaded one element at a time, even the vectors contained within. This change attempts to generalize (and simplify) parameter space loads/stores so that vector loads/stores can be used more broadly. Functionality of the patch has been verified by compiling thrust test suite and manually checking the differences between PTX generated by llvm with and without the patch. General algorithm: * ComputePTXValueVTs() flattens input/output argument into a flat list of scalars to load/store and returns their types and offsets. * VectorizePTXValueVTs() uses that data to create vectorization plan which returns an array of flags marking boundaries of vectorized load/stores. Scalars are represented as 1-element vectors. * Code that generates loads/stores implements a simple state machine that constructs a vector according to the plan. Differential Revision: https://reviews.llvm.org/D30011 llvm-svn: 295784	2017-02-21 22:56:05 +00:00
Matt Arsenault	7d6b71db4f	AMDGPU: Formatting fixes llvm-svn: 295783	2017-02-21 22:50:41 +00:00
Evandro Menezes	a8d3301ee1	[AArch64, X86] Add statistics for the MacroFusion pass llvm-svn: 295777	2017-02-21 22:16:13 +00:00
Evandro Menezes	b9b7f4b8d3	[AArch64, X86] Guard against both instrs being wild cards If both instrs are wild cards, the result can be a crash. llvm-svn: 295776	2017-02-21 22:16:11 +00:00
Evgeniy Stepanov	1fd19c6e5d	Fix PR31896. Address of an alias of a global with offset is incorrectly lowered as an address of the global (i.e. ignoring offset). llvm-svn: 295762	2017-02-21 20:17:34 +00:00
Matt Arsenault	c2a44e4c3c	AMDGPU: Remove llvm.AMDGPU.flbit intrinsic llvm-svn: 295754	2017-02-21 19:27:33 +00:00
Matt Arsenault	e0bf7d02f0	AMDGPU: Don't use stack space for SGPR->VGPR spills Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753	2017-02-21 19:12:08 +00:00
Geoff Berry	5d534b6a11	[CodeGenPrepare] Sink and duplicate more 'and' instructions. Summary: Rework the code that was sinking/duplicating (icmp and, 0) sequences into blocks where they were being used by conditional branches to form more tbz instructions on AArch64. The new code is more general in that it just looks for 'and's that have all icmp 0's as users, with a target hook used to select which subset of 'and' instructions to consider. This change also enables 'and' sinking for X86, where it is more widely beneficial than on AArch64. The 'and' sinking/duplicating code is moved into the optimizeInst phase of CodeGenPrepare, where it can take advantage of the fact the OptimizeCmpExpression has already sunk/duplicated any icmps into the blocks where they are used. One minor complication from this change is that optimizeLoadExt needed to be updated to always mark 'and's it has determined should be in the same block as their feeding load in the InsertedInsts set to avoid an infinite loop of hoisting and sinking the same 'and'. This change fixes a regression on X86 in the tsan runtime caused by moving GVNHoist to a later place in the optimization pipeline (see PR31382). Reviewers: t.p.northover, qcolombet, MatzeB Subscribers: aemerson, mcrosier, sebpop, llvm-commits Differential Revision: https://reviews.llvm.org/D28813 llvm-svn: 295746	2017-02-21 18:53:14 +00:00
Simon Pilgrim	8eb515d8c4	[X86] EltsFromConsecutiveLoads SDLoc argument should be const&. There appears never to have been a time that the reference was updated. llvm-svn: 295739	2017-02-21 17:42:28 +00:00
Simon Pilgrim	791955819c	[X86][AVX2] Fix VPBROADCASTQ folding on 32-bit targets. As i64 isn't a value type on 32-bit targets, we need to fold the VZEXT_LOAD into VPBROADCASTQ. llvm-svn: 295733	2017-02-21 16:41:44 +00:00
John Brawn	a6e95e1652	[ARM] Correct SP/PC handling in t2MOVr PC isn't allowed in the source operand of t2MOVr, so change the register class to one without PC. SP handling is slightly trickier and changes depending on if we're in ARMv8, so do that in checkTargetMatchPredicate. Differential Revision: https://reviews.llvm.org/D30199 llvm-svn: 295732	2017-02-21 16:41:29 +00:00
Simon Pilgrim	3546156122	[X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL. This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns. llvm-svn: 295723	2017-02-21 15:09:00 +00:00
Igor Breger	812f319794	[AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index. Differential Revision: https://reviews.llvm.org/D30189 llvm-svn: 295718	2017-02-21 14:01:25 +00:00
Diana Picus	613b65696a	[ARM] GlobalISel: Lower calls to void() functions For now, we hardcode a BLX instruction, and generate an ADJCALLSTACKDOWN/UP pair with amount 0. llvm-svn: 295716	2017-02-21 11:33:59 +00:00
Craig Topper	d88389aa7e	[X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs Summary: Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency. This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do. Reviewers: zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30181 llvm-svn: 295697	2017-02-21 06:39:13 +00:00
Craig Topper	16d9730b86	[X86] Fix formatting. NFC llvm-svn: 295695	2017-02-21 06:27:13 +00:00
Craig Topper	d9fe664868	[AVX-512] Use sse_load_f32/f64 in place of scalar_to_vector and scalar load in some patterns. llvm-svn: 295693	2017-02-21 04:26:10 +00:00
Craig Topper	d890db6952	[AVX-512] Fix the ExeDomain for vcmpss/vcmpsd. llvm-svn: 295691	2017-02-21 04:26:04 +00:00
Sanjoy Das	90208720e3	Add a wrapper around copy_if in STLExtras; NFC I will add one more use for this in a later change. llvm-svn: 295685	2017-02-21 00:38:44 +00:00
Craig Topper	2012dda9a0	[AVX-512] Add a few more patterns for selecting masked vpternlog with broadcast loads where the passthru operand is not operand 0. llvm-svn: 295673	2017-02-20 17:44:09 +00:00
Simon Pilgrim	2967ed1c7e	[X86] Tidyup combineExtractVectorElt. NFCI. Pull out repeated code for extraction index operand and source vector value type. Use isNullConstant helper to check for zero extraction index. llvm-svn: 295670	2017-02-20 16:09:45 +00:00
Diana Picus	1c33c9f0b0	[ARM] GlobalISel: Don't select atomic loads There used to be a check in the IRTranslator that prevented us from having to deal with atomic loads/stores. That check has been removed in r294993 and the AArch64 backend was updated accordingly. This commit does the same thing for the ARM backend. In general, in the ARM backend we introduce fences during the atomic expand pass, so we don't have to worry about atomics, except for the 32-bit ARMv8 target, which handles atomics more like AArch64. Since we don't want to worry about that yet, just bail out of instruction selection if we find any atomic loads. llvm-svn: 295662	2017-02-20 14:45:58 +00:00
Igor Breger	fda32d266a	[X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector. It's more profitable to go through memory (1 cycle throughput) than to use a VMOVD + VPERMV/PSHUFB sequence (2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index. The IACA tool was used to get performance estimates (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer). For example, for the var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. Removing the VINSERT node, we don't need it any more. Differential Revision: https://reviews.llvm.org/D29690 llvm-svn: 295660	2017-02-20 14:16:29 +00:00
Simon Pilgrim	5910ebe720	[X86][AVX512] Add support for ASHR v2i64/v4i64 support without VLX Use v8i64 ASHR instructions if we don't have VLX. Differential Revision: https://reviews.llvm.org/D28537 llvm-svn: 295656	2017-02-20 12:16:38 +00:00
Sjoerd Meijer	e22a79e898	AArch64AsmParser: tablegen the isBranchTarget helper functions Use tablegen to autogenerate isBranchtarget helper functions. This is a cleanup that removes almost identical functions that differ only in a few constants. Differential Revision: https://reviews.llvm.org/D30160 llvm-svn: 295649	2017-02-20 10:57:54 +00:00
Ayman Musa	51ffeab8c8	[X86][AVX] Extend hasVEX_WPrefix bit to accept WIG value (W Ignore) + update all AVX instructions with the new value. Add WIG value to all of AVX instructions which ignore the W-bit in their encoding, instead of giving them the default value of 0. This patch is needed for a follow up work on EVEX2VEX pass (replacing EVEX encoded instructions with their corresponding VEX version when possible). Differential Revision: https://reviews.llvm.org/D29876 llvm-svn: 295643	2017-02-20 08:27:54 +00:00
Craig Topper	c6c68f5958	[AVX-512] Add more patterns to fold masked VPTERNLOG with load when the passthru isn't operand 0. llvm-svn: 295640	2017-02-20 07:00:40 +00:00
Craig Topper	a5fa2e40f9	[AVX-512] Fix mistake in the immediate swizzle for some of the VPTERNLOG patterns. llvm-svn: 295638	2017-02-20 07:00:34 +00:00
Craig Topper	5b4e36aafa	[AVX-512] Add more VPTERNLOG patterns to enable folding of broadcast loads that aren't in operand 2. llvm-svn: 295634	2017-02-20 02:47:42 +00:00
Craig Topper	c184b671d9	[X86] Use memory form of shift right by 1 when the rotl immediate is one less than the operation size. An earlier commit already did this for the register form. llvm-svn: 295626	2017-02-20 00:37:23 +00:00
Craig Topper	63801df251	[AVX-512] Remove AddedComplexity from masked operations. The size of the patterns already increases their priority. llvm-svn: 295619	2017-02-19 21:44:35 +00:00
Simon Pilgrim	14a7eee0b4	[X86] Use peekThroughOneUseBitcasts helper. NFCI. llvm-svn: 295618	2017-02-19 21:40:51 +00:00
Davide Italiano	16b476ffcc	[X86] Prefer static_cast<> to C-style cast. NFCI. llvm-svn: 295617	2017-02-19 21:35:41 +00:00

43055 Commits