llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	0c005be6eb	[X86][SSE] getV4X86ShuffleImm8 - canonicalize broadcast masks If the mask input to getV4X86ShuffleImm8 only refers to a single source element (+ undefs) then canonicalize to a full broadcast. getV4X86ShuffleImm8 defaults to inline values for undefs, which can be useful for shuffle widening/narrowing but does leave SimplifyDemanded* calls thinking the shuffle depends on unnecessary elements. I'm still investigating what we should do more generally to avoid these undemanded elements, but broadcast cases was a simpler win.	2020-07-29 11:32:44 +01:00
Craig Topper	06cf6f770d	[X86] Add FeatureCMPXCHG8B and FeatureSlowUAMem16 to 'lakemont' in X86.td We already had CMPXCH8B feature on this CPU for the frontend so this doesn't have much effect. The FeatureSlowUAMem16 only matters if someone compiles with -march=lakemont -msse which doesn't make sense, but is consistent with all our pre-sse4.2 CPUs. Maybe the feature flag should be FeatureFastUAMem16 and set on the newer CPUs instead.	2020-07-28 18:24:46 -07:00
Roman Lebedev	e1dd212c87	[X86] Remove disabled miscompiling X86CondBrFolding pass As briefly discussed in IRC with @craig.topper, the pass is disabled basically since it's original introduction (nov 2018) due to known correctness issues (miscompilations), and there hasn't been much work done to fix that. While i won't promise that i will "fix" the pass, i have looked at it previously, and i'm sure i won't try to fix it if that requires actually fixing this existing code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D84775	2020-07-28 23:35:04 +03:00
Craig Topper	69152a11cf	[X86] Merge the two 'Emit the normal disp32 encoding' cases in SIB byte handling in emitMemModRMByte. NFCI By repeating the Disp.isImm() check in a couple spots we can make the normal case for immediate and for expression the same. And then always rely on the ForceDisp32 flag to remove a later non-zero immediate check. This should make {disp32} pseudo prefix handling slightly easier as we need the normal disp32 handler to handle a immediate of 0.	2020-07-28 12:12:09 -07:00
Simon Pilgrim	4838cd46a9	[X86][XOP] Shuffle v16i8 using VPPERM(X,Y) instead of OR(PSHUFB(X),PSHUFB(Y))	2020-07-28 19:56:10 +01:00
Craig Topper	91b8c1fd0f	[X86] Simplify some code in emitMemModRMByte. NFCI	2020-07-28 10:46:04 -07:00
Craig Topper	6c3dc6e1d5	[X86] Merge disp8 and cdisp8 handling into a single helper function to reduce some code. We currently handle EVEX and non-EVEX separately in two places. By sinking the EVEX check into the existing helper for CDisp8 we can simplify these two places. Differential Revision: https://reviews.llvm.org/D84730	2020-07-28 10:46:01 -07:00
Simon Pilgrim	182111777b	[X86][SSE] Attempt to match OP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y)) An initial backend patch towards fixing the various poor HADD combines (PR34724, PR41813, PR45747 etc.). This extends isHorizontalBinOp to check if we have per-element horizontal ops (odd+even element pairs), but not in the expected serial order - in which case we build a "post shuffle mask" that we can apply to the HOP result, assuming we have fast-hops/optsize etc. The next step will be to extend the SHUFFLE(HOP(X,Y)) combines as suggested on PR41813 - accepting more post-shuffle masks even on slow-hop targets if we can fold it into another shuffle. Differential Revision: https://reviews.llvm.org/D83789	2020-07-28 10:04:14 +01:00
Craig Topper	647e861e08	[X86] Detect if EFLAGs is live across XBEGIN pseudo instruction. Add it as livein to the basic blocks created when expanding the pseudo XBEGIN causes several based blocks to be inserted. If flags are live across it we need to make eflags live in the new basic blocks to avoid machine verifier errors. Fixes PR46827 Reviewed By: ivanbaev Differential Revision: https://reviews.llvm.org/D84479	2020-07-27 21:15:35 -07:00
Craig Topper	25f193fb46	[X86] Add support for {disp32} to control size of jmp and jcc instructions in the assembler By default we pick a 1 byte displacement and let relaxation enlarge it if necessary. The GNU assembler supports a pseudo prefix to basically pre-relax the instruction the larger size. I plan to add {disp8} and {disp32} support for memory operands in another patch which is why I've included the parsing code and enum for {disp8} pseudo prefix as well. Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D84709	2020-07-27 21:11:48 -07:00
Craig Topper	a0ebac52df	[X86] Properly encode a 32-bit address with an index register and no base register in 16-bit mode. In 16-bit mode we can encode a 32-bit address using 0x67 prefix. We were failing to do this when the index register was a 32-bit register, the base register was not present, and the displacement fit in 16-bits. Fixes PR46866.	2020-07-27 21:11:42 -07:00
Craig Topper	51e1c028d4	[X86] Add back comment inadvertently lost in `1a1448e656`.	2020-07-27 10:02:38 -07:00
Simon Pilgrim	4d84d94969	[X86][SSE] Relax 128-bit restriction on extract_subvector(ext_vector_inreg(X),0) -> ext_vector_inreg(extract_subvector(X,0)) fold We only need to ensure that the source is larger than the subvector result type	2020-07-27 17:50:36 +01:00
Simon Pilgrim	ab4ffa52f0	[X86][AVX] Fold extract_subvector(truncate(x),0) -> truncate(extract_subvector(x),0) This is currently only supported for VLX targets where the op should be legal.	2020-07-27 14:51:29 +01:00
Simon Pilgrim	f720c9c68c	[X86] combineExtractSubvector - pull out repeated getSizeInBits() calls. NFCI.	2020-07-27 14:51:28 +01:00
Craig Topper	df12524e6b	[X86] Turn X86DAGToDAGISel::tryVPTERNLOG into a fully custom instruction selector that can handle bitcasts between logic ops Previously we just matched the logic ops and replaced with an X86ISD::VPTERNLOG node that we would send through the normal pattern match. But that approach couldn't handle a bitcast between the logic ops. Extending that approach would require us to peek through the bitcasts and emit new bitcasts to match the types. Those new bitcasts would then have to be properly topologically sorted. This patch instead switches to directly emitting the MachineSDNode and skips the normal tablegen pattern matching. We do have to handle load folding and broadcast load folding ourselves now. Which also means commuting the immediate control. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83630	2020-07-26 12:19:08 -07:00
Craig Topper	1a75d88b3e	[X86] Move getGatherOverhead/getScatterOverhead into X86TargetTransformInfo. These cost methods don't make much sense in X86Subtarget. Make them methods in X86's TTI and move the feature checks from the X86Subtarget constructor into these methods. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D84594	2020-07-26 10:38:42 -07:00
Simon Pilgrim	17eafe0841	[X86][SSE] lowerV2I64Shuffle - use undef elements in PSHUFD mask widening If we lower a v2i64 shuffle to PSHUFD, we currently clamp undef elements to 0, (elements 0,1 of the v4i32) which can result in the shuffle referencing more elements of the source vector than expected, affecting later shuffle combines and KnownBits/SimplifyDemanded calls. By ensuring we widen the undef mask element we allow getV4X86ShuffleImm8 to use inline elements as the default, which are more likely to fold.	2020-07-26 16:04:22 +01:00
Craig Topper	1a1448e656	[X86] Merge X86MCInstLowering's maxLongNopLength into emitNop and remove check for FeatureNOPL. The switch in emitNop uses 64-bit registers for nops exceeding 2 bytes. This isn't valid outside 64-bit mode. We could fix this easily enough, but there are no users that ask for more than 2 bytes outside 64-bit mode. Inlining the method to make the coupling between the two methods more explicit.	2020-07-25 22:11:47 -07:00
Craig Topper	14c59b4577	[X86] Remove getProcFamily() method from X86Subtarget. NFC This isn't used and we've decided in the past that a CPU enum for tuning is not a good idea.	2020-07-25 22:11:45 -07:00
Craig Topper	1df8804ce5	[X86] Replace a use of ProcIntelSLM with FeatureFast7ByteNOP.	2020-07-25 20:46:48 -07:00
Simon Pilgrim	3b21823e4a	[X86][SSE] combineX86ShufflesRecursively - move all Root node asserts to the same location. NFCI. Minor tidyup for some upcoming shuffle combine improvements.	2020-07-25 12:48:14 +01:00
Simon Pilgrim	66998ae59f	[X86][SSE] getFauxShuffle - ignore undemanded sources for PACKSS/PACKUS faux shuffles If we don't care about an entire LHS/RHS of the PACK op, then can just treat it the same as undef (we don't care if it saturates) and is safe to treat as a shuffle. This can happen if we attempt to decode as a faux shuffle before SimplifyDemandedVectorElts has been called on the PACK which should replace the source with UNDEF entirely.	2020-07-25 10:51:14 +01:00
Craig Topper	945ed22f33	[X86] Move the implicit enabling of sse2 for 64-bit mode from X86Subtarget::initSubtargetFeatures to X86_MC::ParseX86Triple. ParseX86Triple already checks for 64-bit mode and produces a static string. We can just add +sse2 to the end of that static string. This avoids a potential reallocation when appending it to the std::string at runtime. This is a slight change to the behavior of tools that only use MC layer which weren't implicitly enabling sse2 before, but will now. I don't think we check for sse2 explicitly in any MC layer components so this shouldn't matter in practice. And if it did matter the new behavior is more correct.	2020-07-24 11:14:20 -07:00
Craig Topper	8158f0cefe	[X86] Use X86_MC::ParseX86Triple to add mode features to feature string in X86Subtarget::initSubtargetFeatures. Remove mode flags from constructor and remove calls to ToggleFeature for the mode bits. By adding them to the feature string we handle initializing the mode member variables in X86Subtarget and the feature bits in MCSubtargetInfo in one shot.	2020-07-24 10:48:22 -07:00
Craig Topper	205e8b7e89	[X86] Make the X86ProcFamilyEnum private to X86Subtarget. Removed unneeded 'protected' from X86Subtarget. NFC	2020-07-23 23:42:11 -07:00
Craig Topper	5dbcf5e3cc	[X86] Add Feature64Bit to the 'generic' CPU and remove feature string hacking in X86Subtarget constructor Feature64Bit is only used by a check in the X86Subtarget constructor to ensure that the CPU selected supports 64-bit mode when the triple is for 64-bit mode. 'generic' is the default CPU in llc and so needs to be able to pass this check. Previously we did this by detecting the name and adding the feature to the feature string. But there doesn't seem to be any reason we can't just add the feature to the CPU directly.	2020-07-23 09:16:18 -07:00
Simon Pilgrim	d720ba1e4b	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add SSE shift multiple use handling Add SimplifyMultipleUseDemandedVectorElts peek through for imm/var SSE shifts	2020-07-23 14:39:03 +01:00
Craig Topper	ebe5f17f9c	[X86] Remove the DeprecatedMPX feature flag. We deprecated mpx feature in 10.0. I left this feature flag in case someone still had IR files containing the feature in a target-feature attribute. At the time I think I thought it would fail the test if the feature couldn't be found. Further review suggests that at worst it prints a message to stderr about ignoring the feature.	2020-07-22 17:44:07 -07:00
Craig Topper	b2c65beb14	[X86] Rework the "sahf" feature flag to only apply to 64-bit mode. SAHF/LAHF instructions are always available in 32-bit mode. Early 64-bit capable CPUs made the undefined opcodes in 64-bit mode. This was changed on later CPUs. We have a feature flag to control our usage of these instructions. This feature flag is hooked up to a clang command line option -msahf/-mno-sahf specifically to give control of the 64-bit mode behavior. In the backend X86Subtarget constructor we were explicitly forcing +sahf into the feature flag string if we were not compiling for 64-bit mode. This was intended to make the predicates always allow the instructions outside of 64-bit mode. Unfortunately, the way it was placed into the string allowed -mno-sahf from clang to disable SAHF instructions in 32-bit mode. This causes an assertion to fire if you compile a floating point comparison with something like "-march=pentium -mno-sahf" as our floating point comparison handling on CPUs that don't support FCOMI/FUCOMI instructions requires SAHF. To fix this, this commit restricts the feature flag to only apply to 64-bit mode by ignoring the flag outside 64-bit mode in X86Subtarget::hasLAHFSAHF(). This way we don't need to mess with the feature string at all.	2020-07-22 16:57:46 -07:00
Craig Topper	deeb2fdbf4	[X86] Remove a couple temporary std::string for CPU names that I don't need to exist. The input to these functions is a StringRef. We then convert it to a std::string. Then maybe replace with "generic". I think we can just overwrite the incoming StringRef with "generic" if needed and then pass it along without creating any std::string.	2020-07-22 15:55:04 -07:00
Matt Arsenault	0c92bfa4b8	GlobalISel: Don't use virtual for distinguishing arg handlers There's no reason to involve the hassle of a virtual method targets have to override for a simple boolean. Not sure exactly what's going on with Mips, but it seems to define its own totally separate handler classes.	2020-07-22 14:14:43 -04:00
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic TargetTransformInfo::simplifyDemandedUseBitsIntrinsic TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows to move about 3000 lines out from InstCombine to the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00
Simon Pilgrim	5b5dc2442a	[X86][AVX] getTargetShuffleMask - don't decode VBROADCAST(EXTRACT_SUBVECTOR(X,0)) patterns. getTargetShuffleMask is used by the various "SimplifyDemanded" folds so we can't assume that the bypassed extract_subvector can be safely simplified - getFauxShuffleMask performs a more general decode that allows us to more safely catch many of these cases so the impact is minimal.	2020-07-21 21:55:44 +01:00
Sanjay Patel	50afa18772	[x86] split FMA with fast-math-flags to avoid libcall fma reassoc A, B, C --> fadd (fmul A, B), C (when target has no FMA hardware) C/C++ code may use explicit fma() calls (which become LLVM fma intrinsics in IR) but then gets compiled with -ffast-math or similar. For targets that do not have FMA hardware, we don't want to go out to the math library for a precise but slow FMA result. I tried this as a generic DAGCombine, but it caused infinite looping on more than 1 other target, so there's likely some over-reaching fma formation happening. There's also a potential intersection of strict FP with fast-math here. Deferring to current behavior for that case (assuming that strict-ness overrides fast-ness). Differential Revision: https://reviews.llvm.org/D83981	2020-07-19 10:03:55 -04:00
Craig Topper	6bba95831e	[X86] Change the scheduler model for 'pentium4' to SandyBridgeModel. I meant to do this in D83913, but missed it while updating the feature list. Interestingly I think this is disabling the postRA scheduler. But it does match our default 64-bit behavior. Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D83996	2020-07-16 22:04:29 -07:00
Craig Topper	addbf732c8	[X86] Reorder how the subtarget map key is created. We use a SmallString<512> and attempted to reserve enough space for CPU plus Features, but that doesn't account for all the things that get added to the string. Reorder the string so the shortest things go first which shouldn't exceed the small size. Finally add the feature string at the end which might be long. This should ensure at most one heap allocation without needing to use reserve. I don't know if this matters much in practice, but I was looking into something else that will require more code here and noticed the odd reserve call.	2020-07-16 21:41:45 -07:00
Craig Topper	5408024fa8	[X86] Move integer hadd/hsub formation into a helper function shared by combineAdd and combineSub. There was a lot of duplicate code here for checking the VT and subtarget. Moving it into a helper avoids that. It also fixes a bug that combineAdd reused Op0/Op1 after a call to isHorizontalBinOp may have changed it. The new helper function has its own local version of Op0/Op1 that aren't shared by other code. Fixes PR46455. Reviewed By: spatel, bkramer Differential Revision: https://reviews.llvm.org/D83971	2020-07-16 13:27:27 -07:00
Craig Topper	ad171d24b9	[X86] Change the tuning settings for pentium4 to be more modern since its the default 32-bit cpu in clang Alternative to D83897. I believe the big change here is that I removed slow unaligned memory 16 Down side that it may adversely effect tuning if someone explicitly targets -march=pentium4 and expects pentium4 tuned code. Of course pentium4 is so old our default behavior with the previous settings may not have been the best either. Reviewed By: echristo, RKSimon Differential Revision: https://reviews.llvm.org/D83913	2020-07-16 12:51:25 -07:00
Craig Topper	9adf7461f7	[X86] Add test case for PR46455.	2020-07-16 11:06:55 -07:00
Craig Topper	71b49aa438	[X86] Allow lsl/lar to be parsed with a GR16, GR32, or GR64 as source register. This matches GNU assembler behavior. Operand size is determined only from the destination register.	2020-07-15 23:51:37 -07:00
Craig Topper	3c2a56a857	[X86] Teach assembler parser to accept lsl and lar with a 64 or 32 source register when the destination is a 64 register. Previously we only accepted a 32-bit source with a 64-bit dest. Accepting 64-bit as well is more consistent with gas behavior. I think maybe we should accept 16 bit register as well, but I'm not sure.	2020-07-15 15:17:06 -07:00
Krzysztof Pszeniczny	c3e6555616	Call Frame Information (CFI) Handling for Basic Block Sections This patch handles CFI with basic block sections, which unlike DebugInfo does not support ranges. The DWARF standard explicitly requires emitting separate CFI Frame Descriptor Entries for each contiguous fragment of a function. Thus, the CFI information for all callee-saved registers (possibly including the frame pointer, if necessary) have to be emitted along with redefining the Call Frame Address (CFA), viz. where the current frame starts. CFI directives are emitted in FDE’s in the object file with a low_pc, high_pc specification. So, a single FDE must point to a contiguous code region unlike debug info which has the support for ranges. This is what complicates CFI for basic block sections. Now, what happens when we start placing individual basic blocks in unique sections: * Basic block sections allow the linker to randomly reorder basic blocks in the address space such that a given basic block can become non-contiguous with the original function. * The different basic block sections can no longer share the cfi_startproc and cfi_endproc directives. So, each basic block section should emit this independently. * Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that caters to that basic block section. * Now, this basic block section needs to duplicate the information from the entry block to compute the CFA as it is an independent entity. It cannot refer to the FDE of the original function and hence must duplicate all the stuff that is needed to compute the CFA on its own. * We are working on a de-duplication patch that can share common information in FDEs in a CIE (Common Information Entry) and we will present this as a follow up patch. This can significantly reduce the duplication overhead and is particularly useful when several basic block sections are created. * The CFI directives are emitted similarly for registers that are pushed onto the stack, like callee saved registers in the prologue. There are cfi directives that emit how to retrieve the value of the register at that point when the push happened. This has to be duplicated too in a basic block that is floated as a separate section. Differential Revision: https://reviews.llvm.org/D79978	2020-07-14 12:54:12 -07:00
Logan Smith	a19461d9e1	[NFC] Add 'override' keyword where missing in include/ and lib/. This fixes warnings raised by Clang's new -Wsuggest-override, in preparation for enabling that warning in the LLVM build. This patch also removes the virtual keyword where redundant, but only in places where doing so improves consistency within a given file. It also removes a couple unnecessary virtual destructor declarations in derived classes where the destructor inherited from the base class is already virtual. Differential Revision: https://reviews.llvm.org/D83709	2020-07-14 09:47:29 -07:00
Eric Christopher	e958379581	Fold the opt size check into the assert to silence an unused variable warning.	2020-07-13 16:05:24 -07:00
Hiroshi Yamauchi	fb558ccae7	[PGO][PGSO] Add profile guided size optimization to X86ISelDAGToDAG. Differential Revision: https://reviews.llvm.org/D83331	2020-07-13 10:28:09 -07:00
Hiroshi Yamauchi	153a0b8906	[PGO][PGSO] Add profile guided size optimization to the X86 LEA fixup. Differential Revision: https://reviews.llvm.org/D83330	2020-07-13 09:46:22 -07:00
Eric Astor	3aabfa2808	[ms] [llvm-ml] Restore omitted changes requested by reviewer	2020-07-13 10:49:19 -04:00
Eric Astor	f08e8b6d7c	[ms] [llvm-ml] Add support for MASM STRUCT casting field accessors: (<TYPE> PTR <value>).<field> Summary: Add support for MASM STRUCT casting field accessors: (<TYPE> PTR <value>).<field> Since these are operands, we add them to X86AsmParser. If/when we extend MASM support to other architectures (e.g., ARM), we will need similar changes there as well. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D83346	2020-07-13 10:40:47 -04:00
Eric Astor	4cdea5faf9	[ms] [llvm-ml] Improve MASM STRUCT field accessor support Summary: Adds support for several accessors: - `[<identifier>.<struct name>].<field>` - `[<identifier>.<struct name>.<field>].<subfield>` (where `field` has already-defined STRUCT type) - `[<variable>.<field>].<subfield>` (where `field` has already-defined STRUCT type) Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D83344	2020-07-13 10:34:30 -04:00
Craig Topper	f8f007e378	[X86] Consistently use 128 as the PSHUFB/VPPERM index for zero Bit 7 of the index controls zeroing, the other bits are ignored when bit 7 is set. Shuffle lowering was using 128 and shuffle combining was using 255. Seems like we should be consistent. This patch changes shuffle combining to use 128 to match lowering. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83587	2020-07-12 10:52:43 -07:00
Craig Topper	04013a07ac	[X86] Fix two places that appear to misuse peekThroughOneUseBitcasts peekThroughOneUseBitcasts checks the use count of the operand of the bitcast. Not the bitcast itself. So I think that means we need to do any outside haseOneUse checks before calling the function not after. I was working on another patch where I misused the function and did a very quick audit to see if I there were other similar mistakes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83598	2020-07-12 10:52:43 -07:00
Wang, Pengfei	e628092524	[X86][MMX] Optimize MMX shift intrinsics. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D83534	2020-07-11 11:16:23 +08:00
Craig Topper	122a45fbac	[X86] Add isel patterns for matching broadcast vpternlog if the ternlog and the broadcast have different types.	2020-07-10 15:15:02 -07:00
Simon Pilgrim	4cc26a44ca	[X86][SSE] Use shouldUseHorizontalOp helper to determine whether to use (F)HADD. NFCI.	2020-07-10 12:13:34 +01:00
Simon Pilgrim	77133cc1e2	[X86][AVX] Attempt to fold PACK(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(PACK(X,Y)). Truncations lowered as shuffles of multiple (concatenated) vectors often leave us with lane-crossing shuffles that feed a PACKSS/PACKUS, if both shuffles are fed from the same 2 vector sources, then we can PACK the sources directly and shuffle the result instead. This is currently limited to whole i128 lanes in a 256-bit vector, but we can extend this if the need arises (but I'm not seeing many examples in real world code).	2020-07-10 09:33:27 +01:00
Craig Topper	918e653186	[X86] Immediately call LowerShift from lowerBuildVectorToBitOp. If we don't immediately lower the vector shift, the splat constant vector we created may get turned into a constant pool load before we get around to lowering the shift. This makes it a lot more difficult to create a shift by constant. Sometimes we fail to see through the constant pool at all and end up trying to lower as if it was a variable shift. This requires custom handling and may create an unsupported vselect on pre-sse-4.1 targets. Since we're after LegalizeVectorOps we are unable to legalize the unsupported vselect as that code is in LegalizeVectorOps rather than LegalizeDAG. So calling LowerShift immediately ensures that we get see the splat constant. Fixes PR46527. Differential Revision: https://reviews.llvm.org/D83455	2020-07-09 10:51:29 -07:00
Hiroshi Yamauchi	2c1a9006dd	[PGO][PGSO] Add profile guided size optimization to X86 ISel Lowering.	2020-07-09 10:43:45 -07:00
Craig Topper	3e75912005	[X86] Directly emit X86ISD::BLENDV instead of VSELECT in a few places that were emitting sign bit tests. Technically a VSELECT expects a vector of all 1s or 0s elements for its condition. But we aren't guaranteeing that the sign bit and the non sign bits match in these locations. So we should use BLENDV which is more relaxed. Differential Revision: https://reviews.llvm.org/D83447	2020-07-09 10:40:09 -07:00
Simon Pilgrim	f54402b63a	[X86][AVX] Attempt to fold extract_subvector(shuffle(X)) -> extract_subvector(X) If we're extracting a subvector from a shuffle that is shuffling entire subvectors we can peek through and extract the subvector from the shuffle source instead. This helps remove some cases where concat_vectors(extract_subvector(),extract_subvector()) legalizations has resulted in BLEND/VPERM2F128 shuffles of the subvectors.	2020-07-09 14:09:24 +01:00
Benjamin Kramer	b44470547e	Make helpers static. NFC.	2020-07-09 13:48:56 +02:00
Simon Pilgrim	800fb68420	[X86][SSE] Pull out PACK(SHUFFLE(),SHUFFLE()) folds into its own function. NFC. Future patches will extend this so declutter combineVectorPack before we start.	2020-07-08 17:42:42 +01:00
Simon Pilgrim	08a2c9ce5c	[X86] Fix copy+paste typo in combineVectorPack assert message. NFC.	2020-07-08 17:42:42 +01:00
Nicolai Hähnle	3fa989d4fd	DomTree: remove explicit use of DomTreeNodeBase::iterator Summary: Almost all uses of these iterators, including implicit ones, really only need the const variant (as it should be). The only exception is in NewGVN, which changes the order of dominator tree child nodes. Change-Id: I4b5bd71e32d71b0c67b03d4927d93fe9413726d4 Reviewers: arsenm, RKSimon, mehdi_amini, courbet, rriddle, aartbik Subscribers: wdng, Prazek, hiraditya, kuhar, rogfer01, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, vkmr, Kayjukh, jurahul, msifontes, cfe-commits, llvm-commits Tags: #clang, #mlir, #llvm Differential Revision: https://reviews.llvm.org/D83087	2020-07-08 18:18:49 +02:00
Sanjay Patel	9114900287	[x86] improve codegen for non-splat bit-masked vector compare and select (PR46531) vselect ((X & Pow2C) == 0), LHS, RHS --> vselect ((shl X, C') < 0), RHS, LHS Follow-up to D83073 - the non-splat mask cases where we actually see an improvement are quite limited from what I can tell. AVX1 needs multiply and blend capabilities and AVX2 needs vector shift and blend capabilities. The intersection of those 2 constraints is only vectors with 32-bit or 64-bit elements. XOP is/was better. Differential Revision: https://reviews.llvm.org/D83181	2020-07-08 08:20:49 -04:00
Simon Pilgrim	9dc250db9d	[X86][AVX] SimplifyDemandedVectorEltsForTargetShuffle - ensure mask is same size as constant size Fixes test regression reported on D81791	2020-07-08 11:47:59 +01:00
Simon Pilgrim	c00a27752e	[X86][AVX] Remove redundant EXTRACT_VECTOR_ELT(VBROADCAST(SCALAR())) fold Noticed while looking for similar cases to rG931ec74f7a29 - SimplifyDemandedVectorElts and shuffle combining both should handle this now.	2020-07-08 10:18:36 +01:00
Eric Astor	bc8e262afe	[ms] [llvm-ml] Add initial MASM STRUCT/UNION support Summary: Add support for user-defined types to MasmParser, including initialization and field access. Known issues: - Omitted entry initializers (e.g., <,0>) do not work consistently for nested structs/arrays. - Size checking/inference for values with known types is not yet implemented. - Some ml64.exe syntaxes for accessing STRUCT fields are not recognized. - `[<register>.<struct name>].<field>` - `[<register>[<struct name>.<field>]]` - `(<struct name> PTR [<register>]).<field>` - `[<variable>.<struct name>].<field>` - `(<struct name> PTR <variable>).<field>` Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D75306	2020-07-07 17:02:10 -04:00
Zola Bridges	9d9e499840	[x86][seses] Add clang flag; Use lvi-cfi with seses This patch creates a clang flag to enable SESES. This flag also ensures that lvi-cfi is on when using seses via clang. SESES should use lvi-cfi to mitigate returns and indirect branches. The flag to enable the SESES functionality only without lvi-cfi is now -x86-seses-enable-without-lvi-cfi to warn users part of the mitigation is not enabled if they use this flag. This is useful in case folks want to see the cost of SESES separate from the LVI-CFI. Reviewed By: sconstab Differential Revision: https://reviews.llvm.org/D79910	2020-07-07 13:20:13 -07:00
Simon Pilgrim	931ec74f7a	[X86][AVX] Don't fold PEXTR(VBROADCAST_LOAD(X)) -> LOAD(X). We were checking the VBROADCAST_LOAD element size against the extraction destination size instead of the extracted vector element size - PEXTRW/PEXTB have implicit zext'ing so have i32 destination sizes for v8i16/v16i8 vectors, resulting in us extracting from the wrong part of a load. This patch bails from the fold if the vector element sizes don't match, and we now use the target constant extraction code later on like the pre-AVX2 targets, fixing the test case. Found by internal fuzzing tests.	2020-07-07 19:10:03 +01:00
Zola Bridges	dfabffb195	[x86][lvi][seses] Use SESES at O0 for LVI mitigation Use SESES as the fallback at O0 where the optimized LVI pass isn't desired due to its effect on build times at O0. I updated the LVI tests since this changes the code gen for the tests touched in the parent revision. This is a follow up to the comments I made here: https://reviews.llvm.org/D80964 Hopefully we can continue the discussion here. Also updated SESES to handle LFENCE instructions properly instead of adding redundant LFENCEs. In particular, 1) no longer add LFENCE if the current instruction being processed is an LFENCE and 2) no longer add LFENCE if the instruction right before the instruction being processed is an LFENCE Reviewed By: sconstab Differential Revision: https://reviews.llvm.org/D82037	2020-07-07 11:05:09 -07:00
Sanjay Patel	642eed3713	[x86] fix miscompile in buildvector v16i8 lowering In the test based on PR46586: https://bugs.llvm.org/show_bug.cgi?id=46586 ...we are inserting 16-bits into the high element of the vector, shuffling it to element 0, and extracting 32-bits. But xmm1 was never initialized, so the top 16-bits of the extract are undef without this patch. (It seems like we could do better than this by recognizing that we only demand a subsection of the build vector, but I want to make sure we fix the miscompile 1st.) This path is only used for pre-SSE4.1, and simpler patterns get squashed somewhere along the way, so the test still includes a 'urem' as it did in the original test from the bug report. Differential Revision: https://reviews.llvm.org/D83319	2020-07-07 13:02:31 -04:00
Liu, Chen3	ea85ff82c8	[X86] Fix a bug that when lowering byval argument When an argument has 'byval' attribute and should be passed on the stack according calling convention, a stack copy would be emitted twice. This will cause the real value will be put into stack where the pointer should be passed. Differential Revision: https://reviews.llvm.org/D83175	2020-07-07 21:49:31 +08:00
Xiang1 Zhang	939d8309db	[X86-64] Support Intel AMX Intrinsic INTEL ADVANCED MATRIX EXTENSIONS (AMX). AMX is a new programming paradigm, it has a set of 2-dimensional registers (TILES) representing sub-arrays from a larger 2-dimensional memory image and operate on TILES. These intrinsics use direct TMM register number as its params. Spec can be found in Chapter 3 here https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D83111	2020-07-07 10:13:40 +08:00
Wolfgang Pieb	129387497e	Correct 3 spelling errors in headers and doc strings.	2020-07-06 17:27:51 -07:00
Craig Topper	e652c0f8f3	[X86] Teach lowerShuffleAsBlend to use bit blend for v16i8/v32i8/v16i16 when avx512vl is enabled but not avx512bw. Probably not super important since there are no real CPUs with avx512vl and not avx512bw. But vpternlog should be better than vblendvb. I do wonder if we should use vpternlog even with BWI. We currently use vblendmb or vpblendmw by putting the mask into a GPR and moving it to a k-register. But I don't think we hoist the GPR to k-register copy in machine LICM. Using VPTERNLOG would use a constant pool load, but has the advantage that we're pretty good at hoisting and rematerializing those. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83156	2020-07-04 10:26:56 -07:00
Craig Topper	b4eb415a99	[X86] Disable VPBLENDVB formation in combineLogicBlendIntoPBLENDV if VPTERNLOG is supported. VPBLENDVB is multiple uops while VPTERNLOG is a single uop. So we should use that instead. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83155	2020-07-04 10:12:19 -07:00
Simon Pilgrim	71f342d6c3	[X86][AVX] Fold PACK(LOSUBVECTOR(SHUFFLE(X)),HISUBVECTOR(SHUFFLE(X))) -> SHUFFLE(PACK(LOSUBVECTOR(X),HISUBVECTOR(X))) Using PACK for truncations leaves us with intermediate shuffles that can be tricky to remove while the truncation tree is being formed. This fold helps pull out the PERMQ case which is one of the most common, avoiding some costly lane-crossing shuffles. A future patch will begin adding more general shuffle folding, which we should be able to use for HADD/HSUB as well.	2020-07-04 13:54:30 +01:00
Craig Topper	fed432523e	[X86] Directly emit VPTERNLOG from canonicalizeBitSelect when possible. Seems to produce better results on some rotate tests. And is neutral for other tests.	2020-07-03 22:08:28 -07:00
Craig Topper	e75f2d5a8c	[X86] Add matching support for X86ISD::ANDNP to X86DAGToDAGISel::tryVPTERNLOG.	2020-07-03 17:50:35 -07:00
Sanjay Patel	26543f1c0c	[x86] improve codegen for bit-masked vector compare and select (PR46531) We canonicalize patterns like: %s = lshr i32 %a0, 1 %t = trunc i32 %s to i1 to: %a = and i32 %a0, 2 %c = icmp ne i32 %a, 0 ...in IR, but the bit-shifting original sequence may be better for x86 vector codegen. I tried several variants of the transform, and it's tricky to not induce regressions. In particular, I did not find a way to cleanly handle non-splat constants, so I've left that as a TODO item here (currently negative tests for those are included). AVX512 resulted in some diffs, but didn't look meaningful, so I left that out too. Some of the 256-bit AVX1 diffs are questionable, but close enough that they are probably insignificant. Differential Revision: https://reviews.llvm.org/D83073.	2020-07-03 17:31:57 -04:00
serge-sans-paille	c8ef3d5a2f	Fix stack-clash probing for large static alloca Differential Revision: https://reviews.llvm.org/D82867	2020-07-03 09:22:03 +02:00
Craig Topper	b94e9b7f05	[X86] Remove MODRM_SPLITREGM from the disassembler tables. This offers a very minor table size reduction due to only being used for one AMX opcode.	2020-07-03 00:16:20 -07:00
Craig Topper	52855ed099	[X86] Add back support for matching VPTERNLOG from back to back logic ops. I think this mostly looks ok. The only weird thing I noticed was a couple rotate vXi8 tests picked up an extra logic op where we have (and (or (and), (andn)), X). Previously we matched the (or (and), (andn)) to vpternlog, but now we match the (and (or), X) and leave the and/andn unmatched.	2020-07-02 22:11:52 -07:00
Craig Topper	acf6c94a38	[X86] Teach lower512BitShuffle to try bitmask and bitblend before splitting v32i16/v64i8 on av512f only targets. We consider v32i16/v64i8 to be legal types on avx512f, but we don't have most operations until avx512bw. But we can use and/or/xor operations. So try those before splitting. This is especially helpful since we turn some ands with constant masks into shuffles in early DAG combines. So we should make sure we recover those back to AND.	2020-07-02 15:35:48 -07:00
Craig Topper	912cd8a37f	[X86] Add vpternlog to the broadcast unfolding table.	2020-07-02 13:43:44 -07:00
Craig Topper	204a21317a	[X86] Modify the conditions for when we stop making v16i8/v32i8 rotate Custom based on having avx512 features. The comments here indicate that we prefer to promote the shifts instead of allowing rotate to be pattern matched. But we weren't taking into account whether 512-bit registers are enabled or whethever we have vpsllvw/vpsrlvw instructions. splatvar_rotate_v32i8 is a slight regrssion, but the other cases are neutral or improved.	2020-07-02 13:07:51 -07:00
Guillaume Chatelet	8dbafd24d6	[Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82977	2020-07-02 11:28:02 +00:00
Craig Topper	0aad82943a	[X86] Enable multibyte NOPs in 64-bit mode for padding/alignment. The default CPU used by llvm-mc doesn't have the NOPL feature, but if we know we're compiling in 64-bit mode we should be able to use nopl.	2020-07-01 23:59:01 -07:00
Xiang1 Zhang	aded4f0cc0	[X86-64] Support Intel AMX instructions Summary: INTEL ADVANCED MATRIX EXTENSIONS (AMX). AMX is a new programming paradigm, it has a set of 2-dimensional registers (TILES) representing sub-arrays from a larger 2-dimensional memory image and operate on TILES. Spec can be found in Chapter 3 here https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewers: LuoYuanke, annita.zhang, pengfei, RKSimon, xiangzhangllvm Reviewed By: xiangzhangllvm Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82705	2020-07-02 08:57:04 +08:00
Craig Topper	c420762172	Revert "[X86] Enable multibyte NOPs in 64-bit mode for padding/alignment." Looks like lld tests need updates too This reverts commit `3367e9dac5`.	2020-07-01 15:20:53 -07:00
Craig Topper	3367e9dac5	[X86] Enable multibyte NOPs in 64-bit mode for padding/alignment. The default CPU used by llvm-mc doesn't have the NOPL feature, but if we know we're compiling in 64-bit mode we should be able to use nopl.	2020-07-01 10:57:24 -07:00
Yuanfang Chen	78c69a00a4	[NFC] Clean up uses of MachineModuleInfoWrapperPass	2020-07-01 09:45:05 -07:00
Eric Astor	353a169cb8	[ms] [llvm-ml] Use default RIP-relative addressing for x64 MASM. Summary: When parsing 64-bit MASM, treat memory operands with unspecified base register as RIP-based. Documented in several places, including https://software.intel.com/en-us/articles/introduction-to-x64-assembly: "Unfortunately, MASM does not allow this form of opcode, but other assemblers like FASM and YASM do. Instead, MASM embeds RIP-relative addressing implicitly." Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D73227	2020-07-01 12:41:07 -04:00
Simon Pilgrim	b485586482	[X86][SSE] Fix targetShrinkDemandedConstant constant vector sign extensions D82257/rG3521ecf1f8a3 was incorrectly sign-extending a constant vector from the lsb, this is fine if all the constant elements are 'allsignbits' in the active bits, but if only some of the elements are, then we are corrupting the constant values for those elements. This fix ensures we sign extend from the msb of the active/demanded bits instead.	2020-07-01 12:12:53 +01:00
Guillaume Chatelet	28de229bc6	[Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82894	2020-07-01 07:28:11 +00:00
Craig Topper	1df1186ab1	[X86] Use some preprocessor macros to reduce the very similar repeated code in getVPTESTMOpc. NFCI This function picks X86 opcode name based on type, masking, and whether not a load or broadcast has been folded using multiple switch statements. The contents of the switches mostly just vary in a few characters in the instruction name. So use some macros to build the instruction names to reduce the repetiveness.	2020-06-30 14:38:22 -07:00
Matt Arsenault	249933f254	X86: Use Register	2020-06-30 12:13:08 -04:00
Simon Pilgrim	82de018954	[X86][SSE] LowerVectorAllZero - add support for masked OR-reductions If we're masking the result of an OR-reduction before comparing against zero, we can fold this into the PTEST() / MOVMSK(CMPEQ()) codegen by pre-masking the source value. This works particularly well on PTEST which performs the AND as part of its operation, but the MOVMSK variant also benefits for non-V2I64 cases. Fixes PR44781	2020-06-30 14:38:52 +01:00
Guillaume Chatelet	c1cd61e02a	[Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemcpy to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82849	2020-06-30 13:12:31 +00:00
Guillaume Chatelet	6a6af30d43	[Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemset to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82851	2020-06-30 12:46:26 +00:00
Guillaume Chatelet	a976ea3209	[Alignment][NFC] Migrate PPC, X86 and XCore backends to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82779	2020-06-30 08:08:45 +00:00
Craig Topper	767c9c5bf5	[X86] Remove an isel pattern than can never match. Remove bitcasts of loads from a few others.	2020-06-30 00:17:56 -07:00
Craig Topper	9b04d69cce	[X86] Prefer AND over PSHUFB for v64i8 when possible If the shuffle is a blend and one input is a 0 vector, we should prefer AND over PSHUFB since its available on more execution ports. Differential Revision: https://reviews.llvm.org/D82798	2020-06-29 16:26:53 -07:00
Matt Arsenault	2790516418	X86: Use MOV32r0 pseudo instead of directly emitting xor This was producing reg = xor undef reg, undef reg. This looks similar to a use of a value to define itself, and I want to disallow undef uses for SSA virtual registers. If this were to use implicit_def, there's no guarantee the two operands end up using the same register (I think no guarantee exists even if the two operands start out as the same register, but this was violated when I switched this to use an explicit implicit_def). The MOV32r0 pseudo evidently exists to handle this case, so use it instead. This was more work than I expected for the 64-bit case, but I didn't see any helper for materializing a 64-bit 0.	2020-06-29 14:45:20 -04:00
Christopher Tetreault	0da1e7ebf9	[SVE] Remove calls to VectorType::getNumElements from X86 Reviewers: efriedma, RKSimon, craig.topper, fpetrogalli, c-rhodes Reviewed By: RKSimon Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82508	2020-06-29 11:10:35 -07:00
Simon Pilgrim	333aa690f4	[X86][SSE] MatchVectorAllZeroTest - handle OR vector reductions (REAPPLIED) This patch extends MatchVectorAllZeroTest to handle OR vector reduction patterns where the result is compared against zero. Reapplied with a fix for a chromium regression due to a missing isNullConstant() check in combineSetCC: https://bugs.chromium.org/p/chromium/issues/detail?id=1097758 Fixes PR45378 Differential Revision: https://reviews.llvm.org/D81547	2020-06-29 15:50:44 +01:00
Simon Pilgrim	3521ecf1f8	[X86] Add vector support to targetShrinkDemandedConstant for OR/XOR opcodes If a constant is only allsignbits in the demanded/active bits, then sign extend it to an allsignbits bool pattern for OR/XOR ops. This also requires SimplifyDemandedBits XOR handling to be modified to call ShrinkDemandedConstant on any (non-NOT) XOR pattern to account for non-splat cases. Next step towards fixing PR45808 - with this patch we now get a <-1,-1,0,0> v4i64 constant instead of <1,1,0,0>. Differential Revision: https://reviews.llvm.org/D82257	2020-06-29 12:19:05 +01:00
Simon Pilgrim	973685fc78	[TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.	2020-06-29 11:46:58 +01:00
Simon Pilgrim	e07a982693	[X86] combineScalarToVector - handle (v2i64 scalar_to_vector(aextload)) as well as (v2i64 scalar_to_vector(aext)) We already fold (v2i64 scalar_to_vector(aext)) -> (v2i64 bitcast(v4i32 scalar_to_vector(x))), this adds support for similar aextload cases and also handles v2f64 cases that wrap the i64 extension behind bitcasts. Fixes the remaining issue with PR39016	2020-06-28 13:00:32 +01:00
Simon Pilgrim	393b4bd136	[X86] SimplifyDemandedVectorEltsForTargetNode - merge shuffle/pack lower demanded elements handling. Generalize the vector operand extraction code for shuffle/pack ops - we can assume that the vector operands are the same width as the result, and any non-vector values can be reused directly in the smaller width op.	2020-06-27 19:10:13 +01:00
Simon Pilgrim	e855efe424	[X86][AVX] SimplifyDemandedVectorEltsForTargetNode - reduce width of X86ISD::VPERMIL2 If we don't need the elements of the upper lanes, reduce the width of the X86ISD::VPERMIL2 node.	2020-06-27 15:06:49 +01:00
Simon Pilgrim	d56c6475a6	[X86][AVX] SimplifyDemandedVectorEltsForTargetNode - reduce width of X86ISD::VPERMILPV If we don't need the elements of the upper lanes, reduce the width of the X86ISD::VPERMILPV node.	2020-06-27 14:43:03 +01:00
Craig Topper	9e8b5a20e9	[X86] Add MOVBE and RDRND features to BDVER4. Only 6 years behind gcc. https://gcc.gnu.org/legacy-ml/gcc-patches/2014-08/msg00231.html Found while working on improving how we define CPU features for clang and auditing for correctness.	2020-06-26 23:32:17 -07:00
Amy Huang	8b59c26bf3	Extend or truncate __ptr32/__ptr64 pointers when dereferenced. Summary: A while ago I implemented the functionality to lower Microsoft __ptr32 and __ptr64 pointers, which are stored as 32-bit and 64-bit pointer and are extended/truncated to the appropriate pointer size when dereferenced. This patch adds an addrspacecast to cast from the __ptr32/__ptr64 pointer to a default address space when dereferencing. Bug: https://bugs.llvm.org/show_bug.cgi?id=42359 Reviewers: hans, arsenm, RKSimon Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81517	2020-06-26 13:33:54 -07:00
Guillaume Chatelet	b66e33a689	[Alignment][NFC] Migrate TTI::getGatherScatterOpCost to Align This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82577	2020-06-26 11:08:27 +00:00
Guillaume Chatelet	fdc7c7fb87	[Alignment][NFC] Migrate TTI::getInterleavedMemoryOpCost to Align This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82573	2020-06-26 11:00:53 +00:00
Guillaume Chatelet	7e1f79c3de	[Alignment][NFC] Migrate TTI::getMaskedMemoryOpCost to Align This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82569	2020-06-26 10:14:16 +00:00
Craig Topper	12665f2812	[X86] Make XSAVEC/XSAVEOPT/XSAVES properly depend on XSAVE in both the frontend and the backend. These features implicitly enabled XSAVE in the frontend, but not the backend. Disabling XSAVE in the frontend disabled XSAVEOPT, but not the other 2. Nothing happened in the backend.	2020-06-26 00:14:58 -07:00
Craig Topper	6673d69226	[X86] Don't imply -mprfchw when -m3dnow is specified. Enable prefetchw in the backend with 3dnow feature. The PREFETCHW instruction was originally part of the 3DNow. But it was given its own CPUID bit on later CPUs just before 3DNow was deprecated. We were setting the -mprfchw flag if -m3dnow was passed or the CPU supported 3dnow unless -mno-prfchw was passed. But -march=native on a CPU without the PRFCHW CPUID bit set will pass -mno-prfchw. So -march=k8 will behave differently than -march=native on a K8 for example. So remove this implicit setting from the frontend and instead enable the backend to use PREFETCHW if 3dnow OR prfchw is enabled. Also enable PRFCHW flag on amdfam10/barcelona which seems to be where this CPUID bit was introduced. That CPU also supported 3dnow.	2020-06-25 12:46:52 -07:00
Craig Topper	01c18f9199	Revert "[X86] Don't imply -mprfchw when -m3dnow is specified. Enable prefetchw in the backend with 3dnow feature." This is failing on the bots. This reverts commit `636d31a5c3`.	2020-06-25 11:43:02 -07:00
Craig Topper	636d31a5c3	[X86] Don't imply -mprfchw when -m3dnow is specified. Enable prefetchw in the backend with 3dnow feature. The PREFETCHW instruction was originally part of the 3DNow. But it was given its own CPUID bit on later CPUs just before 3DNow was deprecated. We were setting the -mprfchw flag if -m3dnow was passed or the CPU supported 3dnow unless -mno-prfchw was passed. But -march=native on a CPU without the PRFCHW CPUID bit set will pass -mno-prfchw. So -march=k8 will behave differently than -march=native on a K8 for example. So remove this implicit setting from the frontend and instead enable the backend to use PREFETCHW if 3dnow OR prfchw is enabled. Also enable PRFCHW flag on amdfam10/barcelona which seems to be where this CPUID bit was introduced. That CPU also supported 3dnow.	2020-06-25 11:25:35 -07:00
Guillaume Chatelet	324cda2073	[Alignment][NFC] Conform X86, ARM and AArch64 TargetTransformInfo backends to the public API The main interface has been migrated to Align already but a few backends where broadening the type from Align to MaybeAlign. This patch makes sure all implementations conform to the public API. Differential Revision: https://reviews.llvm.org/D82465	2020-06-25 13:23:13 +00:00
Craig Topper	a5041987ed	[X86] Emit a reg-reg copy for fast isel of vector bitcasts. Previously we just updated a map and moved on. But it possible we cached known bits information with the vreg that can be used by another basic block. If the other basic block has a different view of the VT these known bits won't make sense. By emitting a copy we ensure we have different vregs before and after the bitcast. This prevents the known bits from being used with the wrong type. Differential Revision: https://reviews.llvm.org/D82517	2020-06-24 20:15:21 -07:00
Wang, Pengfei	b2eb1c5793	[X86] Fix a typo error. Summary: This will result opcode MULX32Hrm been emitted to MULX32Hrr. Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D82472	2020-06-25 10:06:27 +08:00
dfukalov	7ddee0922f	[NFCI][CostModel] Add const to Value*. Summary: Get back `const` partially lost in one of recent changes. Additionally specify explicit qualifiers in few places. Reviewers: samparker Reviewed By: samparker Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82383	2020-06-24 23:16:08 +03:00
Craig Topper	8172ed91f8	[X86] Speculatively fix to X86AvoidStoreForwardingBlocks not deference a machine mem operand if there isn't one present. Eric Christopher informed me that FastISel memcpy handling creates load/store instructions without mem operands. We should fix that, but I doubt that's the only case of missed mem operands so seems better to be defensive here. I don't have a test case yet, but I'll try to add one if i get a test from Eric.	2020-06-24 00:13:58 -07:00
Craig Topper	31c40f2d6b	[X86] Add mayLoad/mayStore flags to some X87 instructions that don't have isel patterns to infer them from. Should remove part of the differences in D81833 due to some some of these getting isel patterns.	2020-06-23 23:40:30 -07:00
Simon Pilgrim	e7e204a373	[X86][AVX] Attempt to lower v16i32/v16f32 shuffles with lowerShuffleAsRepeatedMaskAndLanePermute Avoids prematurely creating permps/permd variable shuffles. Fixes PR46249	2020-06-23 18:33:50 +01:00
Simon Pilgrim	4c257bb44e	[X86] truncateVectorWithPACK - fix outdated comment. NFC. We perform PACKSS/PACKUS on AVX512 targets if the calling function wants to.	2020-06-23 10:45:27 +01:00
Hans Wennborg	1357c06578	Revert "[X86][SSE] MatchVectorAllZeroTest - handle OR vector reductions" This caused a Chromium test to miscompile. See discussion on the Phabricator review. > This patch extends MatchVectorAllZeroTest to handle OR vector reduction patterns where the result is compared against zero. > > Fixes PR45378 > > Differential Revision: https://reviews.llvm.org/D81547 This reverts `057c9c7ee0`	2020-06-22 21:27:11 +02:00
stozer	539381da26	[DebugInfo] Update MachineInstr to help support variadic DBG_VALUE instructions Following on from this RFC[0] from a while back, this is the first patch towards implementing variadic debug values. This patch specifically adds a set of functions to MachineInstr for performing operations specific to debug values, and replacing uses of the more general functions where appropriate. The most prevalent of these is replacing getOperand(0) with getDebugOperand(0) for debug-value-specific code, as the operands corresponding to values will no longer be at index 0, but index 2 and upwards: getDebugOperand(x) == getOperand(x+2). Similar replacements have been added for the other operands, along with some helper functions to replace oft-repeated code and operate on a variable number of value operands. [0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html<Paste> Differential Revision: https://reviews.llvm.org/D81852	2020-06-22 16:01:12 +01:00
Simon Pilgrim	48d1a2d6d0	[DAG] Add SimplifyMultipleUseDemandedVectorElts helper for SimplifyMultipleUseDemandedBits. NFCI. We have many cases where we call SimplifyMultipleUseDemandedBits and demand specific vector elements, but all the bits from them - this adds a helper wrapper to handle this.	2020-06-22 14:24:39 +01:00
David Green	730ecb63ec	[CGP] Convert phi types If a collection of interconnected phi nodes is only ever loaded, stored or bitcast then we can convert the whole set to the bitcast type, potentially helping to reduce the number of register moves needed as the phi's are passed across basic block boundaries. This has to be done in CodegenPrepare as it naturally straddles basic blocks. The alorithm just looks from phi nodes, looking at uses and operands for a collection of nodes that all together are bitcast between float and integer types. We record visited phi nodes to not have to process them more than once. The whole subgraph is then replaced with a new type. Loads and Stores are bitcast to the correct type, which should then be folded into the load/store, changing it's type. This comes up in the biquad testcase due to the way MVE needs to keep values in integer registers. I have also seen it come up from aarch64 partner example code, where a complicated set of sroa/inlining produced integer phis, where float would have been a better choice. I also added undef and extract element handling which increased the potency in some cases. This adds it with an option that defaults to off, and disabled for 32bit X86 due to potential issues around canonicalizing NaNs. Differential Revision: https://reviews.llvm.org/D81827	2020-06-21 15:54:17 +01:00
Simon Pilgrim	fb9f9dc318	[X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks Pulled out from the ongoing work on D66004, currently we don't do a good job of simplifying variable shuffle masks that have already lowered to constant pool entries. This patch adds SimplifyDemandedVectorEltsForTargetShuffle (a custom x86 helper) to first try SimplifyDemandedVectorElts (which we already do) and then constant pool simplification to help mark undefined elements. To prevent lowering/combines infinite loops, we only handle basic constant pool loads instead of creating new BUILD_VECTOR nodes for lowering - e.g. we don't try to convert them to broadcast/vzext_load - there might be some benefit to this but if so I'd rather we come up with some way to reuse existing code than reimplement a lot of BUILD_VECTOR code. Differential Revision: https://reviews.llvm.org/D81791	2020-06-21 11:16:07 +01:00
Simon Pilgrim	89dcbdfcfd	[X86] combineSetCCMOVMSK - consistently use CmpBits variable. NFCI. The comparison value should be the same size - I've added an assert to be absolutely certain.	2020-06-20 12:35:24 +01:00
Simon Pilgrim	56a9332328	[X86][SSE] Fold MOVMSK(PCMPEQ(X,0)) != -1 -> !PTESTZ(X,X) allof patterns	2020-06-20 12:17:32 +01:00
Eric Christopher	cf23852587	[Target] As part of using inclusive language within the llvm project, migrate away from the use of blacklist and whitelist. This change affects an internal llvm command line option.	2020-06-20 00:06:39 -07:00
Craig Topper	c721bc081e	[X86] Correct the implementation of ud1(a.k.a. ud2b) instruction. We were missing the modrm byte this instruction has according to current Intel SDM. Experiments with gcc indicate that different modrm values are chosen based on 2 operands so I've added those as well. I think our previous implementation was based on an older behavior of binutils that has since been changed.	2020-06-19 23:57:48 -07:00
Craig Topper	0dda5e4ce2	[X86] Ignore bits 2:0 of the modrm byte when disassembling lfence, mfence, and sfence. These are documented as using modrm byte of 0xe8, 0xf0, and 0xf8 respectively. But hardware ignore bits 2:0. So 0xe9-0xef is treated the same as 0xe8. Similar for the other two. Fixing this required adding 8 new formats to the X86 instructions to convey this information. Could have gotten away with 3, but adding all 8 made for a more logical conversion from format to modrm encoding. I renumbered the format encodings to keep the register modrm formats grouped together.	2020-06-19 22:24:24 -07:00
Simon Pilgrim	c143db3b10	[X86][SSE] combineHorizontalPredicateResult - improve all_of(X == 0) for vXi64 on pre-SSE41 targets Without SSE41 we don't have the PCMPEQQ instruction, making cmp-with-zero reductions more complicated than necessary. We can compare as vXi32 (PCMPEQD) and tweak the MOVMSK comparison to test upper/lower DWORD comparisons. This pre-fixes something that occurs with null tests for vectors of (64-bit) pointers such as in PR35129.	2020-06-19 11:43:25 +01:00
Simon Pilgrim	cad2038700	[X86][SSE] combineSetCCMOVMSK - fold MOVMSK(SHUFFLE(X,u)) -> MOVMSK(X) If we're permuting ALL the elements of a single vector, then for allof/anyof MOVMSK tests we can avoid the shuffle entirely.	2020-06-19 10:57:52 +01:00
Matt Arsenault	7f8b2e1b91	GlobalISel: Pass LegalizerHelper to custom legalize callbacks This was passing in all the parameters needed to construct a LegalizerHelper in the custom legalization, when it's simpler to just pass in the existing helper. This is slightly more annoying to use in the common case where you don't need the legalizer helper, but we could add back the common parameters back in addition to the helper. I didn't propagate this to all the internal target changes that this logically implies, but did update a sample one for legalizeMinNumMaxNum. This is in preparation for moving AMDGPU load/store legalization entirely into custom lowering. The current set of legalization actions is really constraining and not really capable of expressing all the actions needed to legalize loads/stores. In particular there's no way to express when the memory access itself needs to change size vs. the result type. There's also a lot of redundancy since the same split/widen actions need to be applied in both vector and scalar cases. All of the sub-cases logically belong as steps in the legalizer helper, but it will be easier to consider everything at once in custom lowering.	2020-06-18 17:17:38 -04:00
Simon Pilgrim	fe0a85faf4	[X86][SSE] Fold MOVMSK(PCMPEQ(X,0)) == -1 -> PTESTZ(X,X) Allow combineSetCCMOVMSK to handle 'allof' X == 0 patterns to be replaced with PTESTZ This is a preliminary patch before properly handling PR35129	2020-06-18 15:38:32 +01:00
Kristof Beyls	832cfc7672	[IndirectThunks] Make generated MF structure as expected by all instruction selectors. This also enables running the AArch64 SLSHardening pass with GlobalISel, so add a test for that. Differential Revision: https://reviews.llvm.org/D81403	2020-06-18 06:44:53 +01:00
Alexandre Ganea	acb30f6856	[X86] For 32-bit targets, emit two-byte NOP when possible In order to support hot-patching, we need to make sure the first emitted instruction in a function is a two-byte+ op. This is already the case on x86_64, which seems to always emit two-byte+ ops. However on 32-bit targets this wasn't the case. PATCHABLE_OP now lowers to a XCHG AX, AX, (66 90) like MSVC does. However when targetting pentium3 (/arch:SSE) or i386 (/arch:IA32) targets, we generate MOV EDI,EDI (8B FF) like MSVC does. This is for compatiblity reasons with older tools that rely on this two byte pattern. Differential Revision: https://reviews.llvm.org/D81301	2020-06-17 13:44:38 -04:00
Alexandre Ganea	ad879b31f0	[X86] Change signature of EmitNops. NFC. This is to support https://reviews.llvm.org/D81301.	2020-06-17 13:44:37 -04:00
Simon Pilgrim	9d11822f09	Fix comment typo - Uexpected -> Unexpected. NFC.	2020-06-16 12:14:51 +01:00
Simon Pilgrim	379c5b31f7	[X86][SSE] combineVectorSizedSetCCEquality - remove unused AVX2 MOVMSK path. NFCI. If PTEST is not available, then we're guaranteed to be performing a 128-bit vector comparison using MOVMSK(PCMPEQB(v16i8)).	2020-06-16 10:07:41 +01:00
Simon Pilgrim	057c9c7ee0	[X86][SSE] MatchVectorAllZeroTest - handle OR vector reductions This patch extends MatchVectorAllZeroTest to handle OR vector reduction patterns where the result is compared against zero. Fixes PR45378 Differential Revision: https://reviews.llvm.org/D81547	2020-06-16 09:42:34 +01:00
Simon Pilgrim	65c3fa849b	[X86][SSE] combineVectorSizedSetCCEquality - move single Subtarget.hasAVX() use into condition. NFC. We already have Subtarget.hasSSE2() and Subtarget.useAVX512Regs() in the condition - seems to be a legacy from when we had multiple uses.	2020-06-16 09:42:33 +01:00
Craig Topper	255d5dbae1	[X86] Add support for inline assembly 'x' constraint for i128. Limiting to x86-64 since that's when __int128 is legal in clang. Differential Revision: https://reviews.llvm.org/D81817	2020-06-15 19:34:02 -07:00
Craig Topper	d72cb4ce21	Recommit "[X86] Separate imm from relocImm handling." Fix the copy/paste mistake that caused it to fail previously	2020-06-15 10:59:43 -07:00
Craig Topper	ad1c46c3c0	[X86] Remove printanymem/printopaquemem from the InstPrinters. Just tell tablegen to printMemReference directly. NFC Most of the wrappers exist to print the memory size in Intel syntax and then call the printMemReference. But printanymem/printopaquemem don't print anything extra in Intel syntax so just drop them.	2020-06-15 09:46:06 -07:00
Simon Pilgrim	cb8a0ba829	[X86][SSE] Add LowerVectorAllZero helper for checking if all bits of a vector are zero. Pull the lowering code out of LowerVectorAllZeroTest (and rename it MatchVectorAllZeroTest). We should be able to reuse this in combineVectorSizedSetCCEquality as well. Another cleanup to simplify D81547.	2020-06-15 15:54:38 +01:00
Simon Pilgrim	ae33cbc494	[X86][SSE] LowerVectorAllZeroTest - add support for >256-bit vectors Reduce by splitting the vector until we reach the target size for PTEST/MOVMSK_PCMPEQ. There might be some cases where AVX512 can perform this with 512-bit vectors but so far I haven't encountered any such pattern that reaches LowerVectorAllZeroTest. Prep work for D81547	2020-06-15 15:30:24 +01:00
Hans Wennborg	f47a776628	Revert "[X86] Separate imm from relocImm handling." > relocImm was a complexPattern that handled both ConstantSDNode > and X86Wrapper. But it was only applied selectively because using > it would cause patterns to be not importable into FastISel or > GlobalISel. So it only got applied to flag setting instructions, > stores, RMW arithmetic instructions, and rotates. > > Most of the test changes are a result of making patterns available > to GlobalISel or FastISel. The absolute-cmp.ll change is due to > this fixing a pattern ordering issue to make an absolute symbol > match to an 8-bit immediate before trying a 32-bit immediate. > > I tried to use PatFrags to reduce the repetition, but I was getting > errors from TableGen. This caused "Invalid EmitNode" assertions, see the llvm-commits thread for discussion.	2020-06-15 16:14:59 +02:00
Simon Pilgrim	0b806549b5	[X86][SSE] LowerVectorAllZeroTest - remove unnecessary bitcasts matchScalarReduction should return all its source vectors with the same type, so we can safely perform the OR reduction with the original type. So we just need to bitcast for PTEST/PCMPEQB with the final reduced vector.	2020-06-15 15:13:13 +01:00
Sam Parker	2596da3174	[CostModel] getCFInstrCost in getUserCost. Have BasicTTI call the base implementation so that both agree on the default behaviour, which the default being a cost of '1'. This has required an X86 specific implementation as it seems to be very reliant on those instructions being free. Changes are also made to AMDGPU so that their implementations distinguish between cost kinds, so that the unrolling isn't affected. PowerPC also has its own implementation to prevent changes to the reg-usage vectorizer test. The cost model test changes now reflect that ret instructions are not generally free. Differential Revision: https://reviews.llvm.org/D79164	2020-06-15 09:28:46 +01:00
Nikita Popov	7cac7e0cfc	[IR] Prefer hasFnAttribute() where possible (NFC) When checking for an enum function attribute, use hasFnAttribute() rather than hasAttribute() at FunctionIndex, because it is significantly faster (and more concise to boot).	2020-06-15 09:30:35 +02:00
Simon Pilgrim	3d8149c2a1	[X86][SSE] Fold BITOP(MOVMSK(X),MOVMSK(Y)) -> MOVMSK(BITOP(X,Y)) Reduce XMM->GPR traffic by performing bitops on the vectors, and using a single MOVMSK call. This requires us to use vectors of the same size and element width, but we can mix fp/int type equivalents with suitable bitcasting.	2020-06-14 21:37:58 +01:00
Simon Pilgrim	e0cff30c17	[X86][SSE] LowerVectorAllZeroTest - add support for pre-SSE41 targets Even without PTEST, we can still efficiently perform an OR reduction as PMOVMSKB(PCMPEQB(X,0)) == 0, avoiding xmm->gpr extractions.	2020-06-14 13:41:56 +01:00
Craig Topper	bfd12c76eb	[X86] Add mayLoad flag to FARCALL*m/FARJMP memory instrutions. Add 'm' to the end of FARJMP64/FARCALL64 instruction names. We never codegen them so this doesn't matter in practice. But sometimes someone comes along and tries to use these flags for something else. LIke the Load Value Inject inline assembly handling.	2020-06-13 15:40:51 -07:00
Craig Topper	0cbe713c69	[X86] Automatically harden inline assembly RET instructions against Load Value Injection (LVI) Previously, the X86AsmParser would issue a warning whenever a ret instruction is encountered. This patch changes the behavior to automatically transform each ret instruction in an inline assembly stream into: shlq $0, (%rsp) lfence ret which is secure, according to https://software.intel.com/security-software-guidance/insights/deep-dive-load-value-injection#specialinstructions. Patch by Scott Constable with some minor changes by Craig Topper.	2020-06-13 15:16:05 -07:00
Craig Topper	cb5072d187	[X86] Teach combineBitcastvxi1 to prefer movmsk on avx512 in more cases If the input to the bitcast is a sign bit test, it makes sense to directly use vpmovmskb or vmovmskps/pd. This removes the need to copy the sign bits to a k-register and then to a GPR. Fixes PR46200. Differential Revision: https://reviews.llvm.org/D81327	2020-06-13 14:50:13 -07:00
Craig Topper	6b4b660174	[X86] Move -x86-use-vzeroupper command line flag into runOnMachineFunction for the pass itself rather than the pass pipeline construction This pass has no dependencies on other passes so conditionally including it in the pipeline doens't do much. Just move it the pass itself to keep it isolated.	2020-06-13 14:42:41 -07:00
Craig Topper	93264a2e4f	[X86] Enable the EVEX->VEX compression pass at -O0. A lot of what EVEX->VEX does is equivalent to what the prioritization in the assembly parser does. When an AVX mnemonic is used without any EVEX features or XMM16-31, the parser will pick the VEX encoding. Since codegen doesn't go through the parser, we should also use VEX instructions when we can so that the code coming out of integrated assembler matches what you'd get from outputing an assembly listing and parsing it. The pass early outs if AVX isn't enabled and uses TSFlags to check for EVEX instructions before doing the more costly table lookups. Hopefully that's enough to keep this from impacting -O0 compile times.	2020-06-13 12:29:04 -07:00
Craig Topper	8885a7640b	[X86] Separate imm from relocImm handling. relocImm was a complexPattern that handled both ConstantSDNode and X86Wrapper. But it was only applied selectively because using it would cause patterns to be not importable into FastISel or GlobalISel. So it only got applied to flag setting instructions, stores, RMW arithmetic instructions, and rotates. Most of the test changes are a result of making patterns available to GlobalISel or FastISel. The absolute-cmp.ll change is due to this fixing a pattern ordering issue to make an absolute symbol match to an 8-bit immediate before trying a 32-bit immediate. I tried to use PatFrags to reduce the repetition, but I was getting errors from TableGen.	2020-06-13 11:29:28 -07:00
Simon Pilgrim	8d30945ab9	[X86][SSE] combineX86ShuffleChain - combine INSERT_VECTOR_ELT patterns to INSERTPS Noticed while trying to cleanup D66004 - if a shuffle operand came from a scalar, we're better off using INSERTPS vs UNPCKLPS as this is more likely to load fold later on. It also matches our existing BUILD_VECTOR lowering. We can extend this to other PINSRB/D/Q/W cases in the future as the need arises.	2020-06-12 11:59:01 +01:00
Eric Christopher	3ff8f61930	Tidy up unsigned -> Register fixups.	2020-06-11 16:50:58 -07:00
Eric Christopher	cb21b16822	Add a diagnostic string to an assert.	2020-06-11 16:34:55 -07:00
Craig Topper	8fa3e8fa14	[X86] Force VIA PadLock crypto instructions to emit a 0xF3 prefix when they encode to match what GNU as does. The spec for these says they need 0xf3 but also mentions REP before the mnemonic. But I don't think its fair to users to make them write REP first. And gas doesn't make them. objdump seems to disassemble with or without the prefix and just prints any 0xf3 as REP.	2020-06-11 12:59:21 -07:00
Craig Topper	269d843720	[X86] Replace TB with PS on instructions that are documented in the SDM with 'NP' 'NP' means that the instruction is not recognized with a 66, F2 or F3 prefix. It will either #UD or decode to a different instruction. All of the cases are here should fall into the #UD variety since we should be detecting the collision with other instructions when we build the disassembler tables.	2020-06-11 12:20:29 -07:00
Simon Pilgrim	7706c7af74	[X86] Fold vXi1 OR(KSHIFTL(X,NumElts/2),Y) -> KUNPCK Convert shift+or bool vector patterns into CONCAT_VECTORS if we know this will be lowered to KUNPCK (which requires 16+ vector elements). Fixes PR32547	2020-06-11 15:47:20 +01:00
Simon Pilgrim	5cca9828ff	[X86][AVX512] Avoid bitcasts between scalar and vXi1 bool vectors AVX512 mask types are often bitcasted to scalar integers for various ops before being bitcast back to be used as a predicate. In many cases we can avoid these KMASK<->GPR transfers and perform equivalent operations on the mask unit. If the destination mask type is legal, and we can confirm that the scalar op originally came from a mask/vector/float/double type then we should try to avoid the scalar entirely. This avoids some codegen issues noticed while working on PTEST/MOVMSK improvements. Partially fixes PR32547 - we don't create a KUNPCK yet, but OR(X,KSHIFTL(Y)) can be handled in a separate patch. Differential Revision: https://reviews.llvm.org/D81548	2020-06-11 10:22:55 +01:00
Kristof Beyls	994748770c	[NFC] Refactor ThunkInserter to make it available for all targets. By moving target-independent code from llvm/lib/Target/X86/X86IndirectThunks.cpp to llvm/include/llvm/CodeGen/IndirectThunks.h Differential Revision: https://reviews.llvm.org/D81401	2020-06-11 08:38:44 +01:00
Craig Topper	08b275f62e	[X86] Remove unnecessary In64BitMode predicate from TEST64ri32. NFC This appears to have been added when In64BitMode was added to a bunch of instructions that don't have register operands. When an instruction uses a register the parser will prevent a 64-bit register from being parsed on a 32-bit target. But with only memory and immediate operands this doesn't happen. TEST64ri32 does have a register operand so the issue the predicate was supposed to fix doesn't apply.	2020-06-11 00:33:55 -07:00
Craig Topper	1385ab356a	[X86] Use X86AS enum constants to replace hardcoded numbers in more places. NFC	2020-06-10 22:31:21 -07:00
Scott Constable	7e06cf0011	[X86] Add an Unoptimized Load Value Injection (LVI) Load Hardening Pass @nikic raised an issue on D75936 that the added complexity to the O0 pipeline was causing noticeable slowdowns for `-O0` builds. This patch addresses the issue by adding a pass with equal security properties, but without any optimizations (and more importantly, without the need for expensive analysis dependencies). Reviewers: nikic, craig.topper, mattdr Reviewed By: craig.topper, mattdr Differential Revision: https://reviews.llvm.org/D80964	2020-06-10 15:31:47 -07:00
Craig Topper	1c3dd7bf37	[X86] Call LowerADDRSPACECAST directly from ReplaceNodeResults to avoid repeating identical code. NFC	2020-06-10 14:39:02 -07:00
Craig Topper	c5bbdea9e1	[X86] Enable masked GPR broadcasts to be formed even if the broadcast has more than one use. This is a cheap instruction. It's better to repeat it than to do two separate operations. There are probably more cases like this, but this one was reported as a regression in our internal benchmarking.	2020-06-10 12:42:44 -07:00
serge-sans-paille	60fe25cb0c	Fix dynamic probing scheme If we probe after each static stack allocation, we need to probe before each dynamic stack allocation. Provide a scheme to describe the possible scenario. Thanks a lot to @jonpa for motivating this fix. Differential Revision: https://reviews.llvm.org/D81067	2020-06-10 21:37:09 +02:00
Craig Topper	324e13668e	[X86] Split imm handling out of selectMOV64Imm32 and add a separate isel pattern. This makes the pattern available to global isel.	2020-06-10 11:12:36 -07:00
Christopher Tetreault	9044027e45	[SVE] Eliminate calls to default-false VectorType::get() from X86 Reviewers: efriedma, craig.topper, RKSimon, samparker, kmclaughlin, david-arm Reviewed By: david-arm Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81520	2020-06-10 09:56:00 -07:00
Sam Parker	fa8bff0cd1	[CostModel] Unify getArithmeticInstrCost Add the remaining arithmetic opcodes into the generic implementation of getUserCost and then call this from getInstructionThroughput. Most of the backends have been modified to return the base implementation for cost kinds other RecipThroughput. The outlier here is AMDGPU which already uses getArithmeticInstrCost for all the cost kinds. This change means that most of the opcodes can be removed from that backends implementation of getUserCost. Differential Revision: https://reviews.llvm.org/D80992	2020-06-10 09:08:45 +01:00
Sanjay Patel	6f6d2d2383	[x86] refine conditions for immediate hoisting to save code-size As shown in PR46237: https://bugs.llvm.org/show_bug.cgi?id=46237 The size-savings win for hoisting an 8-bit ALU immediate (intentionally excluding store constants) requires extreme conditions; it may not even be possible when including REX prefix bytes on x86-64. I did draft a version of this patch that included use counts after the loop, but I suspect that accounting is not working as expected. I think that is because the number of constant uses are changing as we select instructions (for example as we transform shl/add into LEA). Differential Revision: https://reviews.llvm.org/D81468	2020-06-09 15:44:55 -04:00
Sam Parker	37289615c0	[NFCI][CostModel] Unify getCmpSelInstrCost Add cases for icmp, fcmp and select into the switch statement of the generic getUserCost implementation with getInstructionThroughput then calling into it. The BasicTTI and backend implementations have be set to return a default value (1) when a cost other than throughput is being queried. Differential Revision: https://reviews.llvm.org/D80550	2020-06-09 07:41:22 +01:00
Craig Topper	2328cab16c	[X86] Prevent LowerSELECT from causing suboptimal codegen for __builtin_ffs(X) - 1. LowerSELECT sees the CMP with 0 and wants to use a trick with SUB and SBB. But we can use the flags from the BSF/TZCNT. Fixes PR46203. Differential Revision: https://reviews.llvm.org/D81312	2020-06-08 11:46:56 -07:00
Sam Parker	5b5e78ad2b	[CostModel] Follow-up to buildbot fix Adding type checks into the other backends that call getTypeLegalizationCost. Differential Revision: https://reviews.llvm.org/D80984	2020-06-08 15:26:25 +01:00
Guillaume Chatelet	1778564f91	[Alignment][NFC] Migrate the rest of backends Summary: This is a followup on D81196 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81278	2020-06-08 07:17:20 +00:00
Craig Topper	b0eea7213b	[X86] Support load shrinking for strict fp nodes in combineCVTPH2PS	2020-06-07 21:09:55 -07:00
Craig Topper	e3aece06cf	[X86] Improve (vzmovl (insert_subvector)) combine to handle a bitcast between the vzmovl and insert This combine tries shrink a vzmovl if its input is an insert_subvector. This patch improves it to turn (vzmovl (bitcast (insert_subvector))) into (insert_subvector (vzmovl (bitcast))) potentially allowing the bitcast to be folded with a load.	2020-06-07 19:31:06 -07:00
Craig Topper	22987babd5	[X86] Teach combineCVTP2I_CVTTP2I to handle STRICT_CVTTP2SI/STRICT_CVTTP2UI Allows us to shrink 128-bit simple load to enable folding for v2f32->v2i64 vcvttps2qq/vcvttps2uqq.	2020-06-07 19:31:06 -07:00
Craig Topper	a135c4a2cf	[X86] Don't scalarize v2f32->v2i64 strict_fp_to_sint/uint with avx512dq and not avx512vl. We can pad the v2f32 with 0s up to v8f32 and use a v8f32->v8i64 operation. This is what we end up with on non-strict nodes except we don't pad with 0s since we don't care about exceptions.	2020-06-07 14:45:26 -07:00
Simon Pilgrim	ce677ef532	[X86][AVX2] combineSetCCMOVMSK - handle all_of patterns for PMOVMSKB(PACKSSBW(LO(X), HI(X))) In the sign splat case, we can fold PMOVMSKB(PACKSSBW(LO(X), HI(X))) -> PMOVMSKB(BITCAST_v32i8(X)) without introducing a signmask + comparison (which unlike for any_of won't fold into a single TEST).	2020-06-07 21:08:53 +01:00
Simon Pilgrim	3a28ae091b	[X86][SSE] combineSetCCMOVMSK - add initial support for allof patterns. Handle MOVMSK 'allof' comparisons (X86ISD::SUB X, AllBitsMask) as well as 'anyof' patterns. This allows us to handle these patterns in the MOVMSK(BITCAST(X)) pattern to fix PR37087.	2020-06-07 16:10:13 +01:00
Simon Pilgrim	0741b75ad5	[X86][SSE] Attempt to widen MOVMSK vector input if the signbits are splatted. As shown on PR37087, if we have a MOVMSK(BICAST(X)) from a wider vector, then by using MOVMSK from the wider type (32/64-bit elements) we can improve the chances of further combines with SimplifyDemandedBits/Elts and on some targets (skylake) can be more efficient.	2020-06-07 11:44:43 +01:00
Craig Topper	095dceefa3	[X86] Correct some isel patterns for v1i1 KNOT/KANDN/KXNOR. The KNOT pattern was missing. The others were looking for a v1i1 -1 instead of a vector all ones.	2020-06-06 17:25:56 -07:00
Simon Pilgrim	e5e33f23c7	CFG.h - reduce includes to forward declarations. NFC. Remove unnecessary includes from CFG.cpp. Fix implicit include dependency in X86WinEHState.cpp.	2020-06-06 15:06:42 +01:00
Craig Topper	3408dcbdf0	[X86] Fold undef elts to 0 in getTargetVShiftByConstNode. Similar to D81212. Differential Revision: https://reviews.llvm.org/D81292	2020-06-05 13:39:40 -07:00
Craig Topper	7c9a89fed8	[X86] Teach combineVectorShiftImm to constant fold undef elements to 0 not undef. Shifts are supposed to always shift in zeros or sign bits regardless of their inputs. It's possible the input value may have been replaced with undef by SimplifyDemandedBits, but the shift in zeros are still demanded. This issue was reported to me by ispc from 10.0. Unfortunately their failing test does not fail on trunk. Seems to be because the shl is optimized out earlier now and doesn't become VSHLI. ispc bug https://github.com/ispc/ispc/issues/1771 Differential Revision: https://reviews.llvm.org/D81212	2020-06-05 11:29:55 -07:00
Sanjay Patel	e50059f6b6	[x86] form reduction intrinsics from vectorizers instead of raw IR Motivating examples are seen in the PhaseOrdering tests based on: https://bugs.llvm.org/show_bug.cgi?id=43953#c2 - if we have intrinsics there, some pass can fold them. The intrinsics are still named "experimental" at this point, but if there is no fallout from this patch, that will be a good indicator that it is safe to finalize them. Differential Revision: https://reviews.llvm.org/D80867	2020-06-05 12:38:49 -04:00
Simon Pilgrim	d194ff31cf	[X86][SSE] Simplify MOVMSK patterns based on comparison An initial patch adding combineSetCCMOVMSK to simplify MOVMSK and its vector input based on the comparison of the MOVMSK result. This first stage just adds support for some simple MOVMSK(PACKSSBW()) cases where we remove the PACKSS if we're comparing ne/eq zero (any_of patterns), allowing us to directly compare against the v8i16 source vector(s) bitcasted to v16i8, with suitable masking to take into account of which signbits are valid. Future combines could peek through further PACKSS, target shuffles, handle all_of patterns (ne/eq -1), optimize to a PTEST op, etc. Differential Revision: https://reviews.llvm.org/D81171	2020-06-05 16:53:22 +01:00
Sam Parker	9303546b42	[CostModel] Unify getMemoryOpCost Use getMemoryOpCost from the generic implementation of getUserCost and have getInstructionThroughput return the result of that for loads and stores. This also means that the X86 implementation of getUserCost can be removed with the functionality folded into its getMemoryOpCost. Differential Revision: https://reviews.llvm.org/D80984	2020-06-05 10:13:38 +01:00
Craig Topper	7eff1a7136	[X86] Remove (V)MOVHPDrm patterns that involve bitcast+scalar_to_vec+loadi64. I think these are left over from when we used to type legalize v2f32 loads using bitcast+scalar_to_vec+loadi64 on 64-bit targets. These days we use loadf64. If this becomes a problem a better solution would be a DAG combine to turn it into scalar_to_vec+loadf64.	2020-06-04 00:31:47 -07:00
Eric Christopher	21a7b8a77d	consitfy and auto -> auto * a few places to clean up uses.	2020-06-03 17:00:08 -07:00
Eric Christopher	053fce9a02	Fix typo in filename comment.	2020-06-03 16:01:36 -07:00
Simon Pilgrim	ea80b40669	[DAG] SimplifyDemandedBits - peek through SHL if we only demand sign bits. If we're only demanding the (shifted) sign bits of the shift source value, then we can use the value directly. This handles SimplifyDemandedBits/SimplifyMultipleUseDemandedBits for both ISD::SHL and X86ISD::VSHLI. Differential Revision: https://reviews.llvm.org/D80869	2020-06-03 16:11:54 +01:00
Simon Pilgrim	d9d28b3559	[X86][AVX] getFauxShuffleMask - fix sub vector size check in INSERT_SUBVECTOR(X,SHUFFLE(Y,Z)) We were bailing on subvector shuffle inputs that were smaller than the subvector type instead of larger than it. Fixes PR46178	2020-06-03 15:26:22 +01:00
Craig Topper	bb1d8bf270	[X86] Add CLWB to Tremont CPU. Remove CLDEMOTE, MOVDIRI, MOVDIR64B, and WAITPKG to match gcc.	2020-06-02 22:38:51 -07:00
Guozhi Wei	587af86f1d	[X86] Add a flag to guard the wide load As shown in http://lists.llvm.org/pipermail/llvm-dev/2020-May/141854.html, widen load can also cause stall. Add a flag to guard the widening code, so users can disable it and evaluate its performance impact. Differential Revision: https://reviews.llvm.org/D80943	2020-06-02 16:16:13 -07:00
Craig Topper	961c1b5f72	[X86] Remove DeleteNode calls from PreprocessISelDAG. Rely on the RemoveDeadNodes call at the end. Add a MadeChange flag so we don't call RemoveDeadNodes unless something changed.	2020-06-02 14:10:20 -07:00
Craig Topper	ca4bd052f6	[X86] Cleanup inconsistencies in our zext/sext vector patterns. -Fix one place where we had a X86vzload64 but should have had X86vzload32. -Make sure all patterns that have scalar_to_vector+loadi64 also have scalar_to_vector+f64 to match 32-bit codegen. -Add some bitcasts that were missing from patterns. -Make sure that if we have a scalar_to_vector+load pattern we also have a vzload pattern. We probably need some better canonicalization to avoid having so many patterns.	2020-06-02 13:50:16 -07:00
serge-sans-paille	6c733f5a13	Use Pseudo Instruction to carry stack probing information Instead of using a fake call and metadata to temporarily represent a probed static alloca, use a pseudo instruction. This is inspired by the SystemZ approach proposed in https://reviews.llvm.org/D78717. Differential Revision: https://reviews.llvm.org/D80641	2020-06-02 16:14:06 +02:00
Craig Topper	e51d5bc7a4	[X86] Fix a few recursivelyDeleteUnusedNodes calls that were trying to delete nodes before their user was really gone. We looked through a truncate to get to the load. So we should be deleting the truncate first. There is a check that the node is really unused before deleting so this didn't cause a functional issue.	2020-06-01 21:55:13 -07:00
hsmahesha	0ed2c04636	[AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes Summary: While clustering mem ops, AMDGPU target needs to consider number of clustered bytes to decide on max number of mem ops that can be clustered. This patch adds support to pass number of clustered bytes to target mem ops clustering logic. Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar Reviewed By: foad Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80545	2020-06-01 22:52:34 +05:30
Craig Topper	8abe830093	[X86] Rewrite how X86PartialReduction finds candidates to consider optimizing. Previously we walked the users of any vector binop looking for more binops with the same opcode or phis that eventually ended up in a reduction. While this is simple it also means visiting the same nodes many times since we'll do a forward walk for each BinaryOperator in the chain. It was also far more general than what we have tests for or expect to see. This patch replaces the algorithm with a new method that starts at extract elements looking for a horizontal reduction. Once we find a reduction we walk through backwards through phis and adds to collect leaves that we can consider for rewriting. We only consider single use adds and phis. Except for a special case if the Add is used by a phi that forms a loop back to the Add. Including other single use Adds to support unrolled loops. Ultimately, I want to narrow the Adds, Phis, and final reduction based on the partial reduction we're doing. I still haven't figured out exactly what that looks like yet. But restricting the types of graphs we expect to handle seemed like a good first step. As does having all the leaves and the reduction at once. Differential Revision: https://reviews.llvm.org/D79971	2020-05-31 12:53:01 -07:00
Simon Pilgrim	22e50833e9	[X86][AVX] Reduce unary target shuffles width if the upper elements aren't demanded.	2020-05-31 20:19:24 +01:00
Simon Pilgrim	8f2f613a6e	[X86][AVX] combineX86ShufflesRecursively - peekThroughOneUseBitcasts subvector before widening. This matches what we do for the full sized vector ops at the start of combineX86ShufflesRecursively, and helps getFauxShuffleMask extract more INSERT_SUBVECTOR patterns.	2020-05-31 19:58:33 +01:00
Simon Pilgrim	4a2673d79f	[X86][AVX] Add SimplifyMultipleUseDemandedBits VBROADCAST handling to SimplifyDemandedVectorElts. As suggested on D79987.	2020-05-31 14:20:15 +01:00
Simon Pilgrim	f046326847	[X86] getFauxShuffleMask/getTargetShuffleInputs - make SelectionDAG const (PR45974). Try to prevent future node creation issues (as detailed in PR45974) by making the SelectionDAG reference const, so it can still be used for analysis, but not node creation.	2020-05-31 13:51:01 +01:00
Simon Pilgrim	d33ba1aa0b	[X86][AVX] getFauxShuffleMask - don't widen shuffle inputs from INSERT_SUBVECTOR(X,SHUFFLE(Y,Z)) Don't create nodes on the fly when decoding INSERT_SUBVECTOR as faux shuffles.	2020-05-31 13:19:18 +01:00
Simon Pilgrim	45ebe38ffc	[X86][AVX] Pad small shuffle inputs in combineX86ShufflesRecursively As detailed on PR45974 and D79987, getFauxShuffleMask is creating nodes on the fly to create shuffles with inputs the same size as the result, causing problems for hasOneUse() checks in later simplification stages. Currently only combineX86ShufflesRecursively benefits from these widened inputs so I've begun moving the functionality there, and out of getFauxShuffleMask. This allows us to remove the widening from VBROADCAST and EXTEND faux shuffle cases. This just leaves the INSERT_SUBVECTOR case in getFauxShuffleMask still creating nodes, which will require more extensive refactoring.	2020-05-31 11:43:47 +01:00
Craig Topper	dbda87186e	[X86] Remove unneeded bitconverts from isel patterns. NFC The types already match so TableGen is removing the bitconvert.	2020-05-30 20:24:52 -07:00
Craig Topper	7c3b8077cc	[X86] Add DAG combine to turn (v2i64 (scalar_to_vector (i64 (bitconvert (mmx))))) to MOVQ2DQ. Remove unneeded isel patterns. We already had a DAG combine for (mmx (bitconvert (i64 (extractelement v2i64)))) to MOVDQ2Q. Remove patterns for MMX_MOVQ2DQrr/MMX_MOVDQ2Qrr that use scalar_to_vector/extractelement involving i64 scalar type with v2i64 and x86mmx.	2020-05-30 19:47:08 -07:00
Craig Topper	af1accdd86	[X86] Teach computeKnownBitsForTargetNode that the upper half of X86ISD::MOVQ2DQ is all zero.	2020-05-30 19:47:07 -07:00
Craig Topper	1ecf39d607	[X86] Fix a place where we created MOVQ2DQ with a DstVT other than v2i64. The type profile and isel pattern have this type declared as being MVT::v2i64. But isel skips the explicit type check due to the type profile.	2020-05-30 19:47:07 -07:00
Craig Topper	8857822452	[X86] Move MMX_SET0 pattern into the instruction definition. NFC	2020-05-30 19:47:07 -07:00
Craig Topper	07e8a780d8	[X86] Add pseudo instructions to use MULX with a single destination when the low result isn't used. The instruction is defined to only produce high result if both destinations are the same. We can exploit this to avoid unnecessarily clobbering a register. In order to hide this from register allocation we use a pseudo instruction and expand the result during MCInst creation. Differential Revision: https://reviews.llvm.org/D80500	2020-05-30 16:01:01 -07:00
Craig Topper	16976cb925	[X86] Minor cleanups to addShuffleComments in X86MCInstPrinter.cpp. NFCI -Replace some ifs that should be impossible with asserts. -Use X86::AddrDisp and X86::AddrNumOperands to make code more readable -Use X86II::isKMasked/isKMergeMasked to do some operand skipping to remove or simplify switches	2020-05-30 13:51:48 -07:00
Craig Topper	3eb430d598	[X86] Factor constant pool comment printing out of the switch in X86AsmPrinter::emitInstruction. NFC Pull the verbose asm check out of the cases and move it up to the call of the new function.	2020-05-30 13:51:37 -07:00
Christopher Tetreault	5a99ec10f5	[SVE] Eliminate calls to default-false VectorType::get() from X86 Reviewers: efriedma, sdesmalen, c-rhodes, craig.topper Reviewed By: craig.topper Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80331	2020-05-29 16:16:07 -07:00
Zequan Wu	80e107ccd0	Add NoMerge MIFlag to avoid MIR branch folding Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given Differential Revision: https://reviews.llvm.org/D79537	2020-05-29 12:31:06 -07:00
Craig Topper	5c7aca6a4c	[X86] Ignore large code model in X86FastISel::X86MaterializeFP in 32-bit mode Large code model doesn't mean anything to 32-bit mode. But nothing prevents it from being set. Ignore to avoid generating 64-bit mode only instructions. Differential Revision: https://reviews.llvm.org/D80768	2020-05-29 10:39:08 -07:00
Craig Topper	87e4ad4d5c	[X86] Remove isel pattern for MMX_X86movdq2q+simple_load. Replace with DAG combine to to loadmmx. Only 64-bit bits will be loaded, not the whole 128 bits. We can just combine it to plain mmx load. This has the side effect of enabling isel load folding for it. This part of my desire to get rid of isel patterns that shrink loads.	2020-05-29 10:20:03 -07:00
Craig Topper	17ed6dcb0c	[X86] Remove MMX isel patterns containing (x86mmx (scalar_to_vector (i32))). I don't think we can make such a node. I don't think x86_mmx is considered a vector for the check in getNode.	2020-05-28 23:42:03 -07:00
Craig Topper	8c050070fb	[X86] Fix a nullptr dereference in X86Subtarget::classifyLocalReference when compiling with -mcmodel=medium -fpic and using a constant pool LowerConstantPool passes a nullptr into classifyLocalReference. The medium code model handling for PIC will try to deference it using isa. This patch switches to isa_and_nonnull. Differential Revision: https://reviews.llvm.org/D80763	2020-05-28 17:20:42 -07:00
Jean-Michel Gorius	f5192d7fb7	[x86] Propagate memory operands during call frame optimization Summary: Propagate memory operands when folding load instructions into instructions that directly operate on memory. The original revision has been split. See D80140 for the other part of the changes. Reviewers: craig.topper, rnk, lebedev.ri, efriedma Reviewed By: craig.topper Subscribers: lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80062	2020-05-28 17:45:53 +02:00
Simon Pilgrim	1ddac9563d	[X86][SSE] Peek though MOVMSK source sign bits using SimplifyMultipleUseDemandedBits Allows SimplifyDemandedBitsForTargetNode to peek through multi-use ops where MOVMSK only demands the signbit of each vector element.	2020-05-28 13:42:24 +01:00
Craig Topper	8e7e6a8d6b	[X86] Restore selection of MULX on BMI2 targets. Looking back over gcc and icc behavior it looks like icc does use mulx32 on 32-bit targets and mulx64 on 64-bit targets. It's also used when dividing i32 by constant on 32-bit targets and i64 by constant on 64-bit targets. gcc uses it multiplies producing a 64 bit result on 32-bit targets and 128-bit results on a 64-bit target. gcc does not appear to use it for division by constant. After this patch clang is closer to the icc behavior. This basically reverts `d1c61861dd`, but there were no strong feelings at the time. Fixes PR45518. Differential Revision: https://reviews.llvm.org/D80498	2020-05-27 12:01:18 -07:00
Fangrui Song	5b4cd2d4c4	[X86] Assemble movzb 1280(%rbx, %r12), %r12 after D80608 ffmpeg/libavcodec/x86/h264_cabac.c inline assembly may produce movzb 1280(%rbx, %r12), %r12 After D80608, llvm-mc errors: error: unknown use of instruction mnemonic without a size suffix	2020-05-27 09:55:55 -07:00
Guillaume Chatelet	5b84ee4f61	[Alignment] Fix misaligned interleaved loads Summary: Tentatively fixing https://bugs.llvm.org/show_bug.cgi?id=45957 Reviewers: craig.topper, nlopes Subscribers: hiraditya, llvm-commits, RKSimon, jdoerfert, efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D80276	2020-05-27 12:12:22 +00:00
Simon Pilgrim	410667f1b7	[X86][SSE] Convert PTEST to MOVMSK for allsign bits vector results If we are using PTEST to check 'allsign bits' vector elements we can use MOVMSK to extract the signbits directly and perform the comparison on the scalar value. For vXi16 cases, as we don't have a MOVMSK for this type, we must mask each signbit out of a PMOVMSKB v2Xi8 result, which folds into the TEST comparison. If this allows us to remove a vector op (via the SimplifyMultipleUseDemandedBits call) this is consistently faster than a PTEST (https://godbolt.org/z/ziJUst). I'm investigating whether we ever get regressions without the SimplifyMultipleUseDemandedBits call, even if this means we don't remove a vector op, but that has exposed some other poor codegen issues that I'm still investigating and would have to wait for a later patch. Suggested on PR42035 to avoid unnecessary ashr(x,bw-1)/pcmpgt(0,x) sign splat patterns feeding into ptest. Differential Revision: https://reviews.llvm.org/D80563	2020-05-27 11:06:16 +01:00
Craig Topper	a1dfd6d828	[X86] Add helper function to reduce some code duplication when shrinking a vector load to a vzext_load. There's more code for calling CombineTo and replacing the nodes that I'd like to share, but its complicated by the getNode call in the middle that needs to be specific to each opcode. While there are also make sure we recursively delete the load we're replacing. It eventually gets removed by a RemoveDeadNodes call at the end of DAG combine, but we should be more eager about it. We were inconsistently doing this in some places but not all.	2020-05-27 01:32:13 -07:00
Craig Topper	84cf8ed8fd	[X86] Lower sse_cmp_ss/sse2_cmp_sd intrinsics to X86ISD::FSETCC with vector types. Isel match that instead of the intrinsic. Similar to what we do for avx512. Trying to move more intrinsics to target specific ISD opcodes. Hoping to add DAG combines to shrink simple loads going into scalar intrinsics that only read 32 or 64 bits.	2020-05-26 23:48:16 -07:00
Craig Topper	b4978b2444	[X86] Use SIMD_EXC to remove some let statements in tablegen. NFCI	2020-05-26 23:48:16 -07:00
Wang, Pengfei	6565b58584	[X86][llvm-mc] Make the suffix matcher more accurate. Summary: Some instruction like VPMULDQ is NOT the variant of VPMULD but a new one. So we should make sure the suffix matcher only works for memory variant that has the same size with the suffix. Currently we only check for SSE/AVX* instructions, because many legacy instructions didn't declare the alias instructions of their variants. Differential Revision: https://reviews.llvm.org/D80608	2020-05-27 14:45:17 +08:00
Sam Parker	8aaabadece	[CostModel] Unify getCastInstrCost Add the remaining cast instruction opcodes to the base implementation of getUserCost and directly return the result. This allows getInstructionThroughput to return getUserCost for the casts. This has required changes to PPC and SystemZ because they implement getUserCost and/or getCastInstrCost with adjustments for vector operations. Adjusts have also been made in the remaining backends that implement the method so that they still produce a cost of zero or one for cost kinds other than throughput. Differential Revision: https://reviews.llvm.org/D79848	2020-05-26 11:29:57 +01:00
Simon Pilgrim	6f802ec433	[X86] Fix fshr comment copy+paste typo. NFC. Noticed by @foad on D80466.	2020-05-26 10:55:57 +01:00
Sam Parker	871556a494	[CostModel] Unify Intrinsic Costs. Recommitting most of the remaining changes from `259eb619ff`, but excluding the call to getUserCost from getInstructionThroughput. Though there's still no test changes, I doubt that this is an NFC... With the two getIntrinsicInstrCosts folded into one, now fold in the scalar/code-size orientated getIntrinsicCost. The remaining scalar intrinsics were memcpy, cttz and ctlz which now have special handling in the BasicTTI implementation. This had required a change in the AMDGPU backend for fabs as it should always be 'free'. I've also changed the X86 backend to return the BaseT implementation when the CostKind isn't RecipThroughput. Differential Revision: https://reviews.llvm.org/D80012	2020-05-26 09:48:26 +01:00

... 3 4 5 6 7 ...

20958 Commits