llvm-project

Commit Graph

Author	SHA1	Message	Date
Xiang1 Zhang	0980038a5e	Handle CET for -exception-model sjlj Summary: In SjLj exception mode, the old landingpad BB will create a new landingpad BB and use indirect branch jump to the old landingpad BB in lowering. So we should add 2 endbr for this exception model. Reviewers: hjl.tools, craig.topper, annita.zhang, LuoYuanke, pengfei, efriedma Reviewed By: LuoYuanke Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77124	2020-04-20 11:13:40 +08:00
Simon Pilgrim	e71dd7c011	[X86][SSE] getFauxShuffle - don't combine shuffles with small truncated scalars (PR45604) getFauxShuffle attempts to combine INSERT_VECTOR_ELT(TRUNCATE/EXTEND(EXTRACT_VECTOR_ELT(x))) patterns into a target shuffle chain. PR45604 identified an issue where the scalar was truncated to a size smaller than the destination vector element and then zero extended back, which requires the upper bits to be zero'd which we don't currently do. To avoid the bug I've added an early out in these truncation cases, a future commit should allow us to handle this by inserting the necessary SM_SentinelZero padding.	2020-04-19 13:35:22 +01:00
Sanjay Patel	cceb630a07	[x86] use vector instructions to lower more FP->int->FP casts This is an enhancement to D77895 to avoid another round-trip from XMM->GPR->XMM. This time we handle the case of starting/ending with an f64 and casting to signed i32 as the intermediate value. It's a bit more involved than I initially assumed because we need to use target-specific opcodes to represent the non-standard cast ops. Differential Revision: https://reviews.llvm.org/D78362	2020-04-19 08:33:17 -04:00
Simon Pilgrim	d6db919bee	[X86][SSE] Add test case for PR45604	2020-04-19 13:13:54 +01:00
Andrew Litteken	8d5024f7fe	fix to outline CFI instructions when they can be grouped in a tail call [MachineOutliner] fix test for excluding CFI and add test to include CFI in outlining. New test to check that we only outline CFI instructions if all CFI instructions in the function would be captured by the outlining; adding x86 tests analogous to the AArch64 CFI tests. Revision: https://reviews.llvm.org/D77852	2020-04-17 22:26:34 -07:00
Craig Topper	31a166e4cb	[X86] Clean up some mir tests with INLINEASM to avoid regdef or to correct the immediate for the regdef. The immediate used for the regdef is the encoding for the register class in the enum generated by tablegen. This encoding will change any time a new register class is added. Since the number is part of the input, this means it can become stale. This change modifies some tests to avoid this kind of immediate altogether, and updates one test to use the current encoding of GR64.	2020-04-17 21:55:44 -07:00
Craig Topper	5f69e53e55	[X86] Remove single incoming value phis from tests for the loop SAD pattern. NFC InstCombine should ensure these don't exist. I'm looking at making some changes to how we detect these patterns and not having to worry about these phis will help.	2020-04-17 13:39:47 -07:00
Sanjay Patel	a6fc687e34	[x86] add/adjust tests for FP<->int casts; NFC	2020-04-17 08:22:42 -04:00
Craig Topper	944cc5e0ab	[SelectionDAGBuilder][CGP][X86] Move some of SDB's gather/scatter uniform base handling to CGP. I've always found the "findValue" a little odd and inconsistent with other things in SDB. This simplifies the code in SDB to just handle a splat constant address or a 2 operand GEP in the same BB. This removes the need for "findValue" since the operands to the GEP are guaranteed to be available. The splat constant handling is new, but was needed to avoid regressions due to constant folding combining GEPs created in CGP. CGP is now responsible for canonicalizing gather/scatters into this form. The pattern I'm using for scalarizing, a scalar GEP followed by a GEP with an all zeroes index, seems to be subject to constant folding that the insertelement+shufflevector was not. Differential Revision: https://reviews.llvm.org/D76947	2020-04-16 17:49:22 -07:00
Sanjay Patel	b29fca30fa	[x86] auto-generate complete test checks; NFC	2020-04-16 17:16:51 -04:00
bd1976llvm	86478d3de9	[MC][ELF] Put explicit section name symbols into entry size compatible sections Ensure that symbols explicitly* assigned a section name are placed into a section with a compatible entry size. This is done by creating multiple sections with the same name** if incompatible symbols are explicitly given the name of an incompatible section, whilst: - Avoiding using uniqued sections where possible (for readability and to maximize compatibility with assemblers). - Creating as few SHF_MERGE sections as possible (for efficiency). Given that each symbol is assigned to a section in a single pass, we must decide which section each symbol is assigned to without seeing the properties of all symbols. A stable and easy to understand assignment is desirable. The following rules facilitate this: The "generic" section for a given section name will be mergeable if the name is a mergeable "default" section name (such as .debug_str), a mergeable "implicit" section name (such as .rodata.str2.2), or MC has already created a mergeable "generic" section for the given section name (e.g. in response to a section directive in inline assembly). Otherwise, the "generic" section for a given name is non-mergeable; and, non-mergeable symbols are assigned to the "generic" section, while mergeable symbols are assigned to uniqued sections. Terminology: "default" sections are those always created by MC initially, e.g. .text or .debug_str. "implicit" sections are those created normally by MC in response to the symbols that it encounters, i.e. in the absence of an explicit section name assignment on the symbol, e.g. a function foo might be placed into a .text.foo section. "generic" sections are those that are referred to when a unique section ID is not supplied, e.g. if there are multiple unique .bob sections then ".quad .bob" will reference the generic .bob section. Typically, the generic section is just the first section of a given name to be created. 
Default sections are always generic. * Typically, section names might be explicitly assigned in source code using a language extension, e.g. a section attribute: __attribute__ ((section ("section-name"))) - https://clang.llvm.org/docs/AttributeReference.html ** I refer to such sections as unique/uniqued sections. In assembly the ", unique," assembly syntax is used to express such sections. Fixes https://bugs.llvm.org/show_bug.cgi?id=43457. See https://reviews.llvm.org/D68101 for previous discussions leading to this patch. Some minor fixes were required to LLVM's tests, since tests had been using the old behavior, which allowed for explicitly assigning globals with incompatible entry sizes to a section. This fix relies on the ",unique," assembly feature. This feature is not available until binutils version 2.35 (https://sourceware.org/bugzilla/show_bug.cgi?id=25380). If the integrated assembler is not being used then we avoid using this feature for compatibility and instead try to place mergeable symbols into non-mergeable sections or issue an error otherwise. Differential Revision: https://reviews.llvm.org/D72194	2020-04-16 19:12:49 +00:00
Konstantin Schwarz	1a3e89aa2b	[MIR] Add comments to INLINEASM immediate flag MachineOperands Summary: The INLINEASM MIR instructions use immediate operands to encode the values of some operands. The MachineInstr pretty printer function already handles those operands and prints human readable annotations instead of the immediates. This patch adds similar annotations to the output of the MIRPrinter, however uses the new MIROperandComment feature. Reviewers: SjoerdMeijer, arsenm, efriedma Reviewed By: arsenm Subscribers: qcolombet, sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78088	2020-04-16 13:46:14 +02:00
Eli Friedman	7c10541e56	[SelectionDAG] Fix usage of Align constructing MachineMemOperands. The "Align" passed into getMachineMemOperand etc. is the alignment of the MachinePointerInfo, not the alignment of the memory operation. (getAlign() on a MachineMemOperand automatically reduces the alignment to account for this.) We were passing on wrong (overconservative) alignment in a bunch of places. Fix a bunch of these, mostly in legalization. And while I'm here, switch to the new Align APIs. The test changes are all scheduling changes: the biggest effect of preserving large alignments is that it improves alias analysis, so the scheduler has more freedom. (I was originally just trying to do a minor cleanup in SelectionDAGBuilder, but I accidentally went deeper down the rabbit hole.) Differential Revision: https://reviews.llvm.org/D77687	2020-04-15 13:01:41 -07:00
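The distinction drawn above, between the alignment of the MachinePointerInfo base and the alignment of the access at some offset from it, can be illustrated with a small standalone sketch. `alignAtOffset` is a hypothetical helper written for this note. The real reduction happens inside `MachineMemOperand::getAlign()`, and this sketch does not reproduce LLVM's code. It assumes the usual rule: the alignment still guaranteed at `base + offset` is the largest power of two dividing both the base alignment and the offset.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: given the alignment of a base pointer (a power of two)
// and a byte offset from it, return the alignment still guaranteed at
// base + offset, i.e. the largest power of two dividing both values.
uint64_t alignAtOffset(uint64_t baseAlign, uint64_t offset) {
  assert(baseAlign != 0 && (baseAlign & (baseAlign - 1)) == 0 &&
         "alignment must be a power of two");
  if (offset == 0)
    return baseAlign;                           // no offset: nothing is lost
  uint64_t lowestBit = offset & (~offset + 1);  // lowest set bit of the offset
  return lowestBit < baseAlign ? lowestBit : baseAlign;
}
```

Consider a 16-byte-aligned base accessed at offset 4. The access itself is only guaranteed 4-byte alignment, but the stronger fact about the base (16) is kept only if the base alignment is what gets stored. That stronger fact is the information the commit says helps alias analysis.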
Craig Topper	8dfb9627b7	[X86] Make v32i16/v64i8 legal types without avx512bw. Use custom splitting instead. This moves v32i16/v64i8 to a model consistent with how we treat integer types with avx1. This does change the ABI for vXi16/vXi8 vector types larger than 512 bits: they are now passed in multiple zmms instead of multiple ymms. We'd already hacked some code to make v64i8/v32i16 pass in zmm. The cost model is still a bit of a mess. In some places I tried to match existing behavior, but really we need to account for splitting and concatenation costs. The cost model for shuffles is especially pessimistic. Differential Revision: https://reviews.llvm.org/D76212	2020-04-15 12:17:18 -07:00
Simon Pilgrim	2bcbf1319e	[X86] Add generic cpu target for the slow division tests Baseline for any change due to D75567	2020-04-15 19:38:29 +01:00
Hubert Tong	cda006cbc7	[test][NFC] Use plain FileCheck in statepoint-stackmap-size.ll Summary: The test in question uses a non-portable `grep -A` option in conjunction with `wc -l`. `FileCheck` can be used to do the check without using these extra utilities. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D78060	2020-04-14 20:53:41 -04:00
Eli Friedman	2876b3eef3	[SelectionDAG] Always preserve offset in MachinePointerInfo Previously, getWithOffset() would drop the offset if the base was null. Because of this, MachineMemOperand would return the wrong result from getAlign() in these cases. MachineMemOperand stores the alignment of the pointer without the offset. A bunch of MIR tests changed because we print the offset now. Split off from D77687. Differential Revision: https://reviews.llvm.org/D78049	2020-04-14 15:29:41 -07:00
Rahman Lavaee	05192e585c	Extend BasicBlock sections to allow specifying clusters of basic blocks in the same section. Differential Revision: https://reviews.llvm.org/D76954	2020-04-13 12:19:59 -07:00
Rahman Lavaee	4ddf7ab454	Revert "Extend BasicBlock sections to allow specifying clusters of basic blocks" This reverts commit `0d4ec16d3d` Because tests were not added to the commit.	2020-04-13 12:19:59 -07:00
Rahman Lavaee	0d4ec16d3d	Extend BasicBlock sections to allow specifying clusters of basic blocks in the same section. This allows specifying BasicBlock clusters like the following example: !foo !!0 1 2 !!4 This places basic blocks 0, 1, and 2 in one section in this order, and places basic block #4 in a single section of its own.	2020-04-13 11:46:11 -07:00
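The cluster example in this commit message has been flattened onto a single line. The sketch below assumes its original line-per-entry layout: a `!name` line starts a function entry, and each following `!!id id ...` line adds one cluster of basic block IDs that share a section. `parseClusters` and `ClusterMap` are hypothetical names for illustration. This is not LLVM's actual parser for the basic-block-sections list.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation: function name -> clusters in order, where each
// cluster lists the basic block IDs to be placed together in one section.
using ClusterMap = std::map<std::string, std::vector<std::vector<unsigned>>>;

// Sketch parser, assuming "!name" starts a function entry and each following
// "!!id id ..." line adds one cluster to that function.
ClusterMap parseClusters(const std::string &text) {
  ClusterMap result;
  std::string current;  // function whose clusters are being read
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.rfind("!!", 0) == 0) {
      if (current.empty())
        continue;  // ignore clusters that appear before any function name
      std::istringstream ids(line.substr(2));
      std::vector<unsigned> cluster;
      unsigned id;
      while (ids >> id)
        cluster.push_back(id);
      result[current].push_back(cluster);
    } else if (line.rfind("!", 0) == 0) {
      current = line.substr(1);
      result[current];  // record the function even if it has no clusters
    }
  }
  return result;
}
```

For the commit's example, `parseClusters("!foo\n!!0 1 2\n!!4\n")` yields two clusters for `foo`: `{0, 1, 2}` and `{4}`.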
Jay Foad	bc78baec4c	[X86] Improve combineVectorShiftImm Summary: Fold (shift (shift X, C2), C1) -> (shift X, (C1 + C2)) for logical as well as arithmetic shifts. This is needed to prevent regressions from an upcoming funnel shift expansion change. While we're here, fold (VSRAI -1, C) -> -1 too. Reviewers: RKSimon, craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77300	2020-04-13 15:54:55 +01:00
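The fold (shift (shift X, C2), C1) -> (shift X, (C1 + C2)) depends on how out-of-range immediates behave. For x86 immediate vector shifts, a logical shift by at least the lane width gives zero. An arithmetic shift by at least the lane width fills the lane with the sign bit. The sketch below is a scalar model of one 16-bit lane, written for this note under those assumptions. `srli`, `srai`, `foldedSrli` and `foldedSrai` are hypothetical names, not the code in `combineVectorShiftImm`. It relies on `>>` of a negative value being an arithmetic shift, which mainstream compilers provide and C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kBits = 16;  // lane width being modelled

// Logical right shift by immediate: amounts >= lane width give zero.
uint16_t srli(uint16_t x, unsigned amt) {
  return amt >= kBits ? 0 : static_cast<uint16_t>(x >> amt);
}

// Arithmetic right shift by immediate: amounts >= lane width behave like a
// shift by (width - 1), replicating the sign bit across the lane.
int16_t srai(int16_t x, unsigned amt) {
  if (amt >= kBits)
    amt = kBits - 1;
  return static_cast<int16_t>(x >> amt);
}

// The folded forms: a single shift by C1 + C2. For logical shifts an
// out-of-range sum yields zero; for arithmetic shifts srai clamps it.
uint16_t foldedSrli(uint16_t x, unsigned c1, unsigned c2) {
  return srli(x, c1 + c2);
}
int16_t foldedSrai(int16_t x, unsigned c1, unsigned c2) {
  return srai(x, c1 + c2);
}
```

An exhaustive check over every lane value and small shift amounts confirms that both folds agree with the two-shift form in this model. It also confirms the extra fold mentioned in the commit: an arithmetic shift of -1 stays -1.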
Simon Pilgrim	401cbe373b	[X86][AVX] Attempt to scale masked shuffles to match the root type Improve the chances of folding the writemask into the combined shuffle by scaling a wider shuffle mask to match the root's original type. This creates a few minor issues with variable shuffles, preventing combines of shuffles because of the more limited support for binary shuffle types. In most cases we're probably better off combining the shuffles and losing the writemask fold, but this isn't always going to be true.	2020-04-13 14:57:25 +01:00
Simon Pilgrim	ec938c2a83	[X86][AVX] Add some masked variable shuffle tests Now that D77928 has landed we need to try harder to match shuffle and mask widths. These are a couple of tests showing where variable shuffle masks have been widened, preventing them from folding with the mask.	2020-04-13 14:32:29 +01:00
Craig Topper	42fc7852f5	[X86] Print k-mask in FMA3 comments.	2020-04-12 13:16:53 -07:00
Jonathan Roelofs	41f13f1f64	reland: [DAG] Fix PR45049: LegalizeTypes crash Sometimes LegalizeTypes knows about common subexpressions before SelectionDAG does, leading to accidental SDValue removal before its reference count was truly zero. Differential Revision: https://reviews.llvm.org/D76994 Reviewed-By: bjope Fixes: https://bugs.llvm.org/show_bug.cgi?id=45049 Reverted in `3ce77142a6` because the previous patch broke the expensive-checks bots. The new patch removes the broken check.	2020-04-12 09:52:17 -06:00
Sanjay Patel	d04db4825a	[x86] use vector instructions to lower FP->int->FP casts As discussed in PR36617: https://bugs.llvm.org/show_bug.cgi?id=36617#c13 ...we can avoid the likely slow round-trip from XMM to GPR to XMM by using the vector versions of the convert instructions. Based on experimental results from recent Intel/AMD chips, we don't need to worry about triggering denorm stalls while operating on garbage data in the high lanes with convert instructions, so this is expected to always be as good or better perf than the scalar instruction equivalent. FP exceptions are also not a concern because strict code should not be using the regular SDAG opcodes. Differential Revision: https://reviews.llvm.org/D77895	2020-04-12 10:26:43 -04:00
Craig Topper	d3465e0691	[X86] Enable shuffle combining for AVX512 unless the root is used by a vselect A lot of vectorized code doesn't use masks so we shouldn't penalize them by not doing shuffle combining on avx512 targets. I've added support for VALIGNQ/VALIGND and 512-bit SHUF128 to prevent some regressions. I also prevented recombining 256-bit SHUF128 to PERM2X128. We may not need to add 256-bit SHUF128 support, but I don't think I found any cases requiring that in my testing. Differential Revision: https://reviews.llvm.org/D77928	2020-04-11 20:05:10 -07:00
Hongtao Yu	11455a7905	[CodeGen] Allow partial tail duplication in Machine Block Placement. Summary: A count profile may affect tail duplication's heuristic causing a block to be duplicated in only a part of its predecessors. This is not allowed in the Machine Block Placement pass where an assert will go off. I'm removing the assert and making the optimization bail out when such case happens. Reviewers: wenlei, davidxl, Carrot Reviewed By: Carrot Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77748	2020-04-11 12:20:31 -07:00
Sanjay Patel	ebf22a4935	[x86] add test for FP->int->FP casts; NFC (PR36617) Also, add a common prefix for SSE to reduce redundant CHECK lines.	2020-04-10 15:57:35 -04:00
Serguei Katkov	4275eb1331	Re-land [Codegen/Statepoint] Allow usage of registers for non gc deopt values. The change introduces the usage of physical registers for non-gc deopt values. This requires runtime support to know how to take a value from a register. By default this usage is off and can be switched on by an option. The change also introduces an additional fix-up patch which forces the spilling of caller saved registers (clobbered after the call) and rewrites the statepoint to use spill slots instead of caller saved registers. Reviewers: reames, dantrushin Reviewed By: dantrushin Subscribers: mgorny, hiraditya, mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D77797	2020-04-10 10:13:39 +07:00
Serguei Katkov	44f0d7f136	Revert "[Codegen/Statepoint] Allow usage of registers for non gc deopt values." This reverts commit `a0275705bb`. It causes buildbot failures building LLVM with BUILD_SHARED_LIBS due to a linker error.	2020-04-09 18:24:47 +07:00
Serguei Katkov	a0275705bb	[Codegen/Statepoint] Allow usage of registers for non gc deopt values. The change introduces the usage of physical registers for non-gc deopt values. This requires runtime support to know how to take a value from a register. By default this usage is off and can be switched on by an option. The change also introduces an additional fix-up patch which forces the spilling of caller saved registers (clobbered after the call) and rewrites the statepoint to use spill slots instead of caller saved registers. Reviewers: reames, dantrushin Reviewed By: reames, dantrushin Subscribers: mgorny, hiraditya, mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D77371	2020-04-09 16:57:35 +07:00
Jay Foad	9c7bd94ce8	Fix typo in comment	2020-04-09 10:36:00 +01:00
WangTianQing	a3dc949000	[X86] Add TSXLDTRK instructions. Summary: For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference Reviewers: craig.topper, RKSimon, LuoYuanke Reviewed By: craig.topper Subscribers: mgorny, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77205	2020-04-09 13:17:29 +08:00
Vedant Kumar	48e65fc630	MachineFunction: Copy call site info when duplicating insts Summary: Preserve call site info for duplicated instructions. We copy over the call site info in CloneMachineInstrBundle to avoid repeated calls to copyCallSiteInfo in CloneMachineInstr. (Alternatively, we could copy call site info higher up the stack, e.g. into TargetInstrInfo::duplicate, or even into individual backend passes. However, I don't see how that would be safer or more general than the current approach.) Reviewers: aprantl, djtodoro, dstenb Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77685	2020-04-08 11:06:14 -07:00
Simon Pilgrim	66c18c729d	[X86][SSE] Combine PTEST(AND(X,Y),AND(X,Y)) -> PTEST(X,Y) and ANDN equivalents Tests derived from PR42035 examples	2020-04-08 12:42:22 +01:00
Simon Pilgrim	6f46e9af8a	[X86][SSE] Add PTEST(AND(X,Y),AND(X,Y)) tests derived from PR42035 examples	2020-04-07 17:58:54 +01:00
Simon Pilgrim	e3b6059776	[X86][SSE] combineX86ShufflesConstants - early out for zeroable vectors (PR45443) Shuffle combining can insert zero byte sized elements into the shuffle mask, which combineX86ShufflesConstants will attempt to fold without taking into account whether the byte-sized type is legal (e.g. AVX512F only targets). If we have a full-zeroable vector then we should just return a zero version of the root type, otherwise if the type isn't valid we should bail. Fixes PR45443	2020-04-07 14:45:29 +01:00
Xiang1 Zhang	01a32f2bd3	Enable IBT (Indirect Branch Tracking) in JIT with CET (Control-flow Enforcement Technology) Do not commit llvm/test/ExecutionEngine/MCJIT/cet-code-model-lager.ll because it will cause build bot failures (not suitable for 32-bit Windows targets). Summary: This patch comes from H.J.'s `2bd54ce7fa`. This patch fixes the LLVM unit tests that fail when run on a CET machine (e.g. ExecutionEngine/MCJIT/MCJITTests). The reason we enable IBT when "JIT compiled with CET" is mainly that the JIT does not know whether its caller program is CET-enabled. If the JIT's caller program is non-CET, it does not matter whether the JIT generates CET code. But if the JIT's caller program is CET-enabled, the JIT must generate CET code or it will cause control protection exceptions. I have tested the patch with llvm-unit-test and llvm-test-suite on a CET machine, and it passed. H.J. also tested it by building and running VNCserver (Virtual Network Console), and it works too (without this patch, VNCserver crashes on a CET machine). Reviewers: hjl.tools, craig.topper, LuoYuanke, annita.zhang, pengfei Reviewed By: LuoYuanke Subscribers: tstellar, efriedma, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76900	2020-04-07 09:48:47 +08:00
Leonard Chan	a0222ac1f9	[AsmPrinter] Do not define local aliases for global objects in a comdat A global symbol that is defined in a comdat should not generate an alias since call sites that would've referred to that symbol will refer to their own independent local aliases rather than the surviving global comdat one. This could result in something that looks like: ``` ld.lld: error: relocation refers to a discarded section: .text._ZN3fbl8internal18NullFunctionTargetIvJjjPjEED1Ev.stub >>> defined in user-x64-clang/obj/system/ulib/minfs/libminfs.a(minfs._sources.file.cc.o) >>> section group signature: _ZN3fbl8internal18NullFunctionTargetIvJjjPjEED1Ev.stub >>> prevailing definition is in user-x64-clang/obj/system/ulib/minfs/libminfs.a(minfs._sources.vnode.cc.o) >>> referenced by function.h:169 (../../zircon/system/ulib/fbl/include/fbl/function.h:169) >>> minfs._sources.file.cc.o:(minfs::File::AllocateAndCommitData(std::__2::unique_ptr<minfs::Transaction, std::__2::default_delete<minfs::Transaction> >)) in archive user-x64-clang/obj/system/ulib/minfs/libminfs.a ``` We ran into this when experimenting with a new C++ ABI for fuchsia (refer to D72959) which takes relative offsets between comdat'd functions which is why the normal C++ user wouldn't run into this. Differential Revision: https://reviews.llvm.org/D77429	2020-04-06 13:48:05 -07:00
Nick Desaulniers	5bc291be71	[SelectionDAG] fix predecessor list for INLINEASM_BRs' parent Summary: A bug report mentioned that LLVM was producing jumps off the end of a function when using "asm goto with outputs". Further digging pointed to MachineBasicBlocks that had their address taken and were indirect targets of INLINEASM_BR being removed by BranchFolder, because their predecessor list was empty, so they appeared to have no entry. This was a cascading failure caused earlier, during Pre-RA instruction scheduling. We have a few special cases in Pre-RA instruction scheduling where we split a MachineBasicBlock in two. This requires careful handling of predecessor and successor lists for a MachineBasicBlock that was split, and careful handling of PHI MachineInstrs that referred to the MachineBasicBlock before it was split. The clue that led to this fix was the observation that many callers of MachineBasicBlock::splice() frequently call MachineBasicBlock::transferSuccessorsAndUpdatePHIs() to update their PHI nodes after a splice. We don't want to reuse that method, as we have custom successor transferring logic for this block split. This patch fixes 2 pre-existing bugs, and adds tests. The first bug was that MachineBasicBlock::splice() correctly handles updating most successors and predecessors; we don't need to do anything more than removing the previous fallthrough block from the first half of the split block post splice. Previously, we were updating the successor list incorrectly (updating successors updates predecessors). The second bug was that PHI nodes that needed registers from the first half of the split block were not having entries populated. The register live out information was correct, and the FuncInfo->PHINodesToUpdate was correct. 
Specifically, the check in SelectionDAGISel::FinishBasicBlock: for (unsigned i = 0, e = FuncInfo->PHINodesToUpdate.size(); i != e; ++i) { MachineInstrBuilder PHI(*MF, FuncInfo->PHINodesToUpdate[i].first); if (!FuncInfo->MBB->isSuccessor(PHI->getParent())) continue; PHI.addReg(FuncInfo->PHINodesToUpdate[i].second).addMBB(FuncInfo->MBB); was `continue`ing because FuncInfo->MBB tracks the second half of the post-split block; no one was updating PHI entries for the first half of the post-split block. SelectionDAGBuilder::UpdateSplitBlock() already expects to perform special handling for MachineBasicBlocks that were split post calls to ScheduleDAGSDNodes::EmitSchedule(), so I'm confident that it's both correct for ScheduleDAGSDNodes::EmitSchedule() to return the second half of the split block `CopyBB` which updates `FuncInfo->MBB` (ie. the current MachineBasicBlock being processed), and perform special handling for this in SelectionDAGBuilder::UpdateSplitBlock(). Reviewers: void, craig.topper, efriedma Reviewed By: void, efriedma Subscribers: hfinkel, fhahn, MatzeB, efriedma, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D76961	2020-04-06 13:46:39 -07:00
Craig Topper	07ed1fb597	[SelectionDAGBuilder] Fix ISD::FREEZE creation for structs with fields of different types. The previous code used the type of the first field for the VT passed to getNode for every field. I've based the implementation here off what is done in visitSelect as it removes the need to special case aggregates. Differential Revision: https://reviews.llvm.org/D77093	2020-04-06 11:03:40 -07:00
Jonathan Roelofs	7c5d2bec76	[llvm] Fix missing FileCheck directive colons https://reviews.llvm.org/D77352	2020-04-06 09:59:08 -06:00
Sanjay Patel	fbb1b43f13	[ValueTracking] enhance matching of umin/umax with 'not' operands The cmyk test is based on the known regression that resulted from: rGf2fbdf76d8d0 This improves on the equivalent signed min/max change: rG867f0c3c4d8c The underlying icmp equivalence is: ~X pred ~Y --> Y pred X For an icmp with constant, canonicalization results in a swapped pred: ~X < C --> X > ~C	2020-04-06 11:51:59 -04:00
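The icmp equivalences quoted in this commit (`~X pred ~Y --> Y pred X` and `~X < C --> X > ~C`) can be sanity-checked by brute force, together with the min/max identity they imply: `umin(~X, ~Y) == ~umax(X, Y)`. The sketch below checks the unsigned forms over every pair of 8-bit values. `checkNotIdentities` is a hypothetical helper that verifies only the algebra. It is not the ValueTracking matcher.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Exhaustively checks, for all 8-bit unsigned X and Y (Y also serving as C):
//   ~X <u ~Y  <=>  Y <u X
//   ~X <u C   <=>  X >u ~C
//   umin(~X, ~Y) == ~umax(X, Y)   and   umax(~X, ~Y) == ~umin(X, Y)
bool checkNotIdentities() {
  for (unsigned xi = 0; xi < 256; ++xi) {
    for (unsigned yi = 0; yi < 256; ++yi) {
      uint8_t x = static_cast<uint8_t>(xi);
      uint8_t y = static_cast<uint8_t>(yi);
      uint8_t nx = static_cast<uint8_t>(~x);  // ~ promotes to int, so truncate
      uint8_t ny = static_cast<uint8_t>(~y);
      if ((nx < ny) != (y < x))
        return false;
      uint8_t c = y;
      uint8_t nc = static_cast<uint8_t>(~c);
      if ((nx < c) != (x > nc))
        return false;
      if (std::min(nx, ny) != static_cast<uint8_t>(~std::max(x, y)))
        return false;
      if (std::max(nx, ny) != static_cast<uint8_t>(~std::min(x, y)))
        return false;
    }
  }
  return true;
}
```

Because bitwise not reverses unsigned order, a min of inverted operands is the inverted max of the originals. This is why a umin/umax pattern written in terms of `~X` and `~Y` can still be recognized.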
Hans Wennborg	64c2312750	Revert `43f031d312` "Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology)" ExecutionEngine/MCJIT/cet-code-model-lager.ll is failing on 32-bit windows, see llvm-commits thread for `fef2dab`. This reverts commit `43f031d312` and the follow-ups `fef2dab100` and `6a800f6f62`.	2020-04-06 15:05:25 +02:00
Simon Pilgrim	9bc5b1a489	[X86][SSE] combineVectorSignBitsTruncation - remove minimum vector length limitations truncateVectorWithPACK has its own vector length controls, so we can rely on those directly. This helps some existing truncation to subvector tests, which were being combined later during shuffle lowering at which point the sign/zero bit detection had become obscured preventing lowerShuffleWithPACK working as well as it could.	2020-04-06 12:45:23 +01:00
Simon Pilgrim	4431a29c60	[X86][SSE] Combine unary shuffle(HORIZOP,HORIZOP) -> HORIZOP We had previously limited the shuffle(HORIZOP,HORIZOP) combine to binary shuffles, but we can often merge unary shuffles just as well, folding in UNDEF/ZERO values into the 64-bit half lanes. For the (P)HADD/HSUB cases this is limited to fast-horizontal cases but PACKSS/PACKUS combines under all cases.	2020-04-05 22:49:46 +01:00
Zuojian Lin	a58c8a7866	Remove the additional constant which requires an extra register for statepoint lowering. The newly-created constant zero will need an extra register to hold it in the current statepoint lowering implementation. Remove it if there exists one.	2020-04-05 11:22:09 -04:00
Simon Pilgrim	3079e51858	[X86][SSE] Generalize shuffle(HORIZOP,HORIZOP) -> HORIZOP combine Our existing combine allows us to merge the shuffle of 2 similar 64-bit wide 'horizontal ops' (HADD/PACK/etc.) if the shuffle was a UNPCK/MOVSD. This patch generalizes this to decode any target shuffle mask that can be widened to a 128-bit repeating v2*64 mask, which helps us catch PBLENDW/PBLENDD cases.	2020-04-05 12:09:19 +01:00
Simon Pilgrim	a17de6b91c	[X86][SSE] truncateVectorWithPACK - upper undef for 128->64 packing If we're packing from 128-bits to 64-bits then we don't need the RHS argument. This helps with register allocation, especially as we avoid repeating a use of the input value.	2020-04-05 11:47:36 +01:00

15554 Commits