llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	f7f84a0ca3	[X86][SandyBridge] Strip unnecessary MOVQ/CVT instruction instrw overrides. llvm-svn: 330505	2018-04-21 14:03:40 +00:00
Simon Pilgrim	02fc375a22	[X86] Strip unnecessary MMX instruction instrw overrides from scheduler models. llvm-svn: 330503	2018-04-21 12:15:42 +00:00
Simon Pilgrim	c0f654f18e	[X86] Strip unnecessary x87 instruction instrw overrides from scheduler models. llvm-svn: 330501	2018-04-21 11:25:02 +00:00
Simon Pilgrim	d14d2e7b18	[X86] Add WriteFSign/WriteFLogic scheduler classes Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes. This unearthed a couple of things that are also handled in this patch: (1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic (2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015. (3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops. Differential Revision: https://reviews.llvm.org/D45629 llvm-svn: 330480	2018-04-20 21:16:05 +00:00
Craig Topper	173d59b62e	[X86][SandyBridge] Remove duplciate InstRWs from Sandy Brige scheduler model. llvm-svn: 330465	2018-04-20 18:55:40 +00:00
Gabor Buella	31fa8025ba	[X86] WaitPKG instructions Three new instructions: umonitor - Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be a writeback memory caching type. umwait - A hint that allows the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. tpause - Directs the processor to enter an implementation-dependent optimized state until the TSC reaches the value in EDX:EAX. Also modifying the description of the mfence instruction, as the rep prefix (0xF3) was allowed before, which would conflict with umonitor during disassembly. Before: $ echo 0xf3,0x0f,0xae,0xf0 \| llvm-mc -disassemble .text mfence After: $ echo 0xf3,0x0f,0xae,0xf0 \| llvm-mc -disassemble .text umonitor %rax Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45253 llvm-svn: 330462	2018-04-20 18:42:47 +00:00
Simon Pilgrim	df8fa6d734	[X86][BtVer2] Cleanup some old FIXMEs from the model. NFCI. llvm-svn: 330428	2018-04-20 13:12:04 +00:00
Simon Pilgrim	2f522ef13d	[X86] Tag CLDEMOTE instruction with WriteLoad scheduling class Same as other cacheline instructions llvm-svn: 330424	2018-04-20 12:54:53 +00:00
Craig Topper	bc895a3afc	[X86] Enable popcnt false dependency breaking on Silvermont and Goldmont. Silvermont and Goldmont have the same issue on popcnt as Sandy Bridge, Haswell, Broadwell, and Skylake. Believe it is fixed in Goldmont Plus. llvm-svn: 330358	2018-04-19 19:25:24 +00:00
Simon Pilgrim	4ba057dbd1	[X86][SLM] Fix typo using SandyBridge resources. Luckily this was on instructions not supported on Silvermont.... llvm-svn: 330351	2018-04-19 18:01:52 +00:00
Craig Topper	b5f2659130	[X86] Correct the scheduling data for register forms of XCHG and XADD on Intel CPUs. The XCHG16rr/XCHG32rr/XCHG64rr instructions should be 3 uops just like XCHG8rr. I believe they're just implemented as 3 move uops with a temporary register. XADD is probably 2 moves and an add also using a temporary register. Change the latency for both from 2 cycles to 3 cycles. Only 2 of the uops are serialized in their execution, the move into the temporary and the move out of the temporary. The move from one GPR to the other should be able to go in parallel with this if there are ALU resources available. llvm-svn: 330349	2018-04-19 18:00:17 +00:00
Simon Pilgrim	5e492d29a3	[X86] Merge some MMX instregex There's a lot more but I'd prefer focussing on removing unnecessary InstRWs first. llvm-svn: 330347	2018-04-19 17:32:10 +00:00
Simon Pilgrim	f21ace6cdd	[X86][BtVer2] Remove SSE4A EXTRQ/EXTRQI InstRW overrides. These are already handled identically by WriteALU. llvm-svn: 330332	2018-04-19 14:38:36 +00:00
Alexander Ivchenko	e8fed1546e	Lowering x86 adds/addus/subs/subus intrinsics (llvm part) This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations. The patch also includes folding of previously missing saturation patterns so that IR emits the same machine instructions as the intrinsics. Patch by tkrupa Differential Revision: https://reviews.llvm.org/D44785 llvm-svn: 330322	2018-04-19 12:13:30 +00:00
Simon Pilgrim	3c06617f0e	[X86][FMA] Remove FMA reg-reg InstRW scheduler overrides. These are all already handled identically by WriteFMA. llvm-svn: 330319	2018-04-19 11:37:26 +00:00
Simon Pilgrim	33dede9075	[X86][BtVer2] Remove 128-bit F16C InstRW overrides. These are already handled identically by WriteCvtF2F. llvm-svn: 330318	2018-04-19 11:16:33 +00:00
Craig Topper	f846e2d1b1	[X86] Scrub scheduling information for MUL/IMUL on Intel CPUs. This removes a bunch of unnecessary InstRW overrides. It also cleans up the missing information from the Sandy Bridge model. Other fixes to other models. llvm-svn: 330308	2018-04-19 05:34:05 +00:00
Bob Haarman	cb80a3fce0	Fix data race in X86FloatingPoint.cpp ASSERT_SORTED Summary: ASSERT_SORTED checks if a table is sorted, and uses a boolean to prevent the check from being run again if it was earlier determined that the table is in fact sorted. Unsynchronized reads and writes of that boolean triggered ThreadSanitizer's data race detection. This change rewrites the code to use std::atomic<bool> instead. Fixes PR36922. Reviewers: rnk Reviewed By: rnk Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D45742 llvm-svn: 330301	2018-04-18 23:04:09 +00:00
Craig Topper	ebf52e80c1	[X86] Correct the Defs, Uses, hasSideEffects, mayLoad, mayStore for XCHG and XADD instructions. I don't think we emit any of these from codegen except for using XCHG16ar as 2 byte NOP. llvm-svn: 330298	2018-04-18 22:07:53 +00:00
Craig Topper	04244cbf45	[X86] Fix the Uses/Defs,mayLoad,mayStore,hasSideEffects flags for the CMPXCHG instructions. The compiler only emits the locked version of these which use different instruction definitions. The versions fixed here are only used by the assembler/disassembler. llvm-svn: 330287	2018-04-18 20:15:00 +00:00
Chandler Carruth	ccd3ecb95a	[x86] Switch EFLAGS copy lowering to use reg-reg form of testing for a zero register. Previously I tried this and saw LLVM unable to transform this to fold with memory operands such as spill slot rematerialization. However, it clearly works as shown in this patch. We turn these into `cmpb $0, <mem>` when useful for folding a memory operand without issue. This form has no disadvantage compared to `testb $-1, <mem>`. So overall, this is likely no worse and may be slightly smaller in some cases due to the `testb %reg, %reg` form. Differential Revision: https://reviews.llvm.org/D45475 llvm-svn: 330269	2018-04-18 15:52:50 +00:00
Chandler Carruth	1f87618f8f	[x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite uses across basic blocks in the limited cases where it is very straight forward to do so. This will also be useful for other places where we do some limited EFLAGS propagation across CFG edges and need to handle copy rewrites afterward. I think this is rapidly approaching the maximum we can and should be doing here. Everything else begins to require either heroic analysis to prove how to do PHI insertion manually, or somehow managing arbitrary PHI-ing of EFLAGS with general PHI insertion. Neither of these seem at all promising so if those cases come up, we'll almost certainly need to rewrite the parts of LLVM that produce those patterns. We do now require dominator trees in order to reliably diagnose patterns that would require PHI nodes. This is a bit unfortunate but it seems better than the completely mysterious crash we would get otherwise. Differential Revision: https://reviews.llvm.org/D45673 llvm-svn: 330264	2018-04-18 15:13:16 +00:00
Craig Topper	dfccafe18a	[X86][Broadwell] Remove some unnecessary InstRW overrides and add some FIXMEs. llvm-svn: 330241	2018-04-18 06:41:25 +00:00
Craig Topper	513e11bb70	[X86] Give CMOV 2 cycle latency on SLM. llvm-svn: 330239	2018-04-18 06:04:30 +00:00
Craig Topper	8704612481	[X86] Don't crash on bad operand modifiers in inline assembly Summary: Previously if a modifer was placed on a non-GPR register class we would hit an assert or crash. Reviewers: echristo Reviewed By: echristo Subscribers: eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D45751 llvm-svn: 330238	2018-04-18 05:15:24 +00:00
Keith Wyss	3d86823f3d	[XRay] Typed event logging intrinsic Summary: Add an LLVM intrinsic for type discriminated event logging with XRay. Similar to the existing intrinsic for custom events, but also accepts a type tag argument to allow plugins to be aware of different types and semantically interpret logged events they know about without choking on those they don't. Relies on a symbol defined in compiler-rt patch D43668. I may wait to submit before I can see demo everything working together including a still to come clang patch. Reviewers: dberris, pelikan, eizan, rSerge, timshen Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45633 llvm-svn: 330219	2018-04-17 21:30:29 +00:00
Craig Topper	e56a2fc5e7	[X86] Add separate scheduling class for PSADBW instruction. llvm-svn: 330204	2018-04-17 19:35:19 +00:00
Craig Topper	655e1db722	[X86] Remove unnecessary InstRW overrides. Add somes FIXMEs/TODOs. llvm-svn: 330203	2018-04-17 19:35:14 +00:00
Simon Pilgrim	86e3c26924	[X86] Add FP comparison scheduler classes Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd Differential Revision: https://reviews.llvm.org/D45656 llvm-svn: 330179	2018-04-17 07:22:44 +00:00
Gabor Buella	8f1646b579	[X86] Introduce archs: goldmont-plus & tremont Using Goldmont's cost tables for these two upcoming atom archs. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45612 llvm-svn: 330109	2018-04-16 07:47:35 +00:00
Craig Topper	53f9558903	[X86] Use uint32_t instead of unsigned in GetLo32XForm for readability. NFC GetLo8XForm right next to it uses uint8_t so uint32_t is consistent. llvm-svn: 330104	2018-04-15 19:11:24 +00:00
Simon Pilgrim	b8adf558f8	[X86][MMX] Set PAVG/PHADD/PMIN/PMAX/PSIGN instructions to use same scheduler classes as SSE/AVX llvm-svn: 330085	2018-04-14 13:06:38 +00:00
Hiroshi Inoue	ae17900997	[NFC] fix trivial typos in document and comments "not not" -> "not" etc llvm-svn: 330083	2018-04-14 08:59:00 +00:00
Craig Topper	95f421cfbf	[X86] Add the bizarro movsww and movzww mnemonics for the disassembler. The destination size of the movzx/movsx instruction is controlled by the normal operand size mechanisms. Only the input type is fixed. This means that a 0x66 prefix on the encoding for zext/sext 16->32 should really produce a 16->16 instruction. Functionally this is equivalent to a GR16->GR16 move since bits 16 and above will be preserved. So nothing is actually extended. llvm-svn: 330078	2018-04-13 23:57:54 +00:00
Tim Northover	271d3d2771	MachO: trap unreachable instructions Debugability is more important than saving 4 bytes to let us to fall through to nonense. llvm-svn: 330073	2018-04-13 22:25:20 +00:00
Simon Pilgrim	0e74e50401	[X86] Remove remaining itinerary support from instructions and target (PR37093) llvm-svn: 330035	2018-04-13 15:37:56 +00:00
Simon Pilgrim	a3a9d81231	[X86] Generalize X86FixupLEAs to work with TargetSchedModel Similar to rL329834, don't rely on itinerary scheduler model to determine latencies for LEA thresholds, use the generic TargetSchedModel::computeInstrLatency call. llvm-svn: 330030	2018-04-13 15:09:39 +00:00
Simon Pilgrim	01637c473f	Remove comment reference to itineraries. NFCI. llvm-svn: 330025	2018-04-13 14:42:48 +00:00
Simon Pilgrim	fe3d59e98b	[X86][AVX512] UNPCKL/H PS and PD should be scheduled with WriteFShuffle not WriteFAdd llvm-svn: 330023	2018-04-13 14:41:05 +00:00
Simon Pilgrim	21e89795cc	[X86] Remove remaining OpndItins/SizeItins from all instruction defs (PR37093) llvm-svn: 330022	2018-04-13 14:36:59 +00:00
Simon Pilgrim	e0c7868ded	Remove comment references to itineraries. NFCI. llvm-svn: 330021	2018-04-13 14:31:57 +00:00
Simon Pilgrim	963bf4de2b	Remove out of data comment. NFCI. llvm-svn: 330019	2018-04-13 14:24:06 +00:00
Simon Pilgrim	ae0c2711b6	[X86] Remove OpndItins/SizeItins from all sse instruction defs (PR37093) llvm-svn: 330013	2018-04-13 12:50:31 +00:00
Hiroshi Inoue	372ffa15cb	[NFC] fix trivial typos in comments "the the" -> "the", "we we" -> "we", etc llvm-svn: 330006	2018-04-13 11:37:06 +00:00
Gabor Buella	604be4424b	[X86] Introduce cldemote instruction Hint to hardware to move the cache line containing the address to a more distant level of the cache without writing back to memory. Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45256 llvm-svn: 329992	2018-04-13 07:35:08 +00:00
Craig Topper	254ed028a4	[X86] Remove the pmuldq/pmuldq intrinsics and replace with native IR. This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics. We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well. llvm-svn: 329990	2018-04-13 06:07:18 +00:00
Simon Pilgrim	1f070c334c	[X86] Remove unused MoveLoadStoreItins/ShiftOpndItins schedule class wrappers. Was being used to move around empty/unused itineraries... llvm-svn: 329970	2018-04-12 22:57:34 +00:00
Simon Pilgrim	6551d405dc	[X86] Remove x86 InstrItinClass entries (PR37093) This removes the last of the x86 schedule itineraries, I'm intending to cleanup the remaining uses of NoItinerary/OpndItins/etc. before resolving PR37093. llvm-svn: 329967	2018-04-12 22:44:47 +00:00
Simon Pilgrim	0e45634f4e	[X86] Remove InstrItinClass entries from all x86 instruction defs (PR37093) llvm-svn: 329953	2018-04-12 20:47:34 +00:00
Simon Pilgrim	e9376b9fdc	[X86] Remove InstrItinClass entries from SSE/AVX instructions defs (PR37093) llvm-svn: 329945	2018-04-12 19:59:35 +00:00
Simon Pilgrim	577ae24feb	[X86] Remove explicit SSE/AVX schedule itineraries from defs (PR37093) llvm-svn: 329940	2018-04-12 19:25:07 +00:00
Simon Pilgrim	35935c0632	[X86] Remove remaining gpr schedule itineraries (PR37093) llvm-svn: 329938	2018-04-12 18:46:15 +00:00
Gabor Buella	297c138798	[X86] Introduce LLVM wbinvd intrinsic A previously missing intrinsic for an old instruction. Reviewers: craig.topper, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45312 llvm-svn: 329936	2018-04-12 18:38:18 +00:00
Simon Pilgrim	dec781c141	[X86] Remove gpr shift/extension schedule itineraries (PR37093) llvm-svn: 329933	2018-04-12 18:25:38 +00:00
Simon Pilgrim	8904a86f65	[X86] Remove AES/CLMUL/CRC32/LDDQU/MOVNT/POPCNT/SHA schedule itineraries (PR37093) llvm-svn: 329912	2018-04-12 14:31:42 +00:00
Simon Pilgrim	294556d40e	[X86] Remove remaining system/special schedule itineraries (PR37093) llvm-svn: 329906	2018-04-12 12:43:49 +00:00
Simon Pilgrim	0cd0fbd8c5	[X86] Remove system/control schedule itineraries (PR37093) llvm-svn: 329903	2018-04-12 12:09:24 +00:00
Simon Pilgrim	69e0e8e3d4	[X86] Remove CMOV/SETCC schedule itineraries (PR37093) llvm-svn: 329898	2018-04-12 11:01:40 +00:00
Simon Pilgrim	10e3bdaaa8	[X86] Remove MMX/3DNow schedule itineraries (PR37093) llvm-svn: 329896	2018-04-12 10:49:57 +00:00
Simon Pilgrim	32d368147f	[X86] Remove X87 schedule itineraries (PR37093) First of a number of commits to remove x86 schedule itineraries entirely - approved off-line with @craig.topper llvm-svn: 329893	2018-04-12 10:27:37 +00:00
Simon Pilgrim	7b88d09e75	[X86] Remove unused itinerary argument from FMA3/FMA4/XOP instructions. NFCI. llvm-svn: 329862	2018-04-11 23:24:38 +00:00
Gabor Buella	2ef36f3571	[X86] Describe wbnoinvd instruction Similar to the wbinvd instruction, except this one does not invalidate caches. Ring 0 only. The encoding matches a wbinvd instruction with an F3 prefix. Reviewers: craig.topper, zvi, ashlykov Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D43816 llvm-svn: 329847	2018-04-11 20:01:57 +00:00
Simon Pilgrim	8fc2b49620	[X86][Atom] Convert Atom scheduler model to SchedRW (PR32431) Atom is the only x86 target that still uses schedule itineraries, if we can remove this then we can begin the work on removing x86 itineraries. I've also found that it will help with PR36550. I've focussed on matching the existing model as closely as possible (relying on the schedule tests), PR36895 indicated a lot of these were incorrect but we can just as easily fix these after this patch as before. Hopefully we can get llvm-exegesis to help here, There are a few instructions that rely on itinerary scheduling (mainly push/pop/return) of multiple resource stages, but I don't think any of these are show stoppers. There are also a few codegen changes that seem related to the post-ra scheduler acting a little differently, I haven't tracked these down but they don't seem critical. NOTE: I don't have access to any Atom hardware, so this hasn't been tested in the wild. Differential Revision: https://reviews.llvm.org/D45486 llvm-svn: 329837	2018-04-11 18:23:01 +00:00
Simon Pilgrim	7f321d8c24	[X86] Generalize X86PadShortFunction to work with TargetSchedModel Pre-commit for D45486, don't rely on itinerary scheduler model to determine latencies for padding, use the generic TargetSchedModel::computeInstrLatency call. Also, replace hard coded (atom specific) 2*uop creation per padding cycle with a version based on the scheduler model's issue width. Differential Revision: https://reviews.llvm.org/D45486 llvm-svn: 329834	2018-04-11 18:05:17 +00:00
Simon Pilgrim	89c8a10f7c	[X86] Add variable shuffle schedule classes Split variable index shuffles from immediate index shuffles WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.) WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.) WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.) WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.) Differential Revision: https://reviews.llvm.org/D45404 llvm-svn: 329806	2018-04-11 13:49:19 +00:00
Craig Topper	9507fa358c	[X86] Remove 128/256-bit masked pmaddubsw and pmaddwd intrinsics. Replace 512-bit masked intrinsic with unmasked intrinsic and a select. The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency. llvm-svn: 329774	2018-04-11 04:55:04 +00:00
Craig Topper	ee2c1dea4d	[X86] In X86FlagsCopyLowering, when rewriting a memory setcc we need to emit an explicit MOV8mr instruction. Previously the code only knew how to handle setcc to a register. This should fix a crash in the chromium build. llvm-svn: 329771	2018-04-11 01:09:10 +00:00
Sriraman Tallam	d693093a65	GOTPCREL references must always use RIP. With -fno-plt, global value references can use GOTPCREL and RIP must be used. Differential Revision: https://reviews.llvm.org/D45460 llvm-svn: 329765	2018-04-10 22:50:05 +00:00
Gabor Buella	213edc4a15	[X86] Split up -march=icelake to -client & -server Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45055 llvm-svn: 329742	2018-04-10 18:59:13 +00:00
Craig Topper	442428540a	[X86] Change the name string for the newly add DF flag register to 'dirflag' to match the clobber name supported by clang for MS inline assembly. This should fix the failure found by Chromium reported here https://bugs.chromium.org/p/chromium/issues/detail?id=831158 The test case will be added in clang. llvm-svn: 329734	2018-04-10 18:21:04 +00:00
Simon Pilgrim	95f941117c	Fix whitespace indentation. NFCI. llvm-svn: 329704	2018-04-10 14:21:33 +00:00
Gabor Buella	3eab22d896	[X86] Disable SGX for Skylake Server Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45057 llvm-svn: 329700	2018-04-10 13:58:57 +00:00
Andrea Di Biagio	486358c153	[X86][Broadwell] HWPort5 should not be added to BroadwellModelProcResources. The BroadwellModelProcResources had an entry for HWPort5, which is a Haswell resource, and not a Broadwell processor resource. That entry was added to the Broadwell model because variable blends were consuming it. This was clearly a typo (the resource name should have been BWPort5), which unfortunately was never caught before. It was not reported as an error because HWPort5 is a resource defined by the Haswell model. It has been found when testing some code with llvm-mca: the list of resources in the resource pressure view was odd. This patch fixes the issue; now variable blend instructions consume 2 cycles on BWPort5 instead of HWPort5. This is enough to get rid of the extra (spurious) entry in the BroadWellModelProcResources table. llvm-svn: 329686	2018-04-10 10:49:41 +00:00
Clement Courbet	b449379eae	[MC][TableGen] Add optional libpfm counter names for ProcResUnits. Summary: Subtargets can define the libpfm counter names that can be used to measure cycles and uops issued on ProcResUnits. This allows making llvm-exegesis available on more targets. Fixes PR36984. Reviewers: gchatelet, RKSimon, andreadb, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45360 llvm-svn: 329675	2018-04-10 08:16:37 +00:00
Chandler Carruth	0ca3bd0729	[x86] Model the direction flag (DF) separately from the rest of EFLAGS. This cleans up a number of operations that only claimed te use EFLAGS due to using DF. But no instructions which we think of us setting EFLAGS actually modify DF (other than things like popf) and so this needlessly creates uses of EFLAGS that aren't really there. In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD, and the whole-flags writes (WRFLAGS and POPF) need to model this. I've also somewhat cleaned up some of the flag management instruction definitions to be in the correct .td file. Adding this extra register also uncovered a failure to use the correct datatype to hold X86 registers, and I've corrected that as necessary here. Differential Revision: https://reviews.llvm.org/D45154 llvm-svn: 329673	2018-04-10 06:40:51 +00:00
Craig Topper	7e42af87a6	[X86] Prevent folding loads with 64-bit ANDs with immediates that fit in 32-bits. Prefer to use the 32-bit AND with immediate instead. Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute. Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold. Fixes PR37063. llvm-svn: 329667	2018-04-10 03:44:15 +00:00
Chandler Carruth	19618fc639	[x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues. The key idea is to lower COPY nodes populating EFLAGS by scanning the uses of EFLAGS and introducing dedicated code to preserve the necessary state in a GPR. In the vast majority of cases, these uses are cmovCC and jCC instructions. For such cases, we can very easily save and restore the necessary information by simply inserting a setCC into a GPR where the original flags are live, and then testing that GPR directly to feed the cmov or conditional branch. However, things are a bit more tricky if arithmetic is using the flags. This patch handles the vast majority of cases that seem to come up in practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of partially preserved EFLAGS as LLVM doesn't currently model that at all. There are a large number of operations that techinaclly observe EFLAGS currently but shouldn't in this case -- they typically are using DF. Currently, they will not be handled by this approach. However, I have never seen this issue come up in practice. It is already pretty rare to have these patterns come up in practical code with LLVM. I had to resort to writing MIR tests to cover most of the logic in this pass already. I suspect even with its current amount of coverage of arithmetic users of EFLAGS it will be a significant improvement over the current use of pushf/popf. It will also produce substantially faster code in most of the common patterns. This patch also removes all of the old lowering for EFLAGS copies, and the hack that forced us to use a frame pointer when EFLAGS copies were found anywhere in a function so that the dynamic stack adjustment wasn't a problem. None of this is needed as we now lower all of these copies directly in MI and without require stack adjustments. Lots of thanks to Reid who came up with several aspects of this approach, and Craig who helped me work out a couple of things tripping me up while working on this. Differential Revision: https://reviews.llvm.org/D45146 llvm-svn: 329657	2018-04-10 01:41:17 +00:00
Vlad Tsyrklevich	0cdc6ec535	ShadowCallStack/x86_64: Ignore pseudo-machine instructions llvm-svn: 329656	2018-04-10 01:31:01 +00:00
Craig Topper	47b2f9d836	[X86] Don't use Lower512IntUnary to split bitcasts with v32i16/v64i8 types on targets without AVX512BW. LowerIntUnary as its name says has an assert for integer types. But for the bitcast case one side might be an FP type. Rather than making sure the function really works for fp types and renaming it. Just do really basic splitting directly. The LowerIntUnary has the advantage that it can peek through BUILD_VECTOR because every other call is during Lowering. But these calls are during legalization and will be followed by a DAG combine round. Revert some change to LowerVectorIntUnary that were originally made just to make these two calls work even in pure integer cases. This was found purely by compiling the avx512f-builtins.c test from clang so I've copied over the offending function from that. llvm-svn: 329616	2018-04-09 20:37:14 +00:00
Craig Topper	0c2a12cb3e	[X86] Revert the SLM part of r328914. While it appears to be correct information based on Intel's optimization manual and Agner's data, it causes perf regressions on a couple of the benchmarks in our internal list. llvm-svn: 329593	2018-04-09 17:07:40 +00:00
Simon Pilgrim	e5ed5e2cba	[X86][MMX] Fix missing itinerary for PALIGNR llvm-svn: 329568	2018-04-09 13:52:33 +00:00
Simon Pilgrim	140fee078f	[X86][MMX] Fix missing itinerary for MOVQ2DQ instruction format llvm-svn: 329567	2018-04-09 13:42:14 +00:00
Simon Pilgrim	abf3611332	[X86][MMX] Fix missing itinerary for CVTPI2PS llvm-svn: 329565	2018-04-09 13:27:47 +00:00
Simon Pilgrim	0047efdd1e	[X86][MMX] Fix flipped reg/mem typo in MMX_MISC_FUNC_ITINS The RR/RM itineraries were the wrong way around llvm-svn: 329561	2018-04-09 13:02:07 +00:00
Simon Pilgrim	6131286553	[X86][SSE] Fix f32 mul/div itinerary groups typo The RM folded itineraries were incorrectly using the f64 version. llvm-svn: 329556	2018-04-09 10:45:53 +00:00
Sanjay Patel	0d7df36c66	[TargetSchedule] shrink interface for init(); NFCI The TargetSchedModel is always initialized using the TargetSubtargetInfo's MCSchedModel and TargetInstrInfo, so we don't need to extract those and pass 3 parameters to init(). Differential Revision: https://reviews.llvm.org/D44789 llvm-svn: 329540	2018-04-08 19:56:04 +00:00
Craig Topper	b7baa358f6	[X86] Add SchedWrites for CMOV and SETCC. Use them to remove InstRWs. Summary: Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops. This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs. The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc. Reviewers: RKSimon, andreadb, GGanesh Reviewed By: RKSimon Subscribers: GGanesh, llvm-commits Differential Revision: https://reviews.llvm.org/D45380 llvm-svn: 329539	2018-04-08 17:53:18 +00:00
Craig Topper	c362f42b6a	[X86][Znver1] Remove InstRWs for BLENDVPS/PD Summary: This removes the InstRWs for BLENDVPS/PD in favor of WriteFVarBlend. The latency listed was 3 cycles but WriteFVarBlend is defined as 1 cycle latency. The 1 cycle latency matches Agner Fog's data. The patterns were missing the VEX forms which is why there are no test changes. We don't test "-mcpu=znver1 -mattr=-avx" Reviewers: RKSimon, GGanesh Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44841 llvm-svn: 329538	2018-04-08 17:53:15 +00:00
Mandeep Singh Grang	68a151a13c	[X86] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused due to undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort to llvm::sort. Refer the comments section in D44363 for a list of all the required patches. Reviewers: chandlerc, craig.topper, RKSimon Reviewed By: chandlerc, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44874 llvm-svn: 329534	2018-04-08 16:42:52 +00:00
Simon Pilgrim	86588fc809	[X86][Btver2] Add vector extract costs llvm-svn: 329524	2018-04-08 11:26:26 +00:00
Craig Topper	ef37aebc96	[X86] Combine vXi64 multiplies to MULDQ/MULUDQ during DAG combine instead of lowering. Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget llvm-svn: 329510	2018-04-07 19:09:52 +00:00
Simon Pilgrim	80ce1dde44	[CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets llvm-svn: 329498	2018-04-07 13:24:33 +00:00
Craig Topper	c50570fb4f	[X686] Add appropriate ReadAfterLd for the register input to memory forms of ADC/SBB. llvm-svn: 329424	2018-04-06 17:12:18 +00:00
Craig Topper	b9d298ecf2	[X86] Remove InstRWs for basic arithmetic instructions from Sandy Bridge scheduler model. We can get this right through WriteALU and friends now. llvm-svn: 329417	2018-04-06 16:29:31 +00:00
Craig Topper	f0d042619b	[X86] Attempt to model basic arithmetic instructions in the Haswell/Broadwell/Skylake scheduler models without InstRWs Summary: This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this I've added the store address port to RMW. And used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency. Apparently we were inconsistent about whether the store has latency or not thus the test changes. I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5. Reviewers: RKSimon, andreadb Reviewed By: andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45351 llvm-svn: 329416	2018-04-06 16:16:48 +00:00
Craig Topper	f131b60049	[X86] Add an extra store address cycle to WriteRMW in the Sandy Bridge/Broadwell/Haswell/Skylake scheduler model. Even those the address was calculated for the load, its calculated again for the store. llvm-svn: 329415	2018-04-06 16:16:46 +00:00
Craig Topper	22d25a08ae	[X86] Merge itineraries for CLC, CMC, and STC. These are very simple flag setting instructions that appear to only be a single uop. They're unlikely to need this separation. llvm-svn: 329414	2018-04-06 16:16:43 +00:00
Simon Pilgrim	09eeb3a8b9	[X86][SandyBridge] Add (V)DPPS memory fold latencies Noticed this during D44654 llvm-svn: 329389	2018-04-06 11:25:21 +00:00
Simon Pilgrim	8a83f16ccd	[X86][SandyBridge] SBWriteResPair +5cy Memory Folds As mentioned on D44647, this patch increases the default memory latency to +5cy , which more closely matches what most custom cases are doing for reg-mem instructions. I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent. As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests... Differential Revision: https://reviews.llvm.org/D44654 llvm-svn: 329388	2018-04-06 11:00:51 +00:00
Simon Pilgrim	fd1f4fe54e	[X86][SkylakeServer] Merge 2 InstRW entries to the same sched group. NFCI. llvm-svn: 329386	2018-04-06 10:16:36 +00:00

1 2 3 4 5 ...

16957 Commits