llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	577ae24feb	[X86] Remove explicit SSE/AVX schedule itineraries from defs (PR37093) llvm-svn: 329940	2018-04-12 19:25:07 +00:00
Simon Pilgrim	35935c0632	[X86] Remove remaining gpr schedule itineraries (PR37093) llvm-svn: 329938	2018-04-12 18:46:15 +00:00
Gabor Buella	297c138798	[X86] Introduce LLVM wbinvd intrinsic A previously missing intrinsic for an old instruction. Reviewers: craig.topper, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45312 llvm-svn: 329936	2018-04-12 18:38:18 +00:00
Simon Pilgrim	dec781c141	[X86] Remove gpr shift/extension schedule itineraries (PR37093) llvm-svn: 329933	2018-04-12 18:25:38 +00:00
Simon Pilgrim	8904a86f65	[X86] Remove AES/CLMUL/CRC32/LDDQU/MOVNT/POPCNT/SHA schedule itineraries (PR37093) llvm-svn: 329912	2018-04-12 14:31:42 +00:00
Simon Pilgrim	294556d40e	[X86] Remove remaining system/special schedule itineraries (PR37093) llvm-svn: 329906	2018-04-12 12:43:49 +00:00
Simon Pilgrim	0cd0fbd8c5	[X86] Remove system/control schedule itineraries (PR37093) llvm-svn: 329903	2018-04-12 12:09:24 +00:00
Simon Pilgrim	69e0e8e3d4	[X86] Remove CMOV/SETCC schedule itineraries (PR37093) llvm-svn: 329898	2018-04-12 11:01:40 +00:00
Simon Pilgrim	10e3bdaaa8	[X86] Remove MMX/3DNow schedule itineraries (PR37093) llvm-svn: 329896	2018-04-12 10:49:57 +00:00
Simon Pilgrim	32d368147f	[X86] Remove X87 schedule itineraries (PR37093) First of a number of commits to remove x86 schedule itineraries entirely - approved off-line with @craig.topper llvm-svn: 329893	2018-04-12 10:27:37 +00:00
Simon Pilgrim	7b88d09e75	[X86] Remove unused itinerary argument from FMA3/FMA4/XOP instructions. NFCI. llvm-svn: 329862	2018-04-11 23:24:38 +00:00
Gabor Buella	2ef36f3571	[X86] Describe wbnoinvd instruction Similar to the wbinvd instruction, except this one does not invalidate caches. Ring 0 only. The encoding matches a wbinvd instruction with an F3 prefix. Reviewers: craig.topper, zvi, ashlykov Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D43816 llvm-svn: 329847	2018-04-11 20:01:57 +00:00
Simon Pilgrim	8fc2b49620	[X86][Atom] Convert Atom scheduler model to SchedRW (PR32431) Atom is the only x86 target that still uses schedule itineraries; if we can remove this then we can begin the work on removing x86 itineraries. I've also found that it will help with PR36550. I've focussed on matching the existing model as closely as possible (relying on the schedule tests); PR36895 indicated a lot of these were incorrect but we can just as easily fix these after this patch as before. Hopefully we can get llvm-exegesis to help here. There are a few instructions that rely on itinerary scheduling (mainly push/pop/return) of multiple resource stages, but I don't think any of these are show stoppers. There are also a few codegen changes that seem related to the post-ra scheduler acting a little differently; I haven't tracked these down but they don't seem critical. NOTE: I don't have access to any Atom hardware, so this hasn't been tested in the wild. Differential Revision: https://reviews.llvm.org/D45486 llvm-svn: 329837	2018-04-11 18:23:01 +00:00
Simon Pilgrim	7f321d8c24	[X86] Generalize X86PadShortFunction to work with TargetSchedModel Pre-commit for D45486, don't rely on itinerary scheduler model to determine latencies for padding, use the generic TargetSchedModel::computeInstrLatency call. Also, replace hard coded (atom specific) 2*uop creation per padding cycle with a version based on the scheduler model's issue width. Differential Revision: https://reviews.llvm.org/D45486 llvm-svn: 329834	2018-04-11 18:05:17 +00:00
Simon Pilgrim	89c8a10f7c	[X86] Add variable shuffle schedule classes Split variable index shuffles from immediate index shuffles WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.) WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.) WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.) WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.) Differential Revision: https://reviews.llvm.org/D45404 llvm-svn: 329806	2018-04-11 13:49:19 +00:00
Craig Topper	9507fa358c	[X86] Remove 128/256-bit masked pmaddubsw and pmaddwd intrinsics. Replace 512-bit masked intrinsic with unmasked intrinsic and a select. The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency. llvm-svn: 329774	2018-04-11 04:55:04 +00:00
Craig Topper	ee2c1dea4d	[X86] In X86FlagsCopyLowering, when rewriting a memory setcc we need to emit an explicit MOV8mr instruction. Previously the code only knew how to handle setcc to a register. This should fix a crash in the chromium build. llvm-svn: 329771	2018-04-11 01:09:10 +00:00
Sriraman Tallam	d693093a65	GOTPCREL references must always use RIP. With -fno-plt, global value references can use GOTPCREL and RIP must be used. Differential Revision: https://reviews.llvm.org/D45460 llvm-svn: 329765	2018-04-10 22:50:05 +00:00
Gabor Buella	213edc4a15	[X86] Split up -march=icelake to -client & -server Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45055 llvm-svn: 329742	2018-04-10 18:59:13 +00:00
Craig Topper	442428540a	[X86] Change the name string for the newly added DF flag register to 'dirflag' to match the clobber name supported by clang for MS inline assembly. This should fix the failure found by Chromium reported here https://bugs.chromium.org/p/chromium/issues/detail?id=831158 The test case will be added in clang. llvm-svn: 329734	2018-04-10 18:21:04 +00:00
Simon Pilgrim	95f941117c	Fix whitespace indentation. NFCI. llvm-svn: 329704	2018-04-10 14:21:33 +00:00
Gabor Buella	3eab22d896	[X86] Disable SGX for Skylake Server Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45057 llvm-svn: 329700	2018-04-10 13:58:57 +00:00
Andrea Di Biagio	486358c153	[X86][Broadwell] HWPort5 should not be added to BroadwellModelProcResources. The BroadwellModelProcResources had an entry for HWPort5, which is a Haswell resource, and not a Broadwell processor resource. That entry was added to the Broadwell model because variable blends were consuming it. This was clearly a typo (the resource name should have been BWPort5), which unfortunately was never caught before. It was not reported as an error because HWPort5 is a resource defined by the Haswell model. It has been found when testing some code with llvm-mca: the list of resources in the resource pressure view was odd. This patch fixes the issue; now variable blend instructions consume 2 cycles on BWPort5 instead of HWPort5. This is enough to get rid of the extra (spurious) entry in the BroadWellModelProcResources table. llvm-svn: 329686	2018-04-10 10:49:41 +00:00
Clement Courbet	b449379eae	[MC][TableGen] Add optional libpfm counter names for ProcResUnits. Summary: Subtargets can define the libpfm counter names that can be used to measure cycles and uops issued on ProcResUnits. This allows making llvm-exegesis available on more targets. Fixes PR36984. Reviewers: gchatelet, RKSimon, andreadb, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45360 llvm-svn: 329675	2018-04-10 08:16:37 +00:00
Chandler Carruth	0ca3bd0729	[x86] Model the direction flag (DF) separately from the rest of EFLAGS. This cleans up a number of operations that only claimed to use EFLAGS due to using DF. But no instructions which we think of as setting EFLAGS actually modify DF (other than things like popf) and so this needlessly creates uses of EFLAGS that aren't really there. In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD, and the whole-flags writes (WRFLAGS and POPF) need to model this. I've also somewhat cleaned up some of the flag management instruction definitions to be in the correct .td file. Adding this extra register also uncovered a failure to use the correct datatype to hold X86 registers, and I've corrected that as necessary here. Differential Revision: https://reviews.llvm.org/D45154 llvm-svn: 329673	2018-04-10 06:40:51 +00:00
Craig Topper	7e42af87a6	[X86] Prevent folding loads with 64-bit ANDs with immediates that fit in 32-bits. Prefer to use the 32-bit AND with immediate instead. Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute. Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold. Fixes PR37063. llvm-svn: 329667	2018-04-10 03:44:15 +00:00
Chandler Carruth	19618fc639	[x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues. The key idea is to lower COPY nodes populating EFLAGS by scanning the uses of EFLAGS and introducing dedicated code to preserve the necessary state in a GPR. In the vast majority of cases, these uses are cmovCC and jCC instructions. For such cases, we can very easily save and restore the necessary information by simply inserting a setCC into a GPR where the original flags are live, and then testing that GPR directly to feed the cmov or conditional branch. However, things are a bit more tricky if arithmetic is using the flags. This patch handles the vast majority of cases that seem to come up in practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of partially preserved EFLAGS as LLVM doesn't currently model that at all. There are a large number of operations that technically observe EFLAGS currently but shouldn't in this case -- they typically are using DF. Currently, they will not be handled by this approach. However, I have never seen this issue come up in practice. It is already pretty rare to have these patterns come up in practical code with LLVM. I had to resort to writing MIR tests to cover most of the logic in this pass already. I suspect even with its current amount of coverage of arithmetic users of EFLAGS it will be a significant improvement over the current use of pushf/popf. It will also produce substantially faster code in most of the common patterns. This patch also removes all of the old lowering for EFLAGS copies, and the hack that forced us to use a frame pointer when EFLAGS copies were found anywhere in a function so that the dynamic stack adjustment wasn't a problem. None of this is needed as we now lower all of these copies directly in MI and without requiring stack adjustments.
Lots of thanks to Reid who came up with several aspects of this approach, and Craig who helped me work out a couple of things tripping me up while working on this. Differential Revision: https://reviews.llvm.org/D45146 llvm-svn: 329657	2018-04-10 01:41:17 +00:00
Vlad Tsyrklevich	0cdc6ec535	ShadowCallStack/x86_64: Ignore pseudo-machine instructions llvm-svn: 329656	2018-04-10 01:31:01 +00:00
Craig Topper	47b2f9d836	[X86] Don't use Lower512IntUnary to split bitcasts with v32i16/v64i8 types on targets without AVX512BW. LowerIntUnary as its name says has an assert for integer types. But for the bitcast case one side might be an FP type. Rather than making sure the function really works for fp types and renaming it, just do really basic splitting directly. The LowerIntUnary has the advantage that it can peek through BUILD_VECTOR because every other call is during Lowering. But these calls are during legalization and will be followed by a DAG combine round. Revert some changes to LowerVectorIntUnary that were originally made just to make these two calls work even in pure integer cases. This was found purely by compiling the avx512f-builtins.c test from clang so I've copied over the offending function from that. llvm-svn: 329616	2018-04-09 20:37:14 +00:00
Craig Topper	0c2a12cb3e	[X86] Revert the SLM part of r328914. While it appears to be correct information based on Intel's optimization manual and Agner's data, it causes perf regressions on a couple of the benchmarks in our internal list. llvm-svn: 329593	2018-04-09 17:07:40 +00:00
Simon Pilgrim	e5ed5e2cba	[X86][MMX] Fix missing itinerary for PALIGNR llvm-svn: 329568	2018-04-09 13:52:33 +00:00
Simon Pilgrim	140fee078f	[X86][MMX] Fix missing itinerary for MOVQ2DQ instruction format llvm-svn: 329567	2018-04-09 13:42:14 +00:00
Simon Pilgrim	abf3611332	[X86][MMX] Fix missing itinerary for CVTPI2PS llvm-svn: 329565	2018-04-09 13:27:47 +00:00
Simon Pilgrim	0047efdd1e	[X86][MMX] Fix flipped reg/mem typo in MMX_MISC_FUNC_ITINS The RR/RM itineraries were the wrong way around llvm-svn: 329561	2018-04-09 13:02:07 +00:00
Simon Pilgrim	6131286553	[X86][SSE] Fix f32 mul/div itinerary groups typo The RM folded itineraries were incorrectly using the f64 version. llvm-svn: 329556	2018-04-09 10:45:53 +00:00
Sanjay Patel	0d7df36c66	[TargetSchedule] shrink interface for init(); NFCI The TargetSchedModel is always initialized using the TargetSubtargetInfo's MCSchedModel and TargetInstrInfo, so we don't need to extract those and pass 3 parameters to init(). Differential Revision: https://reviews.llvm.org/D44789 llvm-svn: 329540	2018-04-08 19:56:04 +00:00
Craig Topper	b7baa358f6	[X86] Add SchedWrites for CMOV and SETCC. Use them to remove InstRWs. Summary: Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops. This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs. The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc. Reviewers: RKSimon, andreadb, GGanesh Reviewed By: RKSimon Subscribers: GGanesh, llvm-commits Differential Revision: https://reviews.llvm.org/D45380 llvm-svn: 329539	2018-04-08 17:53:18 +00:00
Craig Topper	c362f42b6a	[X86][Znver1] Remove InstRWs for BLENDVPS/PD Summary: This removes the InstRWs for BLENDVPS/PD in favor of WriteFVarBlend. The latency listed was 3 cycles but WriteFVarBlend is defined as 1 cycle latency. The 1 cycle latency matches Agner Fog's data. The patterns were missing the VEX forms which is why there are no test changes. We don't test "-mcpu=znver1 -mattr=-avx" Reviewers: RKSimon, GGanesh Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44841 llvm-svn: 329538	2018-04-08 17:53:15 +00:00
Mandeep Singh Grang	68a151a13c	[X86] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort with llvm::sort. Refer to the comments section in D44363 for a list of all the required patches. Reviewers: chandlerc, craig.topper, RKSimon Reviewed By: chandlerc, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44874 llvm-svn: 329534	2018-04-08 16:42:52 +00:00
Simon Pilgrim	86588fc809	[X86][Btver2] Add vector extract costs llvm-svn: 329524	2018-04-08 11:26:26 +00:00
Craig Topper	ef37aebc96	[X86] Combine vXi64 multiplies to MULDQ/MULUDQ during DAG combine instead of lowering. Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget llvm-svn: 329510	2018-04-07 19:09:52 +00:00
Simon Pilgrim	80ce1dde44	[CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets llvm-svn: 329498	2018-04-07 13:24:33 +00:00
Craig Topper	c50570fb4f	[X86] Add appropriate ReadAfterLd for the register input to memory forms of ADC/SBB. llvm-svn: 329424	2018-04-06 17:12:18 +00:00
Craig Topper	b9d298ecf2	[X86] Remove InstRWs for basic arithmetic instructions from Sandy Bridge scheduler model. We can get this right through WriteALU and friends now. llvm-svn: 329417	2018-04-06 16:29:31 +00:00
Craig Topper	f0d042619b	[X86] Attempt to model basic arithmetic instructions in the Haswell/Broadwell/Skylake scheduler models without InstRWs Summary: This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this I've added the store address port to RMW. And used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency. Apparently we were inconsistent about whether the store has latency or not thus the test changes. I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5. Reviewers: RKSimon, andreadb Reviewed By: andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45351 llvm-svn: 329416	2018-04-06 16:16:48 +00:00
Craig Topper	f131b60049	[X86] Add an extra store address cycle to WriteRMW in the Sandy Bridge/Broadwell/Haswell/Skylake scheduler model. Even though the address was calculated for the load, it's calculated again for the store. llvm-svn: 329415	2018-04-06 16:16:46 +00:00
Craig Topper	22d25a08ae	[X86] Merge itineraries for CLC, CMC, and STC. These are very simple flag setting instructions that appear to only be a single uop. They're unlikely to need this separation. llvm-svn: 329414	2018-04-06 16:16:43 +00:00
Simon Pilgrim	09eeb3a8b9	[X86][SandyBridge] Add (V)DPPS memory fold latencies Noticed this during D44654 llvm-svn: 329389	2018-04-06 11:25:21 +00:00
Simon Pilgrim	8a83f16ccd	[X86][SandyBridge] SBWriteResPair +5cy Memory Folds As mentioned on D44647, this patch increases the default memory latency to +5cy , which more closely matches what most custom cases are doing for reg-mem instructions. I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent. As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests... Differential Revision: https://reviews.llvm.org/D44654 llvm-svn: 329388	2018-04-06 11:00:51 +00:00
Simon Pilgrim	fd1f4fe54e	[X86][SkylakeServer] Merge 2 InstRW entries to the same sched group. NFCI. llvm-svn: 329386	2018-04-06 10:16:36 +00:00
Craig Topper	fbe3132f67	[X86] Separate CDQ and CDQE in the scheduler model. According to Agner's data, CDQE is closer to CWDE. llvm-svn: 329354	2018-04-05 21:56:19 +00:00
Craig Topper	4cc3827791	[X86] Add MOVZPQILo2PQIrr to the Sandy Bridge scheduler model llvm-svn: 329351	2018-04-05 21:40:32 +00:00
Craig Topper	3b0b96c591	[X86] Add LEAVE instruction to the scheduler models using the same data as LEAVE64. Make LEAVE/LEAVE64 more correct on Sandy Bridge. This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64. The Sandy Bridge version was missing a load port use. llvm-svn: 329347	2018-04-05 21:16:26 +00:00
Craig Topper	c6bb36a3d0	[X86] Remove some InstRWs for plain store instructions on Sandy Bridge. We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion. llvm-svn: 329339	2018-04-05 20:04:06 +00:00
Craig Topper	9eec2025c5	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329330	2018-04-05 18:38:45 +00:00
Craig Topper	665f74414d	[X86] Disassembler support for having an ADSIZE prefix affect instructions with 0xf2 and 0xf3 prefixes. Needed to support umonitor from D45253. llvm-svn: 329327	2018-04-05 18:20:14 +00:00
Craig Topper	6ecdb03f16	[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. llvm-svn: 329310	2018-04-05 16:32:48 +00:00
Andrea Di Biagio	c74ad502ce	[MC][Tablegen] Allow models to describe the retire control unit for llvm-mca. This patch adds the ability to describe properties of the hardware retire control unit. Tablegen class RetireControlUnit has been added for this purpose (see TargetSchedule.td). A RetireControlUnit specifies the size of the reorder buffer, as well as the maximum number of opcodes that can be retired every cycle. A zero (or negative) value for the reorder buffer size means: "the size is unknown". If the size is unknown, then llvm-mca defaults it to the value of field SchedMachineModel::MicroOpBufferSize. A zero or negative number of opcodes retired per cycle means: "there is no restriction on the number of instructions that can be retired every cycle". Models can optionally specify an instance of RetireControlUnit. There can only be up-to one RetireControlUnit definition per scheduling model. Information related to the RCU (RetireControlUnit) is stored in (two new fields of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp). This patch fixes PR36661. Differential Revision: https://reviews.llvm.org/D45259 llvm-svn: 329304	2018-04-05 15:41:41 +00:00
Craig Topper	15303dda0d	[X86] Revert r329251-329254 It's failing on the bots and I'm not sure why. This reverts: [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. [X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. [X86] Remove some InstRWs for plain store instructions on Sandy Bridge. [X86] Auto-generate complete checks. NFC llvm-svn: 329256	2018-04-05 05:19:36 +00:00
Craig Topper	25c7110a37	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329254	2018-04-05 04:42:03 +00:00
Craig Topper	4b1fdd4921	[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. llvm-svn: 329253	2018-04-05 04:42:02 +00:00
Craig Topper	5c36557426	[X86] Auto-generate complete checks. NFC llvm-svn: 329251	2018-04-05 04:41:59 +00:00
Jessica Paquette	bccd18b816	[MachineOutliner] Add `useMachineOutliner` target hook The MachineOutliner has a bunch of target hooks that will call llvm_unreachable if the target doesn't implement them. Therefore, if you enable the outliner on such a target, it'll just crash. It'd be much better if it'd just not run the outliner at all in this case. This commit adds a hook to TargetInstrInfo that returns false by default. Targets that implement the hook make it return true. The outliner checks the return value of this hook to decide whether or not to continue. llvm-svn: 329220	2018-04-04 19:13:31 +00:00
Craig Topper	498875fab0	[X86] Separate BSWAP32r and BSWAP64r scheduling data in SandyBridge/Haswell/Broadwell/Skylake scheduler models. The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a nonexistent BSWAP16r. llvm-svn: 329211	2018-04-04 17:54:19 +00:00
Nico Weber	1cbd096914	Sort targetgen calls in lib/Target/*/CMakeLists. Makes it easier to see mistakes such as the one fixed in r329178 and makes the different target CMakeLists more consistent. Also remove some stale-looking comments from the Nios2 target cmakefile. No intended behavior change. llvm-svn: 329181	2018-04-04 12:37:44 +00:00
Benjamin Kramer	1fc0da4849	Make helpers static. NFC. llvm-svn: 329170	2018-04-04 11:45:11 +00:00
Craig Topper	a30db995b3	[X86] Use the same predicate for the load for PMOVSXBQ and PMOVZXBQ. These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both. llvm-svn: 329154	2018-04-04 07:00:24 +00:00
Craig Topper	a3cac956fc	[X86] Use loadi16/loadi32 predicates in multiply patterns llvm-svn: 329153	2018-04-04 07:00:19 +00:00
Craig Topper	88e38e3e3e	[X86] Remove more dead code left over from the handling of i8/i16 UMUL_LOHI/SMUL_LOHI that is no longer needed. NFC llvm-svn: 329152	2018-04-04 07:00:16 +00:00
Craig Topper	afa22edcf0	[X86] Remove dead code for handling i8/i16 UMUL_LOHI/SMUL_LOHI from X86ISelDAGToDAG.cpp. NFC These are promoted to i16/i32 multiplies by a DAG combine. llvm-svn: 329147	2018-04-04 04:38:55 +00:00
Craig Topper	3064c15dc3	[X86] Remove some code that was only needed when i1 was a legal type. NFC llvm-svn: 329146	2018-04-04 04:38:54 +00:00
Vlad Tsyrklevich	b324733169	Fix bad #include path in r329139 llvm-svn: 329140	2018-04-04 01:34:42 +00:00
Vlad Tsyrklevich	e3446017ed	Add the ShadowCallStack pass Summary: The ShadowCallStack pass instruments functions marked with the shadowcallstack attribute. The instrumented prolog saves the return address to [gs:offset] where offset is stored and updated in [gs:0]. The instrumented epilog loads/updates the return address from [gs:0] and checks that it matches the return address on the stack before returning. Reviewers: pcc, vitalybuka Reviewed By: pcc Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D44802 llvm-svn: 329139	2018-04-04 01:21:16 +00:00
Jessica Paquette	5fa2a63785	[MachineOutliner] Test for X86FI->getUsesRedZone() as well as Attribute::NoRedZone This commit is similar to r329120, but uses the existing getUsesRedZone() function in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a function truly uses a redzone instead of just the noredzone attribute on a function. Thus, after this commit, it's possible to outline from x86 without using -mno-red-zone and still get outlining results. This also adds a new test for the new redzone behaviour. llvm-svn: 329134	2018-04-03 23:32:41 +00:00
Andrea Di Biagio	9da4d6db33	[MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca This patch allows the description of register files in processor scheduling models. This addresses PR36662. A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td. Targets can optionally describe register files for their processors using that class. In particular, class RegisterFile allows to specify: - The total number of physical registers. - Which target registers are accessible through the register file. - The cost of allocating a register at register renaming stage. Example (from this patch - see file X86/X86ScheduleBtVer2.td) def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]> Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar (btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM register definitions only cost 1 physical register. The syntax allows to specify an empty set of register classes. An empty set of register classes means: this register file models all the registers specified by the Target. For each register class, users can specify an optional register cost. By default, register costs default to 1. A value of 0 for the number of physical registers means: "this register file has an unbounded number of physical registers". This patch is structured in two parts. * Part 1 - MC/Tablegen * A first part adds the tablegen definition of RegisterFile, and teaches the SubtargetEmitter how to emit information related to register files. Information about register files is accessible through an instance of MCExtraProcessorInfo. The idea behind this design is to logically partition the processor description which is only used by external tools (like llvm-mca) from the processor information used by the llvm machine schedulers. I think that this design would make easier for targets to get rid of the extra processor information if they don't want it. 
* Part 2 - llvm-mca related * The second part of this patch is related to changes to llvm-mca. The main differences are: 1) class RegisterFile now needs to take into account the "cost of a register" when allocating physical registers at register renaming stage. 2) Point 1. triggered a minor refactoring which led to the removal of the "maximum 32 register files" restriction. 3) The BackendStatistics view has been updated so that we can print out extra details related to each register file implemented by the processor. The effect of point 3. is also visible in tests register-files-[1..5].s. Differential Revision: https://reviews.llvm.org/D44980 llvm-svn: 329067	2018-04-03 13:36:24 +00:00
Craig Topper	9b6a65b9ef	[X86] Reduce number of OpPrefix bits in TSFlags to 2. NFCI TSFlag doesn't need to disambiguate NoPrfx from PS. So shift the encodings so PS is NoPrfx\|0x4. llvm-svn: 329049	2018-04-03 06:37:04 +00:00
Lama Saba	927468309f	[X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346 If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory. A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchitecture is that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles. This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store. The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies. Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM. Differential revision: https://reviews.llvm.org/D41330 Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9 llvm-svn: 328973	2018-04-02 13:48:28 +00:00
Craig Topper	96729cd64b	[X86][Silvermont] Use correct latency and throughput information for divide and square root in the scheduler model. Data taken from Table 16-17 in the Intel Optimization Manual. llvm-svn: 328962	2018-04-02 06:34:16 +00:00
Craig Topper	6a814904da	[X86][SkylakeServer] Correct throughput for 512-bit sqrt and divide. Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/ llvm-svn: 328961	2018-04-02 05:54:34 +00:00
Craig Topper	8104f266a4	[X86] Correct the throughput for divide instructions in Sandy Bridge/Haswell/Broadwell/Skylake scheduler models. Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those. llvm-svn: 328960	2018-04-02 05:33:28 +00:00
Craig Topper	dc74094398	[X86] Fix the SchedRW for AVX512 shift instructions. It was being inadvertently defaulted to an FADD scheduler class. llvm-svn: 328959	2018-04-02 03:15:02 +00:00
Craig Topper	5fb1dc2d22	[X86] Give the AVX512 VEXTRACT instructions the same SchedRWs as the SSE/AVX versions. llvm-svn: 328958	2018-04-02 02:44:55 +00:00
Craig Topper	caec723a1a	[X86] Add an itinerary to BTR64rr. llvm-svn: 328956	2018-04-02 01:12:34 +00:00
Craig Topper	02daec00a2	[X86] Make sure all the classes declared in the Haswell scheduler model are prefixed with HW. The tablegen files all share a namespace so we shouldn't use generic names in a specific scheduler model. llvm-svn: 328955	2018-04-02 01:12:32 +00:00
Craig Topper	c90d906b16	[X86] Give VINSERTPS the same itinerary as INSERTPS. llvm-svn: 328954	2018-04-02 00:48:11 +00:00
Craig Topper	dc4a6d1ef6	[X86] Cleanup ADCX/ADOX instruction definitions. Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW. llvm-svn: 328952	2018-04-01 23:58:50 +00:00
Craig Topper	9f834810ea	[X86] Give ADC8/16/32/64mi the same scheduling information as ADC8/16/32/64mr and SBB8/16/32/64mi. It doesn't make a lot of sense that it would be different. llvm-svn: 328946	2018-04-01 21:54:24 +00:00
Chandler Carruth	4244625c51	[x86] Correct the operand structure of the ADOX instruction. This also moves to define it in the same way as ADCX which seems to use constraints a bit better. This is pulled out of the review for reducing the use of popf for restoring EFLAGS, but is independent. There are still more problems with our definitions for these instructions that Craig is going to look at but this is at least less broken and he can start from this to improve them more fully. Thanks to Craig for the review here. llvm-svn: 328945	2018-04-01 21:53:18 +00:00
Chandler Carruth	06b343c6ed	[x86] Expose more of the condition conversion routines in the public API for X86's instruction information. I've now got a second patch under review that needs these same APIs. This bit is nicely orthogonal and obvious, so landing it. NFC. llvm-svn: 328944	2018-04-01 21:47:55 +00:00
Craig Topper	9b8cd5fe55	[X86] Don't check for folding into a store when deciding if we can promote an i16 mul. There's no RMW mul operation. llvm-svn: 328931	2018-04-01 06:29:32 +00:00
Craig Topper	db6caabccc	[X86] Check if the load and store are to the same pointer before preventing i16 RMW shifts and subtracts from being promoted. llvm-svn: 328930	2018-04-01 06:29:28 +00:00
Craig Topper	ae2de57db0	[X86] Allow i16 subtracts to be promoted if the load is on the LHS and it's not being stored. llvm-svn: 328928	2018-04-01 06:29:25 +00:00
Craig Topper	9bc0d881a3	[X86] Remove unneeded temporary variable. NFC This Promote flag was always set to true except in the default case. But in the default case we don't need to set PVT and can just return false. llvm-svn: 328926	2018-04-01 06:29:21 +00:00
Simon Pilgrim	3b8ad346f9	[X86][Btver2] Add MMX_PSHUFB to the JWritePSHUFB InstRW entries llvm-svn: 328918	2018-03-31 09:15:54 +00:00
Simon Pilgrim	8c8ebd7945	Fix trailing whitespace. NFCI. llvm-svn: 328917	2018-03-31 09:14:14 +00:00
Craig Topper	13a0f83a05	[X86] Add SchedRW for PMULLD Summary: It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput. This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet. I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs. Reviewers: RKSimon, GGanesh, courbet Reviewed By: RKSimon Subscribers: gchatelet, gbedwell, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D44972 llvm-svn: 328914	2018-03-31 04:54:32 +00:00
Fangrui Song	956ee79795	Fix a bunch of typoes. NFC llvm-svn: 328907	2018-03-30 22:22:31 +00:00
Andrea Di Biagio	dc97172b2f	[X86][BtVer2] Fixed the number of micro opcodes for AVX vector converts and VSQRT instructions. There were still a few AVX instructions with an incorrect number of opcodes. These should be fixed now. llvm-svn: 328892	2018-03-30 18:53:47 +00:00
Andrea Di Biagio	3eaa26bb64	[X86][BtVer2] Fix the number of uOps for horizontal operations. llvm-svn: 328886	2018-03-30 18:15:30 +00:00
Andrea Di Biagio	073a9d74ca	[X86][BtVer2] Add missing ReadAfterLd to RM variants of AVX horizontal adds and most vector logic instructions. Fixed a few InstRW that forgot to specify a ReadAfterLd for the register input operand. llvm-svn: 328867	2018-03-30 14:48:08 +00:00
Craig Topper	ee3c19fd7f	[X86] Add ReadAfterLds to some 3 src instructions Sometimes the operand comes after the memory operand so we need 5 ReadDefaults first. I suspect we also need to do something for the mask operand for masked avx512 instructions? I'm not sure if the mask should be ReadAfterLd or not since it can mask faults. If it shouldn't be ReadAfterLd then we're probably wrong for zero masking instructions already. Differential Revision: https://reviews.llvm.org/D44726 llvm-svn: 328834	2018-03-29 22:03:05 +00:00
Craig Topper	3f2dbec652	[X86] Remove ReadAfterLd from BMI and TBM instructions that don't have a register operand in their memory form The memory form of these instructions only read an input from memory. They don't have any register operands. Differential Revision: https://reviews.llvm.org/D44836 llvm-svn: 328828	2018-03-29 21:03:53 +00:00
Craig Topper	89310f56c8	[X86] Correct the placement of ReadAfterLd in BEXTR and BZHI. Add dedicated SchedRW for BEXTR/BZHI. These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd Differential Revision: https://reviews.llvm.org/D44838 llvm-svn: 328823	2018-03-29 20:41:39 +00:00
Craig Topper	2fa1436206	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806	2018-03-29 17:21:10 +00:00
Simon Pilgrim	71c5f3fffd	[X86][SSE] Don't bother re-adding combined target shuffles to the work list We are re-adding all the bitcasts, constant masks and target shuffles to the work list for no apparent gain. Found while investigating adding SimplifyDemandedVectorElts to target shuffles. Differential Revision: https://reviews.llvm.org/D44942 llvm-svn: 328771	2018-03-29 11:18:41 +00:00
Craig Topper	a21758fa2c	[X86] Don't pass getRegisterName from the InstPrinters into EmitAnyX86InstComments. Just always use the function from the ATTPrinter. NFC The IntelPrinter and the ATTPrinter produce the same strings for the same input. We already use the ATTPrinter explicitly in several other places. llvm-svn: 328762	2018-03-29 04:14:04 +00:00
Craig Topper	7456af88f4	[X86] Rename RIi64_NOREX tblgen class to just Ii64. Make RIi64 inherit from it. NFC This feels more consistent with the other classes. We don't need to say _NOREX if we didn't start it with an R in the first place. llvm-svn: 328757	2018-03-29 03:14:57 +00:00
Craig Topper	7441ffff84	[X86] Cleanup inheritance of the X86InstrFormats.td classes. NFC EVEX shouldn't inherit from VEX and EVEX_4V shouldn't inherit from VEX_4V. llvm-svn: 328756	2018-03-29 03:14:56 +00:00
Craig Topper	aac23d7881	[X86][SkylakeServer] Remove checks for 'k', 'z', '_Int' and 'b' from scheduler regexs. Most of these were optional matches at the end of the strings, but since the strings themselves are prefix matches by default you don't need to check for something optional at the end. I've left the 'b' on memory instructions where it means 'broadcast' because I'm not sure those really have the same load latency and we may need to split them explicitly in the future. llvm-svn: 328730	2018-03-28 20:40:24 +00:00
Simon Pilgrim	b1bc6cd96b	[X86][Btver2] Moved JWriteFCmp/JWriteFCmpY classes next to each other. NFCI Renamed JWriteFPAY22 to JWriteFCmpY - we've tended to avoid latency based names llvm-svn: 328701	2018-03-28 13:53:21 +00:00
Andrea Di Biagio	5076b98fb9	[X86][BtVer2] Fix the number of micro opcodes for AES[ENC\|DEC] and other YMM instructions. Similar to r328694. The number of micro opcodes should be 2 for those instructions. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328698	2018-03-28 12:12:04 +00:00
Andrea Di Biagio	010924e35c	[X86][BtVer2] Fix the number of micro opcodes for a bunch of YMM instructions. The Jaguar backend natively supports 128-bit data types. Operations on YMM registers are split into two COPs (complex operations). Each COP consumes a slot in the dispatch group, and in the reorder buffer. The scheduling model for Jaguar should mark those instructions as `let NumMicroOps = 2`. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328694	2018-03-28 10:49:33 +00:00
Simon Pilgrim	a2f26788a3	[X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer. Differential Revision: https://reviews.llvm.org/D44924 llvm-svn: 328664	2018-03-27 20:38:54 +00:00
Simon Pilgrim	5f7ab4fedf	[X86][Btver2] Add MMX_PMOVMSKBrr to MOVMSK scheduler class llvm-svn: 328620	2018-03-27 12:26:12 +00:00
Simon Pilgrim	28e7bcbba6	[X86] Add WriteCRC32 scheduler class Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis. Differential Revision: https://reviews.llvm.org/D44647 llvm-svn: 328582	2018-03-26 21:06:14 +00:00
Simon Pilgrim	fcf49df21c	[X86][Btver2] Add (U)COMISD/(U)COMISD scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) llvm-svn: 328573	2018-03-26 19:01:06 +00:00
Reid Kleckner	41fb2dba9c	[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32 Summary: Re-lands r328386 and r328443, reverting r328482. Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small parameters in i8 and i16 do not end up in the SysV register parameters (EDI, ESI, etc). I added tests for how we receive small parameters, since that is the important part. It's always safe to store more bytes than will be read, but the assumptions you make when loading them are what really matter. I also tested this by self-hosting clang and it passed tests on win64. Reviewers: mstorsjo, hans Subscribers: hiraditya, mstorsjo, llvm-commits Differential Revision: https://reviews.llvm.org/D44900 llvm-svn: 328570	2018-03-26 18:49:48 +00:00
Simon Pilgrim	f33d905293	[X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881) Give the bit count instructions their own scheduler classes instead of forcing them into existing classes. These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar). Differential Revision: https://reviews.llvm.org/D44879 llvm-svn: 328566	2018-03-26 18:19:28 +00:00
Simon Pilgrim	86ea53123d	[X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551	2018-03-26 17:02:02 +00:00
Simon Pilgrim	8815105cd5	[X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs llvm-svn: 328541	2018-03-26 16:24:13 +00:00
Simon Pilgrim	aa40148cae	[X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of JALU0 for GPR PRF write) llvm-svn: 328536	2018-03-26 16:10:08 +00:00
Simon Pilgrim	0b73b29388	[X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) This also adds missing vcvttss2si tests llvm-svn: 328505	2018-03-26 15:30:47 +00:00
Simon Pilgrim	3aa9344605	[X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costs These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501	2018-03-26 14:44:24 +00:00
Simon Pilgrim	67df1cf597	[X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps llvm-svn: 328497	2018-03-26 14:03:40 +00:00
Simon Pilgrim	caa203aed5	[X86][Btver2] Double the AGU and schedule pipe resources for YMM Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model. llvm-svn: 328491	2018-03-26 13:15:20 +00:00
Hans Wennborg	311b63f13b	Revert r328386 "[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32" This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up patch at D44876 fixes this, but let's revert back to green for now until that's ready to land. (Also reverts r328443.) > Both GCC and MSVC only look at the low byte of a boolean when it is > passed. llvm-svn: 328482	2018-03-26 10:07:51 +00:00
Craig Topper	6f28d3c954	[X86] Fix the SchedRW for intrinsic register form of SQRT/RCP/RSQRT. llvm-svn: 328474	2018-03-26 05:05:12 +00:00
Craig Topper	cdfcf8ecda	[X86] Merge the SSE and AVX versions of fp divs and sqrts in the SandyBridge/Haswell/Broadwell/Skylake scheduler models. I've used Agner's data as best I could to get the values to converge on. llvm-svn: 328473	2018-03-26 05:05:10 +00:00
Craig Topper	fbf2d850e3	[X86] Add itinerary to intrinsic version of sqrtss, rcpss, and rsqrtss instructions. llvm-svn: 328472	2018-03-26 04:20:36 +00:00
Craig Topper	c049cb7823	[X86] Correct the itineraries for the dot product instructions. llvm-svn: 328471	2018-03-26 02:17:15 +00:00
Craig Topper	4367874bc5	[X86] Use the same itinerary for VCVTDQ2PD as the SSE version so that the generated scheduler classes will merge. llvm-svn: 328470	2018-03-26 02:17:14 +00:00
Craig Topper	659f85af14	[X86] Swap the itineraries on the memory and register forms of CVTDQ2PD. They were backwards. llvm-svn: 328469	2018-03-26 02:17:13 +00:00
Craig Topper	4bf23eddaf	[X86] Give VMOVSX/ZX the same itinerary as the SSE version so they'll reuse the same generated scheduler class. llvm-svn: 328468	2018-03-26 02:17:12 +00:00
Craig Topper	6e8d99bbea	[X86] Give vpmsadbw the same itinerary as the SSE version so they'll be able to share the same generated scheduler class. llvm-svn: 328466	2018-03-25 23:52:06 +00:00
Craig Topper	15fef89ad9	[X86] Move (v)movss to port 5 only for Skylake. Move (v)movups/d to port 015 for Skylake. This matches Agner's data and is consistent with what the EVEX instructions were doing on SKX. llvm-svn: 328465	2018-03-25 23:40:56 +00:00
Simon Pilgrim	68a8fbc102	[X86] Use WriteResPair for WriteIDiv to cleanup sched defs. NFCI. llvm-svn: 328460	2018-03-25 20:16:53 +00:00
Simon Pilgrim	fecb0b7874	[X86][SkylakeClient] Fix missing comma llvm-svn: 328458	2018-03-25 19:17:17 +00:00
Simon Pilgrim	854ac7490d	[X86] Add missing full stop to comment. NFCI. llvm-svn: 328456	2018-03-25 18:49:48 +00:00
Craig Topper	972bdbd415	[X86][SkylakeClient] Fix a set of regular expressions that were checking for optionally starting with 'Y' instead of 'V' These bad regexs were introduced by r328435 llvm-svn: 328454	2018-03-25 17:33:14 +00:00
Simon Pilgrim	562e8b4eae	[X86][MMX] MOVQ2DQ/MOVDQ2Q are better described as WriteVecMove than WriteMove Not that it makes a difference to current cost values, but will when we try to better model GPR-SIMD transfer costs llvm-svn: 328453	2018-03-25 17:28:06 +00:00
Simon Pilgrim	25acc0a79b	[X86][SkylakeServer] Merge multiple instregex. NFCI llvm-svn: 328452	2018-03-25 17:25:37 +00:00
Craig Topper	a985919d3e	[X86] Update cost model for Goldmont. Add fsqrt costs for Silvermont Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44644 llvm-svn: 328451	2018-03-25 15:58:12 +00:00
Simon Pilgrim	e3547af7be	[X86] Add the ability to override memory folding latency to schedules and add 1uop for memory folds for Intel models The Intel models need an extra 1uop for memory folded instructions, plus a lot of instructions take a non-default memory latency which should allow us to use the multiclass a lot more to tidy things up. Differential Revision: https://reviews.llvm.org/D44840 llvm-svn: 328446	2018-03-25 10:21:19 +00:00
Craig Topper	e8f4e747bf	[X86] Consistently prefix all defs in X86ScheduleSLM.td with 'SLM'. llvm-svn: 328444	2018-03-25 01:28:43 +00:00
Martin Storsjo	98720156b9	[X86] Update a partially stale comment, since SVN r328386. NFC. llvm-svn: 328443	2018-03-24 23:00:00 +00:00
Simon Pilgrim	31a9633724	[X86][SkylakeClient] Merge xmm/ymm instructions instregex entries to reduce regex matches to reduce compile time llvm-svn: 328435	2018-03-24 20:40:14 +00:00
Simon Pilgrim	c21deec37b	[X86][Broadwell] Merge xmm/ymm instructions instregex entries to reduce regex matches to reduce compile time llvm-svn: 328434	2018-03-24 19:37:28 +00:00
Simon Pilgrim	2b5967f510	[X86][Haswell] Merge xmm/ymm instructions instregex entries to reduce regex matches to reduce compile time llvm-svn: 328432	2018-03-24 18:36:01 +00:00
Simon Pilgrim	efcf1d85b3	[X86][SandyBridge] Merge xmm/ymm instructions instregex entries to reduce regex matches to reduce compile time llvm-svn: 328431	2018-03-24 18:12:59 +00:00
Craig Topper	097b47a0fc	[X86] Add a new disassembler opcode map for 3DNow. Stop treating 3DNow as an attribute. This reduces the size of llvm-mc by at least 150k since we no longer have to multiply the attribute across 7 tables. llvm-svn: 328416	2018-03-24 07:48:54 +00:00

16957 Commits