llvm-project

Commit Graph

Author	SHA1	Message	Date
Andrea Di Biagio	486358c153	[X86][Broadwell] HWPort5 should not be added to BroadwellModelProcResources. The BroadwellModelProcResources had an entry for HWPort5, which is a Haswell resource, and not a Broadwell processor resource. That entry was added to the Broadwell model because variable blends were consuming it. This was clearly a typo (the resource name should have been BWPort5), which unfortunately was never caught before. It was not reported as an error because HWPort5 is a resource defined by the Haswell model. It was found when testing some code with llvm-mca: the list of resources in the resource pressure view was odd. This patch fixes the issue; now variable blend instructions consume 2 cycles on BWPort5 instead of HWPort5. This is enough to get rid of the extra (spurious) entry in the BroadwellModelProcResources table. llvm-svn: 329686	2018-04-10 10:49:41 +00:00
Sander de Smalen	f974e255fe	[AArch64][SVE] Asm: Add support for unpredicated LSL/LSR (shift by immediate) instructions. Reviewers: rengolin, fhahn, javed.absar, SjoerdMeijer, huntergr, t.p.northover, echristo, evandro Reviewed By: rengolin, fhahn Subscribers: tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45371 llvm-svn: 329681	2018-04-10 10:03:13 +00:00
Clement Courbet	b449379eae	[MC][TableGen] Add optional libpfm counter names for ProcResUnits. Summary: Subtargets can define the libpfm counter names that can be used to measure cycles and uops issued on ProcResUnits. This allows making llvm-exegesis available on more targets. Fixes PR36984. Reviewers: gchatelet, RKSimon, andreadb, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45360 llvm-svn: 329675	2018-04-10 08:16:37 +00:00
Sander de Smalen	30fda45c18	[AArch64][SVE] Asm: Add support for SVE INDEX instructions. Reviewers: rengolin, fhahn, javed.absar, SjoerdMeijer, huntergr, t.p.northover, echristo, evandro Reviewed By: rengolin, fhahn Subscribers: tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45370 llvm-svn: 329674	2018-04-10 07:01:53 +00:00
Chandler Carruth	0ca3bd0729	[x86] Model the direction flag (DF) separately from the rest of EFLAGS. This cleans up a number of operations that only claimed to use EFLAGS due to using DF. But no instructions which we think of as setting EFLAGS actually modify DF (other than things like popf) and so this needlessly creates uses of EFLAGS that aren't really there. In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD, and the whole-flags writes (WRFLAGS and POPF) need to model this. I've also somewhat cleaned up some of the flag management instruction definitions to be in the correct .td file. Adding this extra register also uncovered a failure to use the correct datatype to hold X86 registers, and I've corrected that as necessary here. Differential Revision: https://reviews.llvm.org/D45154 llvm-svn: 329673	2018-04-10 06:40:51 +00:00
Craig Topper	7e42af87a6	[X86] Prevent folding loads with 64-bit ANDs with immediates that fit in 32-bits. Prefer to use the 32-bit AND with immediate instead. Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute. Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold. Fixes PR37063. llvm-svn: 329667	2018-04-10 03:44:15 +00:00
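A minimal standalone model (plain C++, not LLVM code) of why the 32-bit form is safe: on x86-64 a 32-bit AND zero-extends its result into the upper half of the register, and a mask whose upper 32 bits are zero clears that half anyway. The helper names are hypothetical, and reading "fits in 32 bits" as "fits in an unsigned 32-bit value" is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper: models "and r64, imm" on the full 64-bit register.
uint64_t and64(uint64_t reg, uint64_t imm) { return reg & imm; }

// Hypothetical helper: models "and r32, imm32". On x86-64, writing a 32-bit
// register zero-extends the result into the upper 32 bits.
uint64_t and32(uint64_t reg, uint32_t imm) {
  return static_cast<uint64_t>(static_cast<uint32_t>(reg) & imm);
}

// Narrowing is only equivalent when the mask's upper 32 bits are zero,
// because those mask bits clear the upper half of the 64-bit result.
bool canUse32BitAnd(uint64_t imm) { return imm <= UINT32_MAX; }
```

For example, with reg = 0xDEADBEEFCAFEF00D and mask 0xFFFF0000, both helpers return 0xCAFE0000.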
Chandler Carruth	19618fc639	[x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues. The key idea is to lower COPY nodes populating EFLAGS by scanning the uses of EFLAGS and introducing dedicated code to preserve the necessary state in a GPR. In the vast majority of cases, these uses are cmovCC and jCC instructions. For such cases, we can very easily save and restore the necessary information by simply inserting a setCC into a GPR where the original flags are live, and then testing that GPR directly to feed the cmov or conditional branch. However, things are a bit more tricky if arithmetic is using the flags. This patch handles the vast majority of cases that seem to come up in practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of partially preserved EFLAGS as LLVM doesn't currently model that at all. There are a large number of operations that technically observe EFLAGS currently but shouldn't in this case -- they typically are using DF. Currently, they will not be handled by this approach. However, I have never seen this issue come up in practice. It is already pretty rare to have these patterns come up in practical code with LLVM. I had to resort to writing MIR tests to cover most of the logic in this pass already. I suspect even with its current amount of coverage of arithmetic users of EFLAGS it will be a significant improvement over the current use of pushf/popf. It will also produce substantially faster code in most of the common patterns. This patch also removes all of the old lowering for EFLAGS copies, and the hack that forced us to use a frame pointer when EFLAGS copies were found anywhere in a function so that the dynamic stack adjustment wasn't a problem. None of this is needed as we now lower all of these copies directly in MI and without requiring stack adjustments.
Lots of thanks to Reid who came up with several aspects of this approach, and Craig who helped me work out a couple of things tripping me up while working on this. Differential Revision: https://reviews.llvm.org/D45146 llvm-svn: 329657	2018-04-10 01:41:17 +00:00
Vlad Tsyrklevich	0cdc6ec535	ShadowCallStack/x86_64: Ignore pseudo-machine instructions llvm-svn: 329656	2018-04-10 01:31:01 +00:00
Daniel Sanders	5281b02e84	[globalisel][legalizerinfo] Add support for the Lower action in getActionDefinitionsBuilder() and use it in AArch64. Lower is slightly odd. It often doesn't change the type but the lowerings do use the new type to decide what code to create. Treat it like a mutation but provide convenience functions that re-use the existing type. Re-uses the existing tests: test/CodeGen/AArch64/GlobalISel/legalize-rem.mir test/CodeGen/AArch64/GlobalISel//legalize-mul.mir test/CodeGen/AArch64/GlobalISel//legalize-cmpxchg-with-success.mir llvm-svn: 329623	2018-04-09 21:10:09 +00:00
Konstantin Zhuravlyov	6183065b97	AMDGPU: Remove max_scratch_backing_memory_byte_size from kernel header 1. Remove max_scratch_backing_memory_byte_size from kernel header 2. Make it a reserved field 3. Ignore it while parsing assembly for backwards compatibility 4. Bump up minor version of kernel header Differential Revision: https://reviews.llvm.org/D45452 llvm-svn: 329620	2018-04-09 20:47:22 +00:00
Craig Topper	47b2f9d836	[X86] Don't use Lower512IntUnary to split bitcasts with v32i16/v64i8 types on targets without AVX512BW. LowerIntUnary, as its name says, has an assert for integer types. But for the bitcast case one side might be an FP type. Rather than making sure the function really works for fp types and renaming it, just do really basic splitting directly. The LowerIntUnary has the advantage that it can peek through BUILD_VECTOR because every other call is during Lowering. But these calls are during legalization and will be followed by a DAG combine round. Revert some changes to LowerVectorIntUnary that were originally made just to make these two calls work even in pure integer cases. This was found purely by compiling the avx512f-builtins.c test from clang so I've copied over the offending function from that. llvm-svn: 329616	2018-04-09 20:37:14 +00:00
Peter Collingbourne	5cff2409ae	AArch64: Allow offsets to be folded into addresses with ELF. This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. It reduces the size of Chromium for Android's .text section by 46KB, or 56KB with ThinLTO (which exposes more opportunities to use a direct access rather than a GOT access). Because the addend range is limited in COFF and Mach-O, this is enabled for ELF only. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 329611	2018-04-09 19:59:57 +00:00
Alex Shlyapnikov	79f2c720b5	Revert "AMDGPU: enable 128-bit for local addr space under an option" This reverts commit r329591. It breaks various bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/16516 http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/17374 http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/15992 http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/11251 ... llvm-svn: 329610	2018-04-09 19:47:38 +00:00
Mandeep Singh Grang	afa3aaf14d	[WebAssembly] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort calls with llvm::sort. Refer to the comments section in D44363 for a list of all the required patches. Reviewers: sunfish, RKSimon Reviewed By: sunfish Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, llvm-commits Differential Revision: https://reviews.llvm.org/D44873 llvm-svn: 329607	2018-04-09 19:38:31 +00:00
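The idea behind the shuffling wrapper can be sketched in a few lines of standard C++. This is a hypothetical stand-in, not LLVM's `llvm::sort`: the `shuffleFirst` switch is an assumption standing in for whatever build configuration enables the shuffle.

```cpp
#include <algorithm>
#include <random>

// Hypothetical stand-in for a shuffling sort wrapper (not LLVM's code).
// When shuffleFirst is true, the range is randomly permuted before sorting.
// std::sort leaves the relative order of elements that compare equal
// unspecified, so code that silently depends on that order starts producing
// visibly different output from run to run instead of passing by accident.
template <typename RandomIt, typename Compare>
void shuffledSort(RandomIt first, RandomIt last, Compare comp,
                  bool shuffleFirst) {
  if (shuffleFirst) {
    std::mt19937 gen(std::random_device{}());
    std::shuffle(first, last, gen);
  }
  std::sort(first, last, comp);
}
```

The sorted result is unchanged for elements with distinct keys; only the order among equal keys varies, which is exactly the non-determinism the wrapper is meant to expose.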
Craig Topper	0c2a12cb3e	[X86] Revert the SLM part of r328914. While it appears to be correct information based on Intel's optimization manual and Agner's data, it causes perf regressions on a couple of the benchmarks in our internal list. llvm-svn: 329593	2018-04-09 17:07:40 +00:00
Marek Olsak	52b033b827	AMDGPU: enable 128-bit for local addr space under an option Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329591	2018-04-09 16:56:32 +00:00
Tom Stellard	e753c52227	AMDGPU: Initialize GlobalISel passes Summary: This fixes AMDGPU GlobalISel test failures when enabling the AMDGPU target without any other targets that use GlobalISel. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D45353 llvm-svn: 329588	2018-04-09 16:09:13 +00:00
Simon Pilgrim	e5ed5e2cba	[X86][MMX] Fix missing itinerary for PALIGNR llvm-svn: 329568	2018-04-09 13:52:33 +00:00
Simon Pilgrim	140fee078f	[X86][MMX] Fix missing itinerary for MOVQ2DQ instruction format llvm-svn: 329567	2018-04-09 13:42:14 +00:00
Simon Pilgrim	abf3611332	[X86][MMX] Fix missing itinerary for CVTPI2PS llvm-svn: 329565	2018-04-09 13:27:47 +00:00
Dmitry Preobrazhensky	2f8e146ad3	[AMDGPU][MC][GFX9] Added instructions s_mul_hi_32, s_lshl_add_u32 See bugs 36841: https://bugs.llvm.org/show_bug.cgi?id=36841 36842: https://bugs.llvm.org/show_bug.cgi?id=36842 Differential Revision: https://reviews.llvm.org/D45251 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329562	2018-04-09 13:10:33 +00:00
Simon Pilgrim	0047efdd1e	[X86][MMX] Fix flipped reg/mem typo in MMX_MISC_FUNC_ITINS The RR/RM itineraries were the wrong way around llvm-svn: 329561	2018-04-09 13:02:07 +00:00
Simon Pilgrim	6131286553	[X86][SSE] Fix f32 mul/div itinerary groups typo The RM folded itineraries were incorrectly using the f64 version. llvm-svn: 329556	2018-04-09 10:45:53 +00:00
Hiroshi Inoue	9ff2380ea6	[NFC] fix trivial typos in comments and error message "is is" -> "is", "are are" -> "are" llvm-svn: 329546	2018-04-09 04:37:53 +00:00
Sanjay Patel	0d7df36c66	[TargetSchedule] shrink interface for init(); NFCI The TargetSchedModel is always initialized using the TargetSubtargetInfo's MCSchedModel and TargetInstrInfo, so we don't need to extract those and pass 3 parameters to init(). Differential Revision: https://reviews.llvm.org/D44789 llvm-svn: 329540	2018-04-08 19:56:04 +00:00
Craig Topper	b7baa358f6	[X86] Add SchedWrites for CMOV and SETCC. Use them to remove InstRWs. Summary: Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops. This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs. The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc. Reviewers: RKSimon, andreadb, GGanesh Reviewed By: RKSimon Subscribers: GGanesh, llvm-commits Differential Revision: https://reviews.llvm.org/D45380 llvm-svn: 329539	2018-04-08 17:53:18 +00:00
Craig Topper	c362f42b6a	[X86][Znver1] Remove InstRWs for BLENDVPS/PD Summary: This removes the InstRWs for BLENDVPS/PD in favor of WriteFVarBlend. The latency listed was 3 cycles but WriteFVarBlend is defined as 1 cycle latency. The 1 cycle latency matches Agner Fog's data. The patterns were missing the VEX forms which is why there are no test changes. We don't test "-mcpu=znver1 -mattr=-avx" Reviewers: RKSimon, GGanesh Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44841 llvm-svn: 329538	2018-04-08 17:53:15 +00:00
Mandeep Singh Grang	327fd5e47c	[PowerPC] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort calls with llvm::sort. Refer to the comments section in D44363 for a list of all the required patches. Reviewers: hfinkel, RKSimon Reviewed By: RKSimon Subscribers: nemanjai, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D44870 llvm-svn: 329535	2018-04-08 16:45:04 +00:00
Mandeep Singh Grang	68a151a13c	[X86] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort calls with llvm::sort. Refer to the comments section in D44363 for a list of all the required patches. Reviewers: chandlerc, craig.topper, RKSimon Reviewed By: chandlerc, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44874 llvm-svn: 329534	2018-04-08 16:42:52 +00:00
Simon Pilgrim	86588fc809	[X86][Btver2] Add vector extract costs llvm-svn: 329524	2018-04-08 11:26:26 +00:00
Craig Topper	ef37aebc96	[X86] Combine vXi64 multiplies to MULDQ/MULUDQ during DAG combine instead of lowering. Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget llvm-svn: 329510	2018-04-07 19:09:52 +00:00
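A scalar sketch of the lane semantics, assuming the standard PMULUDQ/PMULDQ behaviour: each multiplies the low 32 bits of a 64-bit lane into a full 64-bit product, unsigned and signed respectively. The `fitsMul*` predicates are hypothetical; they test concrete values, whereas a DAG combine would rely on known-bits or sign-bits analysis of the operands.

```cpp
#include <cstdint>

// One 64-bit lane of PMULUDQ: the low 32 bits of each operand are
// zero-extended and multiplied into a 64-bit product.
uint64_t muludqLane(uint64_t a, uint64_t b) {
  return static_cast<uint64_t>(static_cast<uint32_t>(a)) *
         static_cast<uint64_t>(static_cast<uint32_t>(b));
}

// One 64-bit lane of PMULDQ: the low 32 bits are sign-extended instead.
int64_t muldqLane(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<int32_t>(a)) *
         static_cast<int64_t>(static_cast<int32_t>(b));
}

// An i64 multiply matches MULUDQ when both upper halves are zero...
bool fitsMulUDQ(uint64_t a, uint64_t b) {
  return (a >> 32) == 0 && (b >> 32) == 0;
}

// ...and matches MULDQ when both operands are sign extensions of their
// low 32 bits.
bool fitsMulDQ(int64_t a, int64_t b) {
  return a == static_cast<int32_t>(a) && b == static_cast<int32_t>(b);
}
```

When the predicate holds, the lane result equals the ordinary 64-bit product, so the general multiply expansion can be replaced by a single instruction.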
Simon Pilgrim	80ce1dde44	[CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets llvm-svn: 329498	2018-04-07 13:24:33 +00:00
Tim Northover	e25e458d52	Reapply ARM: Do not spill CSR to stack on entry to noreturn functions Should fix UBSan bot by also checking there's no "uwtable" attribute before skipping. Otherwise the unwind table will be useless since its moves expect CSRs to actually be preserved. A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space. Should fix PR9970. Patch mostly by myeisha (pmb). llvm-svn: 329494	2018-04-07 10:57:03 +00:00
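A sketch of the skip condition as the commit message states it, including the "uwtable" check added in the reapply. `FnAttrs` and `canSkipCSRSpills` are hypothetical names used only for illustration.

```cpp
// Hypothetical summary of the function attributes the decision depends on.
struct FnAttrs {
  bool noReturn;
  bool noUnwind;
  bool uwTable;
};

// Callee-saved register spills on entry can be skipped only when the function
// can neither return nor unwind (so no caller ever observes the CSRs again)
// and no unwind table is requested (its moves would otherwise describe CSRs
// that were never actually saved).
bool canSkipCSRSpills(const FnAttrs &f) {
  return f.noReturn && f.noUnwind && !f.uwTable;
}
```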
Vitaly Buka	de5f196530	Revert "ARM: Do not spill CSR to stack on entry to noreturn functions" Breaks ubsan test TestCases/Misc/missing_return.cpp on ARM This reverts commit r329287 llvm-svn: 329486	2018-04-07 05:36:44 +00:00
Artem Belevich	f256decdc4	[NVPTX] add support for initializing fp16 arrays. Previously HalfTy was not handled, which would either trigger an assertion or result in an array initialized with garbage. Differential Revision: https://reviews.llvm.org/D45391 llvm-svn: 329463	2018-04-06 22:25:08 +00:00
Artem Belevich	a28e598ebb	[NVPTX] Fixed vectorized LDG for f16. v2f16 is a special case in NVPTX. v4f16 may be loaded as a pair of v2f16 and that was not previously handled correctly by tryLDGLDU() Differential Revision: https://reviews.llvm.org/D45339 llvm-svn: 329456	2018-04-06 21:10:24 +00:00
Sameer AbuAsal	c1b0e66b58	[RISCV] Tablegen-driven Instruction Compression. Summary: This patch implements a tablegen-driven Instruction Compression mechanism for generating RISCV compressed instructions (C Extension) from the expanded instruction form. This tablegen backend processes CompressPat declarations in a td file and generates all the compile-time and runtime checks required to validate the declarations, validate the input operands and generate correct instructions. The checks include validating register operands, immediate operands, fixed register operands and fixed immediate operands. Example: class CompressPat<dag input, dag output> { dag Input = input; dag Output = output; list<Predicate> Predicates = []; } let Predicates = [HasStdExtC] in { def : CompressPat<(ADD GPRNoX0:$rs1, GPRNoX0:$rs1, GPRNoX0:$rs2), (C_ADD GPRNoX0:$rs1, GPRNoX0:$rs2)>; } The result is an auto-generated header file 'RISCVGenCompressEmitter.inc' which exports two functions for compressing/uncompressing MCInst instructions, plus some helper functions: bool compressInst(MCInst& OutInst, const MCInst &MI, const MCSubtargetInfo &STI, MCContext &Context); bool uncompressInst(MCInst& OutInst, const MCInst &MI, const MCRegisterInfo &MRI, const MCSubtargetInfo &STI); The clients that include this auto-generated header file and invoke these functions can compress an instruction before emitting it, in the target-specific ASM or ELF streamer, or can uncompress an instruction before printing it, when the expanded instruction format is favored. The following clients were added to implement compression/uncompression for RISCV: 1) RISCVAsmParser::MatchAndEmitInstruction: Inserted a call to compressInst() to compress instructions parsed by llvm-mc coming from an ASM input. 2) RISCVAsmPrinter::EmitInstruction: Inserted a call to compressInst() to compress instructions that were lowered from Machine Instructions (MachineInstr).
3) RVInstPrinter::printInst: Inserted a call to uncompressInst() to print the expanded version of the instruction instead of the compressed one (e.g., add s0, s0, a5 instead of c.add s0, a5) when -riscv-no-aliases is not passed. This patch squashes D45119, D42780 and D41932. It was reviewed in smaller patches by asb, efriedma, apazos and mgrang. Reviewers: asb, efriedma, apazos, llvm-commits, sabuasal Reviewed By: sabuasal Subscribers: mgorny, eraman, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, niosHD, kito-cheng, shiva0217, zzheng Differential Revision: https://reviews.llvm.org/D45385 llvm-svn: 329455	2018-04-06 21:07:05 +00:00
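The single CompressPat quoted in this commit can be written out by hand to show what the generated checks amount to. This is an illustrative sketch, not the generated RISCVGenCompressEmitter.inc code: `Inst`, `Opcode`, `compressAdd` and `uncompressCAdd` are hypothetical simplifications of MCInst and of the generated compressInst()/uncompressInst().

```cpp
// Hypothetical, simplified instruction record; real code works on MCInst.
enum class Opcode { ADD, C_ADD };

struct Inst {
  Opcode op;
  unsigned rd, rs1, rs2;  // register numbers x0..x31
};

// Hand-written equivalent of:
//   (ADD GPRNoX0:$rs1, GPRNoX0:$rs1, GPRNoX0:$rs2) -> (C_ADD $rs1, $rs2)
// The destination must equal the first source (tied operand), and no operand
// may be x0, mirroring the GPRNoX0 register class. hasStdExtC stands in for
// the HasStdExtC predicate.
bool compressAdd(Inst &out, const Inst &in, bool hasStdExtC) {
  if (!hasStdExtC || in.op != Opcode::ADD)
    return false;
  if (in.rd != in.rs1 || in.rd == 0 || in.rs2 == 0)
    return false;
  out = Inst{Opcode::C_ADD, in.rd, in.rd, in.rs2};
  return true;
}

// Inverse mapping, as used when printing the expanded form.
bool uncompressCAdd(Inst &out, const Inst &in) {
  if (in.op != Opcode::C_ADD)
    return false;
  out = Inst{Opcode::ADD, in.rd, in.rd, in.rs2};
  return true;
}
```

With s0 = x8 and a5 = x15, `add s0, s0, a5` compresses to `c.add s0, a5`, while `add s0, s1, a5` does not, because its destination is not tied to the first source.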
Dmitry Preobrazhensky	ae31223ba7	[AMDGPU][MC][GFX9] Added s_call_b64 See bug 36843: https://bugs.llvm.org/show_bug.cgi?id=36843 Differential Revision: https://reviews.llvm.org/D45268 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329440	2018-04-06 18:24:49 +00:00
Krzysztof Parzyszek	b7e54e8482	[Hexagon] Fix assert with packetizing IMPLICIT_DEF instructions The compiler is generating a packet with the following instructions, which causes an undefined register assert in the verifier. $r0 = IMPLICIT_DEF $r1 = IMPLICIT_DEF S2_storerd_io killed $r29, 0, killed %d0 The problem is that the packetizer is not saving the IMPLICIT_DEF instructions, which are needed when checking if it is legal to add the store instruction. The fix is to add the IMPLICIT_DEF instructions to the CurrentPacketMIs structure. Patch by Brendon Cahoon. llvm-svn: 329439	2018-04-06 18:19:22 +00:00
Krzysztof Parzyszek	aca8f32713	[Hexagon] Prevent a stall across zero-latency instructions in a packet The packetizer keeps two zero-latency bound instructions in the same packet, ignoring the stalls on the later instruction. This should not be the case if there is no data dependence. Patch by Sumanth Gundapaneni. llvm-svn: 329437	2018-04-06 18:13:11 +00:00
Krzysztof Parzyszek	269740a88e	[Hexagon] Remove duplicated code, NFC llvm-svn: 329436	2018-04-06 18:10:13 +00:00
Krzysztof Parzyszek	ed04f02432	[Hexagon] Handle subregisters when calculating iteration count in HW loops llvm-svn: 329434	2018-04-06 17:51:57 +00:00
Dmitry Preobrazhensky	306b1a0119	[AMDGPU][MC][GFX9] Added instruction s_endpgm_ordered_ps_done See bug 36844: https://bugs.llvm.org/show_bug.cgi?id=36844 Differential Revision: https://reviews.llvm.org/D45313 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329430	2018-04-06 17:25:00 +00:00
Craig Topper	c50570fb4f	[X86] Add appropriate ReadAfterLd for the register input to memory forms of ADC/SBB. llvm-svn: 329424	2018-04-06 17:12:18 +00:00
Dmitry Preobrazhensky	f20aff565d	[AMDGPU][MC][GFX9] Added instructions saveexec, wrexec and bitreplicate See bug 36840: https://bugs.llvm.org/show_bug.cgi?id=36840 Differential Revision: https://reviews.llvm.org/D45250 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329419	2018-04-06 16:35:11 +00:00
Craig Topper	b9d298ecf2	[X86] Remove InstRWs for basic arithmetic instructions from Sandy Bridge scheduler model. We can get this right through WriteALU and friends now. llvm-svn: 329417	2018-04-06 16:29:31 +00:00
Craig Topper	f0d042619b	[X86] Attempt to model basic arithmetic instructions in the Haswell/Broadwell/Skylake scheduler models without InstRWs Summary: This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this I've added the store address port to RMW and used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency. Apparently we were inconsistent about whether the store has latency or not, thus the test changes. I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5. Reviewers: RKSimon, andreadb Reviewed By: andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45351 llvm-svn: 329416	2018-04-06 16:16:48 +00:00
Craig Topper	f131b60049	[X86] Add an extra store address cycle to WriteRMW in the Sandy Bridge/Broadwell/Haswell/Skylake scheduler model. Even though the address was calculated for the load, it's calculated again for the store. llvm-svn: 329415	2018-04-06 16:16:46 +00:00
Craig Topper	22d25a08ae	[X86] Merge itineraries for CLC, CMC, and STC. These are very simple flag setting instructions that appear to only be a single uop. They're unlikely to need this separation. llvm-svn: 329414	2018-04-06 16:16:43 +00:00
Dmitry Preobrazhensky	59399ae4cc	[AMDGPU][MC][VI][GFX9] Added s_atc_probe* instructions See bug 36839: https://bugs.llvm.org/show_bug.cgi?id=36839 Differential Revision: https://reviews.llvm.org/D45249 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329408	2018-04-06 15:48:39 +00:00
Pete Couperus	b7b6e1da6c	[ARC] Add <.f> suffix for F32_GEN4_{DOP|SOP}. Add disassembler support for instructions which writeback STATUS32. https://reviews.llvm.org/D45148 Patch by Yan Luo! (Yan.Luo2@synopsys.com) llvm-svn: 329404	2018-04-06 15:43:11 +00:00
Dmitry Preobrazhensky	4732d876ee	[AMDGPU][MC][GFX9] Added s_dcache_discard* instructions See bug 36838: https://bugs.llvm.org/show_bug.cgi?id=36838 Differential Revision: https://reviews.llvm.org/D45247 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329397	2018-04-06 15:08:42 +00:00
Simon Pilgrim	09eeb3a8b9	[X86][SandyBridge] Add (V)DPPS memory fold latencies Noticed this during D44654 llvm-svn: 329389	2018-04-06 11:25:21 +00:00
Simon Pilgrim	8a83f16ccd	[X86][SandyBridge] SBWriteResPair +5cy Memory Folds As mentioned on D44647, this patch increases the default memory latency to +5cy, which more closely matches what most custom cases are doing for reg-mem instructions. I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent. As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests... Differential Revision: https://reviews.llvm.org/D44654 llvm-svn: 329388	2018-04-06 11:00:51 +00:00
Simon Pilgrim	fd1f4fe54e	[X86][SkylakeServer] Merge 2 InstRW entries to the same sched group. NFCI. llvm-svn: 329386	2018-04-06 10:16:36 +00:00
Hiroshi Inoue	a2eefb6d9a	[PowerPC] allow D-form VSX load/store when accessing FrameIndex without offset VSX D-form load/store instructions of POWER9 require the offset to be a multiple of 16, and a helper `isOffsetMultipleOf` is used to check this. So far, the helper handles the FrameIndex + offset case, but not the FrameIndex without offset case. Due to this, we are missing opportunities to exploit D-form instructions when accessing an object or array allocated on the stack. For example, an x-form store (stxvx) is used for int a[4] = {0}; instead of a d-form store (stxv). For larger arrays, a D-form instruction is not used when accessing the first 16 bytes. Using D-form instructions reduces register pressure as well as the instruction count. Differential Revision: https://reviews.llvm.org/D45079 llvm-svn: 329377	2018-04-06 05:41:16 +00:00
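A sketch of the check in plain C++, under two stated assumptions: a bare FrameIndex is treated as "FrameIndex + 0", and the stack slot itself must be aligned to a multiple of 16, since its final stack-pointer-relative position is only fixed once the frame is laid out. `FrameAddr` and `offsetIsMultipleOf` are hypothetical and do not reproduce the SelectionDAG helper's signature.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical, simplified address model: a frame-index base plus an
// optional constant displacement.
struct FrameAddr {
  int64_t objectAlign;            // alignment of the stack slot, in bytes
  std::optional<int64_t> offset;  // std::nullopt models a bare FrameIndex
};

bool offsetIsMultipleOf(const FrameAddr &addr, int64_t val) {
  // Under-aligned slot: the final displacement cannot be guaranteed.
  if (addr.objectAlign % val != 0)
    return false;
  // A missing offset behaves like an offset of zero.
  return addr.offset.value_or(0) % val == 0;
}
```

Under these assumptions, the `int a[4] = {0};` case from the commit (a 16-byte-aligned slot accessed with no offset) qualifies for the D-form store.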
Manoj Gupta	afb355bdc0	Fix lld-x86_64-darwin13 build fails. Use double braces in std::array initialization to keep Darwin builders happy. llvm-svn: 329363	2018-04-05 23:23:29 +00:00
Manoj Gupta	9d68b9eac5	Attempt to fix Mips breakages. Summary: Replace ArrayRefs by actual std::array objects so that there are no dangling references. Reviewers: rsmith, gkistanova Subscribers: sdardis, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D45338 llvm-svn: 329359	2018-04-05 22:47:25 +00:00
Craig Topper	fbe3132f67	[X86] Separate CDQ and CDQE in the scheduler model. According to Agner's data, CDQE is closer to CWDE. llvm-svn: 329354	2018-04-05 21:56:19 +00:00
Craig Topper	4cc3827791	[X86] Add MOVZPQILo2PQIrr to the Sandy Bridge scheduler model llvm-svn: 329351	2018-04-05 21:40:32 +00:00
Craig Topper	3b0b96c591	[X86] Add LEAVE instruction to the scheduler models using the same data as LEAVE64. Make LEAVE/LEAVE64 more correct on Sandy Bridge. This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64. The Sandy Bridge version was missing a load port use. llvm-svn: 329347	2018-04-05 21:16:26 +00:00
Konstantin Zhuravlyov	c233ae8004	AMDGPU/Metadata: Always report a fixed number of hidden arguments Currently it is 6. If the "feature" was not used, report a dummy hidden argument. Otherwise it does not match the kernarg size reported in the kernel header. Differential Revision: https://reviews.llvm.org/D45129 llvm-svn: 329341	2018-04-05 20:46:04 +00:00
Craig Topper	c6bb36a3d0	[X86] Remove some InstRWs for plain store instructions on Sandy Bridge. We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion. llvm-svn: 329339	2018-04-05 20:04:06 +00:00
Craig Topper	9eec2025c5	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329330	2018-04-05 18:38:45 +00:00
Mandeep Singh Grang	9893fe218c	[ARM] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort calls with llvm::sort. Refer to the comments section in D44363 for a list of all the required patches. Reviewers: t.p.northover, RKSimon, MatzeB, bkramer Reviewed By: bkramer Subscribers: javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D44855 llvm-svn: 329329	2018-04-05 18:31:50 +00:00
Craig Topper	665f74414d	[X86] Disassembler support for having an ADSIZE prefix affect instructions with 0xf2 and 0xf3 prefixes. Needed to support umonitor from D45253. llvm-svn: 329327	2018-04-05 18:20:14 +00:00
Sam Clegg	cfd44a2e69	[WebAssembly] Allow for the creation of user-defined custom sections This patch adds a way for users to create their own custom sections to be added to wasm files. At the LLVM IR layer, they are defined through the "wasm.custom_sections" named metadata. The expected use case for this is bindings generators such as wasm-bindgen. Patch by Dan Gohman Differential Revision: https://reviews.llvm.org/D45297 llvm-svn: 329315	2018-04-05 17:01:39 +00:00
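For context, a custom section in the WebAssembly binary format has section id 0, followed by the section size and a payload that begins with the section name (a length-prefixed byte string). The plain C++ sketch below builds such a section; it is not LLVM's object writer, and `appendULEB128` and `makeCustomSection` are hypothetical helpers.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Unsigned LEB128: 7 bits per byte, low bits first, high bit set on every
// byte except the last.
static void appendULEB128(std::vector<uint8_t> &out, uint32_t value) {
  do {
    uint8_t byte = static_cast<uint8_t>(value & 0x7f);
    value >>= 7;
    if (value != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (value != 0);
}

// Layout: id (0), size of contents, then contents = name length, name bytes,
// and the user-supplied payload.
std::vector<uint8_t> makeCustomSection(const std::string &name,
                                       const std::vector<uint8_t> &payload) {
  std::vector<uint8_t> contents;
  appendULEB128(contents, static_cast<uint32_t>(name.size()));
  contents.insert(contents.end(), name.begin(), name.end());
  contents.insert(contents.end(), payload.begin(), payload.end());

  std::vector<uint8_t> section;
  section.push_back(0);  // section id 0 = custom section
  appendULEB128(section, static_cast<uint32_t>(contents.size()));
  section.insert(section.end(), contents.begin(), contents.end());
  return section;
}
```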
Craig Topper	6ecdb03f16	[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. llvm-svn: 329310	2018-04-05 16:32:48 +00:00
Andrea Di Biagio	c74ad502ce	[MC][Tablegen] Allow models to describe the retire control unit for llvm-mca. This patch adds the ability to describe properties of the hardware retire control unit. Tablegen class RetireControlUnit has been added for this purpose (see TargetSchedule.td). A RetireControlUnit specifies the size of the reorder buffer, as well as the maximum number of opcodes that can be retired every cycle. A zero (or negative) value for the reorder buffer size means: "the size is unknown". If the size is unknown, then llvm-mca defaults it to the value of field SchedMachineModel::MicroOpBufferSize. A zero or negative number of opcodes retired per cycle means: "there is no restriction on the number of instructions that can be retired every cycle". Models can optionally specify an instance of RetireControlUnit. There can be at most one RetireControlUnit definition per scheduling model. Information related to the RCU (RetireControlUnit) is stored in (two new fields of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp). This patch fixes PR36661. Differential Revision: https://reviews.llvm.org/D45259 llvm-svn: 329304	2018-04-05 15:41:41 +00:00
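The defaulting rules described here can be sketched as a small resolution step. The struct and function names below are hypothetical, as is the choice to encode "unlimited" retirement as 0 in the resolved form.

```cpp
// Hypothetical mirror of the two values a RetireControlUnit can describe.
struct RetireControlUnitDesc {
  int reorderBufferSize;  // <= 0 means "the size is unknown"
  int maxRetirePerCycle;  // <= 0 means "no restriction"
};

struct ResolvedRCU {
  unsigned reorderBufferSize;
  unsigned maxRetirePerCycle;  // 0 encodes "unlimited" in this sketch
};

// An unknown reorder-buffer size falls back to the model's
// MicroOpBufferSize; a non-positive retire width means no per-cycle limit.
ResolvedRCU resolveRCU(const RetireControlUnitDesc &d,
                       unsigned microOpBufferSize) {
  ResolvedRCU r;
  r.reorderBufferSize = d.reorderBufferSize > 0
                            ? static_cast<unsigned>(d.reorderBufferSize)
                            : microOpBufferSize;
  r.maxRetirePerCycle = d.maxRetirePerCycle > 0
                            ? static_cast<unsigned>(d.maxRetirePerCycle)
                            : 0;
  return r;
}
```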
Hiroshi Inoue	bbf98aea83	[PowerPC] fix assertion failure due to missing instruction in P9InstrResources.td This patch adds the L(W|H|B)ZXTLS_32 instructions introduced by https://reviews.llvm.org/rL327635 to P9InstrResources.td. llvm-svn: 329299	2018-04-05 15:27:06 +00:00
Tim Northover	b30388bf11	ARM: Do not spill CSR to stack on entry to noreturn functions A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space. Should fix PR9970. Patch by myeisha (pmb). llvm-svn: 329287	2018-04-05 14:26:06 +00:00
Krzysztof Parzyszek	62c4805c1f	[Hexagon] Remove default values from lambda parameters llvm-svn: 329286	2018-04-05 14:25:52 +00:00
Simon Pilgrim	1d793b8ac5	[SchedModel] Complete models shouldn't match against itineraries when they don't use them (PR35639) For schedule models that don't use itineraries, checkCompleteness still checks that an instruction has a matching itinerary instead of skipping and going straight to matching the InstRWs. That doesn't seem to match what happens in TargetSchedule.cpp. This patch causes problems for a number of models that had been incorrectly flagged as complete. Differential Revision: https://reviews.llvm.org/D43235 llvm-svn: 329280	2018-04-05 13:11:36 +00:00
Craig Topper	15303dda0d	[X86] Revert r329251-329254 It's failing on the bots and I'm not sure why. This reverts: [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. [X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. [X86] Remove some InstRWs for plain store instructions on Sandy Bridge. [X86] Auto-generate complete checks. NFC llvm-svn: 329256	2018-04-05 05:19:36 +00:00
Craig Topper	25c7110a37	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329254	2018-04-05 04:42:03 +00:00
Craig Topper	4b1fdd4921	[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. llvm-svn: 329253	2018-04-05 04:42:02 +00:00
Craig Topper	5c36557426	[X86] Auto-generate complete checks. NFC llvm-svn: 329251	2018-04-05 04:41:59 +00:00
Sam Clegg	685c5e838a	[WebAssembly] Only write 32-bits for WebAssembly::OPERAND_OFFSET32 A bug was found where an offset of -1 would generate an encoding of max int64 which is invalid in the binary format. Differential Revision: https://reviews.llvm.org/D45280 llvm-svn: 329238	2018-04-04 22:27:58 +00:00
Peter Collingbourne	f11eb3ebe7	AArch64: Implement support for the shadowcallstack attribute. The implementation of shadow call stack on aarch64 is quite different to the implementation on x86_64. Instead of reserving a segment register for the shadow call stack, we reserve the platform register, x18. Any function that spills lr to sp also spills it to the shadow call stack, a pointer to which is stored in x18. Differential Revision: https://reviews.llvm.org/D45239 llvm-svn: 329236	2018-04-04 21:55:44 +00:00
Jessica Paquette	bccd18b816	[MachineOutliner] Add `useMachineOutliner` target hook The MachineOutliner has a bunch of target hooks that will call llvm_unreachable if the target doesn't implement them. Therefore, if you enable the outliner on such a target, it'll just crash. It'd be much better if it'd just not run the outliner at all in this case. This commit adds a hook to TargetInstrInfo that returns false by default. Targets that implement the hook make it return true. The outliner checks the return value of this hook to decide whether or not to continue. llvm-svn: 329220	2018-04-04 19:13:31 +00:00
Mandeep Singh Grang	93ab79d205	[AArch64] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused due to undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort to llvm::sort. Refer the comments section in D44363 for a list of all the required patches. Reviewers: t.p.northover, jmolloy, RKSimon, rengolin Reviewed By: rengolin Subscribers: dexonsmith, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D44853 llvm-svn: 329216	2018-04-04 18:20:28 +00:00
Craig Topper	498875fab0	[X86] Separate BSWAP32r and BSWAP64r scheduling data in SandyBridge/Haswell/Broadwell/Skylake scheduler models. The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a non-existent BSWAP16r. llvm-svn: 329211	2018-04-04 17:54:19 +00:00
Lei Huang	09fda63af0	[Power9]Legalize and emit code for quad-precision fma instructions Legalize and emit code for the following quad-precision fma: * xsmaddqp * xsnmaddqp * xsmsubqp * xsnmsubqp Differential Revision: https://reviews.llvm.org/D44843 llvm-svn: 329206	2018-04-04 16:43:50 +00:00
Dmitry Preobrazhensky	523872ea59	[AMDGPU][MC] Enabled instruction TBUFFER_LOAD_FORMAT_XYZ for SI/CI See bug 36958: https://bugs.llvm.org/show_bug.cgi?id=36958 Differential Revision: https://reviews.llvm.org/D45099 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329197	2018-04-04 13:54:55 +00:00
Dmitry Preobrazhensky	a0b8cd038c	[AMDGPU][MC] Added support of 3-element addresses for MIMG instructions See bug 35999: https://bugs.llvm.org/show_bug.cgi?id=35999 Differential Revision: https://reviews.llvm.org/D45084 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329187	2018-04-04 13:01:17 +00:00
Nico Weber	1cbd096914	Sort targetgen calls in lib/Target/*/CMakeLists. Makes it easier to see mistakes such as the one fixed in r329178 and makes the different target CMakeLists more consistent. Also remove some stale-looking comments from the Nios2 target cmakefile. No intended behavior change. llvm-svn: 329181	2018-04-04 12:37:44 +00:00
Nico Weber	644d456a5f	Remove duplicate tablegen lines from AVR target. They were added in r285274, in what looks like a merge mishap. AVRGenMCCodeEmitter.inc is the only non-dupe tablegen invocation added in that revision. Also sort the tablegen lines to make this easier to spot in the future. llvm-svn: 329178	2018-04-04 12:27:43 +00:00
Benjamin Kramer	1fc0da4849	Make helpers static. NFC. llvm-svn: 329170	2018-04-04 11:45:11 +00:00
Nicolai Haehnle	2f5a73820c	AMDGPU: Dimension-aware image intrinsics Summary: These new image intrinsics contain the texture type as part of their name and have each component of the address/coordinate as individual parameters. This is a preparatory step for implementing the A16 feature, where coordinates are passed as half-floats or -ints, but the Z compare value and texel offsets are still full dwords, making it difficult or impossible to distinguish between A16 on or off in the old-style intrinsics. Additionally, these intrinsics pass the 'texfailpolicy' and 'cachectrl' as i32 bit fields to reduce operand clutter and allow for future extensibility. v2: - gather4 supports 2darray images - fix a bug with 1D images on SI Change-Id: I099f309e0a394082a5901ea196c3967afb867f04 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44939 llvm-svn: 329166	2018-04-04 10:58:54 +00:00
Nicolai Haehnle	3ffd383a15	AMDGPU: Fix copying i1 value out of loop with non-uniform exit Summary: When an i1-value is defined inside of a loop and used outside of it, we cannot simply use the SGPR bitmask from the loop's last iteration. There are also useful and correct cases of an i1-value being copied between basic blocks, e.g. when a condition is computed outside of a loop and used inside it. The concept of dominators is not sufficient to capture what is going on, so I propose the notion of "lane-dominators". Fixes a bug encountered in Nier: Automata. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743 Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D40547 llvm-svn: 329164	2018-04-04 10:57:58 +00:00
John Brawn	21d9b33d62	[AArch64] Add patterns matching (fabs (fsub x y)) to (fabd x y) Differential Revision: https://reviews.llvm.org/D44573 llvm-svn: 329163	2018-04-04 10:12:53 +00:00
Mikhail Maltsev	68f35bcc85	[ARM] Do not convert some vmov instructions Summary: Patch https://reviews.llvm.org/D44467 implements conversion of invalid vmov instructions into valid ones. It turned out that some valid instructions also get converted, for example vmov.i64 d2, #0xff00ff00ff00ff00 -> vmov.i16 d2, #0xff00 Such behavior is incorrect because according to the ARM ARM section F2.7.7 Modified immediate constants in T32 and A32 Advanced SIMD instructions, "On assembly, the data type must be matched in the table if possible." This patch fixes the isNEONmovReplicate check so that the above instruction is not modified any more. Reviewers: rengolin, olista01 Reviewed By: rengolin Subscribers: javed.absar, kristof.beyls, rogfer01, llvm-commits Differential Revision: https://reviews.llvm.org/D44678 llvm-svn: 329158	2018-04-04 08:54:19 +00:00
Craig Topper	a30db995b3	[X86] Use the same predicate for the load for PMOVSXBQ and PMOVZXBQ. These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both. llvm-svn: 329154	2018-04-04 07:00:24 +00:00
Craig Topper	a3cac956fc	[X86] Use loadi16/loadi32 predicates in multiply patterns llvm-svn: 329153	2018-04-04 07:00:19 +00:00
Craig Topper	88e38e3e3e	[X86] Remove more dead code left over from the handling of i8/i16 UMUL_LOHI/SMUL_LOHI that is no longer needed. NFC llvm-svn: 329152	2018-04-04 07:00:16 +00:00
Craig Topper	afa22edcf0	[X86] Remove dead code for handling i8/i16 UMUL_LOHI/SMUL_LOHI from X86ISelDAGToDAG.cpp. NFC These are promoted to i16/i32 multiplies by a DAG combine. llvm-svn: 329147	2018-04-04 04:38:55 +00:00
Craig Topper	3064c15dc3	[X86] Remove some code that was only needed when i1 was a legal type. NFC llvm-svn: 329146	2018-04-04 04:38:54 +00:00
Vlad Tsyrklevich	b324733169	Fix bad #include path in r329139 llvm-svn: 329140	2018-04-04 01:34:42 +00:00
Vlad Tsyrklevich	e3446017ed	Add the ShadowCallStack pass Summary: The ShadowCallStack pass instruments functions marked with the shadowcallstack attribute. The instrumented prolog saves the return address to [gs:offset] where offset is stored and updated in [gs:0]. The instrumented epilog loads/updates the return address from [gs:0] and checks that it matches the return address on the stack before returning. Reviewers: pcc, vitalybuka Reviewed By: pcc Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D44802 llvm-svn: 329139	2018-04-04 01:21:16 +00:00
Jessica Paquette	5fa2a63785	[MachineOutliner] Test for X86FI->getUsesRedZone() as well as Attribute::NoRedZone This commit is similar to r329120, but uses the existing getUsesRedZone() function in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a function truly uses a redzone instead of just the noredzone attribute on a function. Thus, after this commit, it's possible to outline from x86 without using -mno-red-zone and still get outlining results. This also adds a new test for the new redzone behaviour. llvm-svn: 329134	2018-04-03 23:32:41 +00:00
Farhana Aleen	e80aeac0f2	[AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3. Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3. Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D45219 llvm-svn: 329131	2018-04-03 23:00:30 +00:00
Evandro Menezes	6b8d8f4010	[AArch64] Adjust the cost model for Exynos M3 Fix typo and simplify matching expression. llvm-svn: 329130	2018-04-03 22:57:17 +00:00
Ikhlas Ajbar	1376d934ed	[Hexagon] peel loops with runtime small trip counts Move the check canPeel() to Hexagon Target before setting PeelCount. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 329129	2018-04-03 22:55:09 +00:00
Jessica Paquette	642f6c61a3	[MachineOutliner] Keep track of fns that use a redzone in AArch64FunctionInfo This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It returns true if the function is known to use a redzone, false if it is known to not use a redzone, and no value otherwise. This removes the requirement to pass -mno-red-zone when outlining for AArch64. https://reviews.llvm.org/D45189 llvm-svn: 329120	2018-04-03 21:56:10 +00:00
Farhana Aleen	936947349a	Revert "MSG" This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de. This was committed by mistake. llvm-svn: 329119	2018-04-03 21:51:45 +00:00
Farhana Aleen	3ab409dc86	MSG llvm-svn: 329114	2018-04-03 21:20:39 +00:00
Jun Bum Lim	7ab1b32b5e	[CodeGen]Add NoVRegs property on PostRASink and ShrinkWrap Summary: This change declare that PostRAMachineSinking and ShrinkWrap require NoVRegs property, so now the MachineFunctionPass can enforce this check. These passes are disabled in NVPTX & WebAssembly. Reviewers: dschuff, jlebar, tra, jgravelle-google, MatzeB, sebpop, thegameg, mcrosier Reviewed By: dschuff, thegameg Subscribers: jholewinski, jfb, sbc100, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D45183 llvm-svn: 329095	2018-04-03 18:17:34 +00:00
Krzysztof Parzyszek	9fa6ffe290	[Hexagon] Remove -mhvx-double and the corresponding subtarget feature Specifying the HVX vector length should be done via the -mhvx-length option. llvm-svn: 329079	2018-04-03 16:06:36 +00:00
Andrea Di Biagio	9da4d6db33	[MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca This patch allows the description of register files in processor scheduling models. This addresses PR36662. A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td. Targets can optionally describe register files for their processors using that class. In particular, class RegisterFile allows to specify: - The total number of physical registers. - Which target registers are accessible through the register file. - The cost of allocating a register at register renaming stage. Example (from this patch - see file X86/X86ScheduleBtVer2.td) def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]> Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar (btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM register definitions only cost 1 physical register. The syntax allows to specify an empty set of register classes. An empty set of register classes means: this register file models all the registers specified by the Target. For each register class, users can specify an optional register cost. By default, register costs default to 1. A value of 0 for the number of physical registers means: "this register file has an unbounded number of physical registers". This patch is structured in two parts. * Part 1 - MC/Tablegen * A first part adds the tablegen definition of RegisterFile, and teaches the SubtargetEmitter how to emit information related to register files. Information about register files is accessible through an instance of MCExtraProcessorInfo. The idea behind this design is to logically partition the processor description which is only used by external tools (like llvm-mca) from the processor information used by the llvm machine schedulers. I think that this design would make easier for targets to get rid of the extra processor information if they don't want it. 
* Part 2 - llvm-mca related * The second part of this patch is related to changes to llvm-mca. The main differences are: 1) class RegisterFile now needs to take into account the "cost of a register" when allocating physical registers at register renaming stage. 2) Point 1. triggered a minor refactoring which led to the removal of the "maximum 32 register files" restriction. 3) The BackendStatistics view has been updated so that we can print out extra details related to each register file implemented by the processor. The effect of point 3. is also visible in tests register-files-[1..5].s. Differential Revision: https://reviews.llvm.org/D44980 llvm-svn: 329067	2018-04-03 13:36:24 +00:00
Hiroshi Inoue	08a1775f28	[PowerPC] reorder entries in P9InstrResources.td in alphabetical order; NFC Reorder entries added in my previous commit (rL328969) to keep alphabetical order. llvm-svn: 329064	2018-04-03 12:49:42 +00:00
Craig Topper	9b6a65b9ef	[X86] Reduce number of OpPrefix bits in TSFlags to 2. NFCI TSFlag doesn't need to disambiguate NoPrfx from PS. So shift the encodings so PS is NoPrfx|0x4. llvm-svn: 329049	2018-04-03 06:37:04 +00:00
Yonghong Song	d3b522f519	bpf: fix incorrect SELECT_CC lowering Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC") intended to improve code quality for certain jmp conditions. The commit, however, has a couple of issues: (1). In code, just swap is not enough, ConditionalCode CC should also be swapped, otherwise incorrect code will be generated. (2). The ConditionalCode swap should be subject to getHasJmpExt(). If getHasJmpExt() is False, certain conditional codes will not be supported and swap may generate incorrect code. The original goal for this patch is to optimize jmp operations which does not have JmpExt turned on. If JmpExt is on, better code could be generated. For example, the test select_ri.ll is introduced to demonstrate the optimization. The same result can be achieved with -mcpu=v2 flag. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 329043	2018-04-03 03:56:37 +00:00
Ikhlas Ajbar	b7322e8ac7	peel loops with runtime small trip counts For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 329042	2018-04-03 03:39:43 +00:00
Dmitry Preobrazhensky	b181c7312e	[AMDGPU][MC][GFX9] Added instructions v_cvt_norm_*16_f16, v_sat_pk_u8_i16 See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847 Differential Revision: https://reviews.llvm.org/D45097 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328988	2018-04-02 17:09:20 +00:00
Dmitry Preobrazhensky	6bad04ecf5	[AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions Fixed a bug which caused Tablegen crash. See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837 Differential Revision: https://reviews.llvm.org/D45085 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328983	2018-04-02 16:10:25 +00:00
Krzysztof Parzyszek	0831f57afe	[Hexagon] Clean up some code in HexagonAsmPrinter, NFC llvm-svn: 328981	2018-04-02 15:06:55 +00:00
Nico Weber	f492f58182	Revert r328975, it makes TableGen assert on the bots. llvm-svn: 328978	2018-04-02 14:20:23 +00:00
Dmitry Preobrazhensky	32c450ae6a	[AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837 Differential Revision: https://reviews.llvm.org/D45085 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328975	2018-04-02 13:52:23 +00:00
Lama Saba	927468309f	[X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346 If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory. A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchitecture is that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles. This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store. The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies. Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM. Differential revision: https://reviews.llvm.org/D41330 Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9 llvm-svn: 328973	2018-04-02 13:48:28 +00:00
Hiroshi Inoue	6d48493817	[PowerPC] fix assertion failure due to missing instruction in P9InstrResources.td This patch adds L(D|W|H|B)XTLS instructions introduced by https://reviews.llvm.org/rL327635 in P9InstrResources.td. llvm-svn: 328969	2018-04-02 12:18:21 +00:00
Craig Topper	96729cd64b	[X86][Silvermont] Use correct latency and throughput information for divide and square root in the scheduler model. Data taken from Table 16-17 in the Intel Optimization Manual. llvm-svn: 328962	2018-04-02 06:34:16 +00:00
Craig Topper	6a814904da	[X86][SkylakeServer] Correct throughput for 512-bit sqrt and divide. Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/ llvm-svn: 328961	2018-04-02 05:54:34 +00:00
Craig Topper	8104f266a4	[X86] Correct the throughput for divide instructions in Sandy Bridge/Haswell/Broadwell/Skylake scheduler models. Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those. llvm-svn: 328960	2018-04-02 05:33:28 +00:00
Craig Topper	dc74094398	[X86] Fix the SchedRW for AVX512 shift instructions. It was being inadvertently defaulted to an FADD scheduler class. llvm-svn: 328959	2018-04-02 03:15:02 +00:00
Craig Topper	5fb1dc2d22	[X86] Give the AVX512 VEXTRACT instructions the same SchedRWs as the SSE/AVX versions. llvm-svn: 328958	2018-04-02 02:44:55 +00:00
Craig Topper	caec723a1a	[X86] Add an itinerary to BTR64rr. llvm-svn: 328956	2018-04-02 01:12:34 +00:00
Craig Topper	02daec00a2	[X86] Make sure all the classes declared in the Haswell scheduler model are prefixed with HW. The tablegen files all share a namespace so we shouldn't use generic names in a specific scheduler model. llvm-svn: 328955	2018-04-02 01:12:32 +00:00
Craig Topper	c90d906b16	[X86] Give VINSERTPS the same itinerary as INSERTPS. llvm-svn: 328954	2018-04-02 00:48:11 +00:00
Craig Topper	dc4a6d1ef6	[X86] Cleanup ADCX/ADOX instruction definitions. Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW. llvm-svn: 328952	2018-04-01 23:58:50 +00:00
Petr Hosek	934e5d5436	[AArch64] Reserve x18 register on Fuchsia This register is reserved as a platform register on Fuchsia. Differential Revision: https://reviews.llvm.org/D45105 llvm-svn: 328950	2018-04-01 23:44:04 +00:00
Craig Topper	9f834810ea	[X86] Give ADC8/16/32/64mi the same scheduling information as ADC8/16/32/64mr and SBB8/16/32/64mi. It doesn't make a lot of sense that it would be different. llvm-svn: 328946	2018-04-01 21:54:24 +00:00
Chandler Carruth	4244625c51	[x86] Correct the operand structure of the ADOX instruction. This also moves to define it in the same way as ADCX which seems to use constraints a bit better. This is pulled out of the review for reducing the use of popf for restoring EFLAGS, but is independent. There are still more problems with our definitions for these instructions that Craig is going to look at but this is at least less broken and he can start from this to improve them more fully. Thanks to Craig for the review here. llvm-svn: 328945	2018-04-01 21:53:18 +00:00
Chandler Carruth	06b343c6ed	[x86] Expose more of the condition conversion routines in the public API for X86's instruction information. I've now got a second patch under review that needs these same APIs. This bit is nicely orthogonal and obvious, so landing it. NFC. llvm-svn: 328944	2018-04-01 21:47:55 +00:00
Nicolai Haehnle	4254d45a79	AMDGPU: Make isIntrinsicSourceOfDivergence table-driven Summary: This is in preparation for the new dimension-aware image intrinsics, which I'd rather not have to list here by hand. Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44938 llvm-svn: 328939	2018-04-01 17:09:14 +00:00
Nicolai Haehnle	5d0d30304c	AMDGPU: Make getTgtMemIntrinsic table-driven for resource-based intrinsics Summary: Avoids having to list all intrinsics manually. This is in preparation for the new dimension-aware image intrinsics, which I'd rather not have to list here by hand. Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44937 llvm-svn: 328938	2018-04-01 17:09:07 +00:00
Craig Topper	9b8cd5fe55	[X86] Don't check for folding into a store when deciding if we can promote an i16 mul. There's no RMW mul operation. llvm-svn: 328931	2018-04-01 06:29:32 +00:00
Craig Topper	db6caabccc	[X86] Check if the load and store are to the same pointer before preventing i16 RMW shifts and subtracts from being promoted. llvm-svn: 328930	2018-04-01 06:29:28 +00:00
Craig Topper	ae2de57db0	[X86] Allow i16 subtracts to be promoted if the load is on the LHS and it's not being stored. llvm-svn: 328928	2018-04-01 06:29:25 +00:00
Craig Topper	9bc0d881a3	[X86] Remove unneeded temporary variable. NFC This Promote flag was always set to true except in the default case. But in the default case we don't need to set PVT and can just return false. llvm-svn: 328926	2018-04-01 06:29:21 +00:00
Simon Pilgrim	3b8ad346f9	[X86][Btver2] Add MMX_PSHUFB to the JWritePSHUFB InstRW entries llvm-svn: 328918	2018-03-31 09:15:54 +00:00
Simon Pilgrim	8c8ebd7945	Fix trailing whitespace. NFCI. llvm-svn: 328917	2018-03-31 09:14:14 +00:00
Craig Topper	13a0f83a05	[X86] Add SchedRW for PMULLD Summary: It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput. This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet. I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs. Reviewers: RKSimon, GGanesh, courbet Reviewed By: RKSimon Subscribers: gchatelet, gbedwell, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D44972 llvm-svn: 328914	2018-03-31 04:54:32 +00:00
Fangrui Song	956ee79795	Fix a bunch of typoes. NFC llvm-svn: 328907	2018-03-30 22:22:31 +00:00
Jacob Gravelle	40926451d2	[WebAssembly] Register wasm passes with the PassRegistry Summary: This exposes WebAssembly passes for use on the command line (as arguments to -print-before and the like). Reviewers: dschuff, sunfish Subscribers: MatzeB, jfb, sbc100, llvm-commits, aheejin Differential Revision: https://reviews.llvm.org/D45103 llvm-svn: 328901	2018-03-30 20:36:58 +00:00
Krzysztof Parzyszek	74096f7258	[Hexagon] Reduce excessive indentation in .s output llvm-svn: 328898	2018-03-30 19:30:28 +00:00
Krzysztof Parzyszek	0f983d69a4	[Hexagon] Avoid creating invalid offsets in packetizer Two memory instructions with a dependency only on the address register between the two (the first one of them being post-increment) can be packetized together after the offset on the second was updated to the increment value. Make sure that the new offset is valid for the instruction. llvm-svn: 328897	2018-03-30 19:28:37 +00:00
Andrea Di Biagio	dc97172b2f	[X86][BtVer2] Fixed the number of micro opcodes for AVX vector converts and VSQRT instructions. There were still a few AVX instructions with an incorrect number of opcodes. These should be fixed now. llvm-svn: 328892	2018-03-30 18:53:47 +00:00
Andrea Di Biagio	3eaa26bb64	[X86][BtVer2] Fix the number of uOps for horizontal operations. llvm-svn: 328886	2018-03-30 18:15:30 +00:00
Tim Shen	8f9f026965	[NVPTX] Enable StructuredCFG for NVPTX Summary: Make NVPTX require structured CFG. Added a temporary flag to "roll back" the behavior for easy deployment. Combined with D45008, this fixes several internal Nvidia GPU test failures that we suspect to be ptxas miscompiles (PR27738). Reviewers: jlebar Subscribers: jholewinski, sanjoy, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D45070 llvm-svn: 328885	2018-03-30 17:51:03 +00:00
Derek Schuff	a2726e9ab6	[WebAssembly] Refactor tablegen for store instructions (NFC) Summary: Add patterns similar to loads. Differential Revision: https://reviews.llvm.org/D45064 llvm-svn: 328876	2018-03-30 17:02:50 +00:00
Krzysztof Parzyszek	fce30c2ba3	Revert "peel loops with runtime small trip counts" This reverts commit r328854, it breaks some Hexagon tests. llvm-svn: 328875	2018-03-30 16:55:44 +00:00
Stanislav Mekhanoshin	74e2974ac6	[AMDGPU] Fixed some instructions latencies Differential Revision: https://reviews.llvm.org/D45073 llvm-svn: 328874	2018-03-30 16:19:13 +00:00
Krzysztof Parzyszek	4f99836a9e	[Hexagon] Recognize and handle :endloop01 llvm-svn: 328870	2018-03-30 15:29:47 +00:00
Krzysztof Parzyszek	46abcb236b	[Hexagon] Fix printing :mem_noshuf on compiler-generated packets llvm-svn: 328869	2018-03-30 15:09:05 +00:00
Andrea Di Biagio	073a9d74ca	[X86][BtVer2] Add missing ReadAfterLd to RM variants of AVX horizontal adds and most vector logic instructions. Fixed a few InstRW that forgot to specify a ReadAfterLd for the register input operand. llvm-svn: 328867	2018-03-30 14:48:08 +00:00
Krzysztof Parzyszek	3f55ad8fae	[Hexagon] Remove unused scheduling classes llvm-svn: 328866	2018-03-30 14:34:32 +00:00
Krzysztof Parzyszek	1ca23d9837	[Hexagon] Pass pointer to SelectionDAG to dump functions llvm-svn: 328864	2018-03-30 14:29:15 +00:00
Michael Bedy	59e5ef793c	[AMDGPU] Fix the SDWA Peephole phase to handle src for dst:UNUSED_PRESERVE. Summary: The phase attempts to transform operations that extract a portion of a value into an SDWA src operand in cases where that value is used only once. It was not prepared for this use to be the preserved portion of a value for dst:UNUSED_PRESERVE, resulting in a crash or assert. This change either rejects the illegal SDWA attempt, or in the case where dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded extract instruction. Reviewers: arsenm, #amdgpu Reviewed By: arsenm, #amdgpu Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44364 llvm-svn: 328856	2018-03-30 05:03:36 +00:00
Ikhlas Ajbar	66c8ba5a50	peel loops with runtime small trip counts For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 328854	2018-03-30 03:05:34 +00:00
David Blaikie	bd0c88078a	Remove some unneeded #includes to fix layering llvm-svn: 328838	2018-03-29 22:31:36 +00:00
Craig Topper	ee3c19fd7f	[X86] Add ReadAfterLds to some 3 src instructions Sometimes the operand comes after the memory operand so we need 5 ReadDefaults first. I suspect we also need to do something for the mask operand for masked avx512 instructions? I'm not sure if the mask should be ReadAfterLd or not since it can mask faults. If it shouldn't be ReadAfterLd then we're probably wrong for zero masking instructions already. Differential Revision: https://reviews.llvm.org/D44726 llvm-svn: 328834	2018-03-29 22:03:05 +00:00
Matt Arsenault	efd1b30436	AMDGPU: Fix build warning in release llvm-svn: 328832	2018-03-29 21:44:44 +00:00
Matt Arsenault	03ae399d50	AMDGPU: Support realigning stack While the stack access instructions don't care about alignment > 4, some transformations on the pointer calculation do make assumptions based on knowing the low bits of a pointer are 0. If a stack object ends up being accessed through its absolute address (relative to the kernel scratch wave offset), the addressing expression may depend on the stack frame being properly aligned. This was breaking in a testcase due to the add->or combine. I think some of the SP/FP handling logic is still backwards, and overly simplistic to support all of the stack features. Code which tries to modify the SP with inline asm for example or variable sized objects will probably require redoing this. llvm-svn: 328831	2018-03-29 21:30:06 +00:00
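The add->or combine mentioned above depends on the low bits of the pointer being known zero. Below is a minimal Python sketch of why, using plain integer arithmetic rather than LLVM's DAG combiner. The addresses and the helper names `add_or_equivalent` and `combine_is_safe` are hypothetical. The idea is that `base + offset` equals `base | offset` only when the two values share no set bits, and that is guaranteed only when the offset fits inside the base's known alignment.

```python
def add_or_equivalent(base: int, offset: int) -> bool:
    # base + offset == base | offset exactly when no bit positions overlap,
    # because then the addition produces no carries.
    return (base & offset) == 0


def combine_is_safe(known_align: int, offset: int) -> bool:
    # If all the compiler knows is that base is a multiple of known_align
    # (a power of two), its low bits are zero, so an offset smaller than
    # known_align cannot overlap them.
    return 0 <= offset < known_align


# A 16-byte-aligned frame: the OR form gives the same address.
aligned_base = 0x1000
assert combine_is_safe(16, 12)
assert aligned_base + 12 == aligned_base | 12

# If the frame is assumed 16-byte aligned but really sits at 0x1004,
# the rewritten OR computes the wrong address.
misaligned_base = 0x1004
print(misaligned_base + 12, misaligned_base | 12)  # 4112 4108
```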
Craig Topper	3f2dbec652	[X86] Remove ReadAfterLd from BMI and TBM instructions that don't have a register operand in their memory form The memory form of these instructions only read an input from memory. They don't have any register operands. Differential Revision: https://reviews.llvm.org/D44836 llvm-svn: 328828	2018-03-29 21:03:53 +00:00
Craig Topper	89310f56c8	[X86] Correct the placement of ReadAfterLd in BEXTR and BZHI. Add dedicated SchedRW for BEXTR/BZHI. These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd Differential Revision: https://reviews.llvm.org/D44838 llvm-svn: 328823	2018-03-29 20:41:39 +00:00
Matt Arsenault	ffb132e74b	AMDGPU: Increase default stack alignment 8 and 16-byte values are common, so increase the default alignment to avoid realigning the stack in most functions. llvm-svn: 328821	2018-03-29 20:22:04 +00:00
Matt Arsenault	6c041a3cab	AMDGPU: Fix selection error on constant loads with < 4 byte alignment llvm-svn: 328818	2018-03-29 19:59:28 +00:00
Mandeep Singh Grang	10d8b85570	[Mips] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort with llvm::sort. Refer to the comments section in D44363 for a list of all the required patches. Reviewers: sdardis, RKSimon, dsanders, atanasyan Reviewed By: atanasyan Subscribers: atanasyan, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D44869 llvm-svn: 328815	2018-03-29 19:05:26 +00:00
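Here is a minimal Python sketch of the idea behind the shuffling wrappers added in r327219. It is not the LLVM implementation, and `shuffled_sort` and the register records are hypothetical. Shuffling before sorting makes the relative order of equal keys vary from run to run. Any code that silently depends on that order therefore becomes visibly non-deterministic.

```python
import random


def shuffled_sort(items, key, seed=None):
    # Shuffle a copy first, then sort. Python's sort is stable, so elements
    # with equal keys end up in whatever order the shuffle produced.
    rng = random.Random(seed)
    result = list(items)
    rng.shuffle(result)
    result.sort(key=key)
    return result


# Hypothetical records: "r1" and "r2" share the key 1, so their relative
# order is not determined by the sort.
records = [("r1", 1), ("r2", 1), ("r0", 0)]
orders = {
    tuple(name for name, _ in shuffled_sort(records, key=lambda r: r[1], seed=s))
    for s in range(20)
}
print(sorted(orders))
```

If `orders` contains more than one tuple, any code that relied on "r1" coming before "r2" was depending on unspecified behaviour, which is exactly what an unstable `std::sort` does not guarantee.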
Craig Topper	2fa1436206	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there are only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806	2018-03-29 17:21:10 +00:00
Krzysztof Parzyszek	dc7a557e6a	[Hexagon] Add support to handle bit-reverse load intrinsics Patch by Sumanth Gundapaneni. llvm-svn: 328774	2018-03-29 13:52:46 +00:00
Simon Pilgrim	71c5f3fffd	[X86][SSE] Don't bother re-adding combined target shuffles to the work list We are re-adding all the bitcasts, constant masks and target shuffles to the work list for no apparent gain. Found while investigating adding SimplifyDemandedVectorElts to target shuffles. Differential Revision: https://reviews.llvm.org/D44942 llvm-svn: 328771	2018-03-29 11:18:41 +00:00
Simon Dardis	32a27fc77a	[Mips] Remove dead code I believe the role of ehDataReg has been replaced by MipsABIInfo::GetEhDataReg, thus removing the dead code. Patch By: Wei-Ren Chen. Reviewers: ehostunreach, sdardis Differential Revision: https://reviews.llvm.org/D44867 llvm-svn: 328767	2018-03-29 09:21:20 +00:00
Craig Topper	a21758fa2c	[X86] Don't pass getRegisterName from the InstPrinters into EmitAnyX86InstComments. Just always use the function from the ATTPrinter. NFC The IntelPrinter and the ATTPrinter produce the same strings for the same input. We already use the ATTPrinter explicitly in several other places. llvm-svn: 328762	2018-03-29 04:14:04 +00:00
Craig Topper	7456af88f4	[X86] Rename RIi64_NOREX tblgen class to just Ii64. Make RIi64 inherit from it. NFC This feels more consistent with the other classes. We don't need to say _NOREX if we didn't start it with an R in the first place. llvm-svn: 328757	2018-03-29 03:14:57 +00:00
Craig Topper	7441ffff84	[X86] Cleanup inheritance of the X86InstrFormats.td classes. NFC EVEX shouldn't inherit from VEX and EVEX_4V shouldn't inherit from VEX_4V. llvm-svn: 328756	2018-03-29 03:14:56 +00:00
David Blaikie	b3f471a4bd	Remove some unused includes to fix layering. llvm-svn: 328745	2018-03-29 00:29:45 +00:00
David Blaikie	8ad9a97310	Plumb useAA through TargetTransformInfo to remove Transforms->CodeGen header dependency Thanks to echristo for the pointers on direction. llvm-svn: 328737	2018-03-28 22:28:50 +00:00
Craig Topper	aac23d7881	[X86][SkylakeServer] Remove checks for 'k', 'z', '_Int' and 'b' from scheduler regexes. Most of these were optional matches at the end of the strings, but since the strings themselves are prefix matches by default you don't need to check for something optional at the end. I've left the 'b' on memory instructions where it means 'broadcast' because I'm not sure those really have the same load latency and we may need to split them explicitly in the future. llvm-svn: 328730	2018-03-28 20:40:24 +00:00
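This small Python sketch illustrates the point about prefix matching. The instruction names and patterns are illustrative only and are not taken from the SkylakeServer model. Python's `re.match` anchors only at the start of the string, so an optional group at the end of a pattern can never change whether a name matches.

```python
import re

# Illustrative instruction names; the trailing k/kz/_Int variants are the
# kind of optional suffixes the commit removed from the patterns.
names = ["VADDPSrr", "VADDPSZrrk", "VADDPSZrrkz", "VADDPSrr_Int", "VADDPDrr"]

with_suffix = re.compile(r"VADDPS(Z?)rr(k|kz|_Int)?")
without_suffix = re.compile(r"VADDPS(Z?)rr")

for name in names:
    # re.match anchors only at the start, i.e. it is a prefix match,
    # so both patterns accept exactly the same names.
    assert bool(with_suffix.match(name)) == bool(without_suffix.match(name))
    print(name, bool(without_suffix.match(name)))
```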
Krzysztof Parzyszek	440ba3ae5c	[Hexagon] Add support for "new" circular buffer intrinsics These instructions have been around for a long time, but we haven't supported intrinsics for them. The "new" versions use the CSx register for the start of the buffer instead of the K field in the Mx register. We need to use pseudo instructions for these instructions until after register allocation. The problem is that these instructions allocate a M0/CS0 or M1/CS1 pair. But, we can't generate code for the CSx set-up until after register allocation when the Mx register has been fixed for the instruction. There is a related clang patch. Patch by Brendon Cahoon. llvm-svn: 328724	2018-03-28 19:38:29 +00:00
Jessica Paquette	4aa14dbcc2	[MachineOutliner] Simplify call outlining + require valid callee save info for call outlining This commit simplifies the call outlining logic by removing references to the Function associated with the callee. To do this, it requires that valid callee save info is available to the outliner. llvm-svn: 328719	2018-03-28 17:52:31 +00:00
David Blaikie	a373d18eb7	Transforms: Introduce Transforms/Utils.h rather than spreading the declarations amongst Scalar.h and IPO.h Fixes layering - Transforms/Utils shouldn't depend on including a Scalar or IPO header, because Scalar and IPO depend on Utils. llvm-svn: 328717	2018-03-28 17:44:36 +00:00
Dmitry Preobrazhensky	622bde8bc7	[AMDGPU][MC] Added ds_add_src2_f32 See bug 36833: https://bugs.llvm.org/show_bug.cgi?id=36833 Differential Revision: https://reviews.llvm.org/D44779 Reviewers: arsenm, artem.tamazov, timcorringham llvm-svn: 328713	2018-03-28 16:21:56 +00:00
Dmitry Preobrazhensky	2456ac696a	[AMDGPU][MC] Added PCK variants of image load/store instructions See bug 36834: https://bugs.llvm.org/show_bug.cgi?id=36834 Differential Revision: https://reviews.llvm.org/D44795 Reviewers: artem.tamazov, arsenm, timcorringham, nhaehnle llvm-svn: 328710	2018-03-28 15:44:16 +00:00
Dmitry Preobrazhensky	a917e88585	[AMDGPU][MC][GFX9] Added buffer_*_format_d16_hi_x See bug 36835: https://bugs.llvm.org/show_bug.cgi?id=36835 Differential Revision: https://reviews.llvm.org/D44825 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328707	2018-03-28 14:53:13 +00:00
Dmitry Preobrazhensky	dd2b929ffb	[AMDGPU][MC][GFX9] Added s_scratch* instructions See bug 36836: https://bugs.llvm.org/show_bug.cgi?id=36836 Differential Revision: https://reviews.llvm.org/D44832 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328704	2018-03-28 14:08:03 +00:00
Simon Pilgrim	b1bc6cd96b	[X86][Btver2] Moved JWriteFCmp/JWriteFCmpY classes next to each other. NFCI Renamed JWriteFPAY22 to JWriteFCmpY - we've tended to avoid latency based names llvm-svn: 328701	2018-03-28 13:53:21 +00:00
Andrea Di Biagio	5076b98fb9	[X86][BtVer2] Fix the number of micro opcodes for AES[ENC|DEC] and other YMM instructions. Similar to r328694. The number of micro opcodes should be 2 for those instructions. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328698	2018-03-28 12:12:04 +00:00
Tim Renouf	cdac172e2a	Revert "[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader" This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981. It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a release-with-asserts build. I will resubmit the change when I have fixed that. Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a llvm-svn: 328695	2018-03-28 11:21:07 +00:00
Andrea Di Biagio	010924e35c	[X86][BtVer2] Fix the number of micro opcodes for a bunch of YMM instructions. The Jaguar backend natively supports 128-bit data types. Operations on YMM registers are split into two COPs (complex operations). Each COP consumes a slot in the dispatch group, and in the reorder buffer. The scheduling model for Jaguar should mark those instructions as `let NumMicroOps = 2`. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328694	2018-03-28 10:49:33 +00:00
Christof Douma	a1e77c0e02	[ARM] Support float literals under XO Follow up patch of r328313 to support the UseVMOVSR constraint. Removed some unneeded instructions from the test and removed some stray comments. Differential Revision: https://reviews.llvm.org/D44941 llvm-svn: 328691	2018-03-28 10:02:26 +00:00
Matt Arsenault	bd49eccca1	AMDGPU: Really implement getFrameRegister Currently this seems to only really be used for debug info. llvm-svn: 328677	2018-03-27 23:26:59 +00:00
Jessica Paquette	2519ee7081	[MachineOutliner] AArch64: Don't outline ADRPs with un-outlinable operands If an ADRP appears with, say, a CPI operand, we shouldn't outline it. This moves the check for unsafe operands so that it occurs before the special-case for ADRPs. Also add a test for outlining ADRPs. llvm-svn: 328674	2018-03-27 22:23:48 +00:00
Tim Renouf	e4208bfa5b	[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec. Reviewers: kzhuravl, nhaehnle, timcorringham Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D44468 Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f llvm-svn: 328673	2018-03-27 21:35:00 +00:00
Sterling Augustine	33dc01861a	Initialize variable added in r328617. llvm-svn: 328667	2018-03-27 21:11:57 +00:00
Simon Pilgrim	a2f26788a3	[X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer. Differential Revision: https://reviews.llvm.org/D44924 llvm-svn: 328664	2018-03-27 20:38:54 +00:00
Matt Arsenault	17f3338015	AMDGPU: Fix not preserving CSR VGPR if used for SGPR spills Before this was not done if the function had no calls in it. This is still a possible issue with any callable function, regardless of calls present. llvm-svn: 328659	2018-03-27 19:42:55 +00:00
Matt Arsenault	95329f8c53	AMDGPU: Set natural stack alignment in DataLayout Only 4 byte alignment is ever useful, so increasing anything beyond this may require realigning the stack. llvm-svn: 328656	2018-03-27 19:26:40 +00:00
Matt Arsenault	0a0c871f60	AMDGPU: Fix crash when MachinePointerInfo invalid The combine on a select of a load only triggers for addrspace 0, and discards the MachinePointerInfo. The conservative default needs to be used for this. llvm-svn: 328652	2018-03-27 18:39:45 +00:00
Matt Arsenault	e9f3679031	AMDGPU: Fix FP restore from being reordered with stack ops In a function, s5 is used as the frame base SGPR. If a function is calling another function, during the call sequence it is copied to a preserved SGPR and restored. Before, it was possible for the scheduler to move stack operations before the restore of s5, since there's nothing to associate a frame index access with the restore. Add an implicit use of s5 to the adjcallstack pseudo which ends the call sequence to prevent this from happening. I'm not 100% satisfied with this solution, but I'm not sure what else would be better. llvm-svn: 328650	2018-03-27 18:38:51 +00:00
Krzysztof Parzyszek	0375cd46ef	[Hexagon] Implement TTI::shouldMaximizeVectorBandwidth llvm-svn: 328648	2018-03-27 18:10:47 +00:00
Stefan Pintilie	659f040351	[Power9] Fix the resource list for the COPY instruction. The COPY instruction was listed as a 4 cycle instruction. It is now listed correctly as a 2 cycle ALU instruction. llvm-svn: 328647	2018-03-27 17:51:53 +00:00
Krzysztof Parzyszek	0a15d24134	[Hexagon] Rudimentary support for auto-vectorization for HVX This implements a set of TTI functions that the loop vectorizer uses. The only purpose of this is to enable testing. Auto-vectorization is disabled by default, enabled by -hexagon-autohvx. llvm-svn: 328639	2018-03-27 17:07:52 +00:00
Rafael Auler	d058b882be	[AArch64] Decorate AArch64 instrs with OPERAND_PCREL Summary: This is a canonical way to teach objdump to print the target symbols for branches when disassembling AArch64 code. Reviewers: evandro, t.p.northover, espindola Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D44851 llvm-svn: 328638	2018-03-27 16:58:01 +00:00
Simon Pilgrim	5f7ab4fedf	[X86][Btver2] Add MMX_PMOVMSKBrr to MOVMSK scheduler class llvm-svn: 328620	2018-03-27 12:26:12 +00:00
Strahinja Petrovic	06cf6a6490	[PowerPC] Secure PLT support This patch supports secure PLT mode for PowerPC 32 architecture. Differential Revision: https://reviews.llvm.org/D42112 llvm-svn: 328617	2018-03-27 11:23:53 +00:00
Alexander Richardson	e8059b1de4	[MIPS] Add static_assert that all Fixups are handled in getFixupKind Summary: I recently added a new Fixup kind to our fork of LLVM but forgot to add it to the table in MipsAsmBackend.cpp. With this static_assert the error would have been caught instead of zero-initializing the array entries for the new fixups. Reviewers: sdardis, atanasyan Reviewed By: atanasyan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44895 llvm-svn: 328616	2018-03-27 10:08:12 +00:00
Simon Pilgrim	28e7bcbba6	[X86] Add WriteCRC32 scheduler class Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis. Differential Revision: https://reviews.llvm.org/D44647 llvm-svn: 328582	2018-03-26 21:06:14 +00:00
Krzysztof Parzyszek	4a5a80c370	[Hexagon] Assertion failure in HexagonSubtarget.cpp In restoreLatency, replace range-for loop with std::find. Patch by Jyotsna Verma. llvm-svn: 328574	2018-03-26 19:04:58 +00:00
Simon Pilgrim	fcf49df21c	[X86][Btver2] Add (U)COMISS/(U)COMISD scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) llvm-svn: 328573	2018-03-26 19:01:06 +00:00
Reid Kleckner	41fb2dba9c	[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32 Summary: Re-lands r328386 and r328443, reverting r328482. Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small parameters in i8 and i16 do not end up in the SysV register parameters (EDI, ESI, etc). I added tests for how we receive small parameters, since that is the important part. It's always safe to store more bytes than will be read, but the assumptions you make when loading them are what really matter. I also tested this by self-hosting clang and it passed tests on win64. Reviewers: mstorsjo, hans Subscribers: hiraditya, mstorsjo, llvm-commits Differential Revision: https://reviews.llvm.org/D44900 llvm-svn: 328570	2018-03-26 18:49:48 +00:00
Simon Pilgrim	f33d905293	[X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881) Give the bit count instructions their own scheduler classes instead of forcing them into existing classes. These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar). Differential Revision: https://reviews.llvm.org/D44879 llvm-svn: 328566	2018-03-26 18:19:28 +00:00
Mandeep Singh Grang	1b9ff45157	[XCore] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort with llvm::sort. Refer to the comments section in D44363 for a list of all the required patches. Reviewers: dblaikie, RKSimon, robertlytton Reviewed By: robertlytton Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44875 llvm-svn: 328564	2018-03-26 18:08:26 +00:00
Lei Huang	be0afb0870	[Power9]Legalize and emit code for quad-precision convert from double-precision Legalize and emit code for quad-precision floating point operation xscvdpqp and add option to guard the quad precision operation support. Differential Revision: https://reviews.llvm.org/D44746 llvm-svn: 328558	2018-03-26 17:46:25 +00:00
Stefan Pintilie	26d4f923c4	[PowerPC] Infrastructure work. Implement getting the opcode for a spill in one place. A new function getOpcodeForSpill should now be the only place to get the opcode for a given spilled register. Differential Revision: https://reviews.llvm.org/D43086 llvm-svn: 328556	2018-03-26 17:39:18 +00:00
Tim Corringham	7116e8963d	[AMDGPU] Improve disassembler error handling Summary: llvm-objdump now disassembles unrecognised opcodes as data, using the .long directive. We treat unrecognised opcodes as being 32 bit values, so move along 4 bytes rather than the single byte which previously resulted in a cascade of bogus disassembly following an unrecognised opcode. While no solution can always disassemble code that contains embedded data correctly this provides a significant improvement. The disassembler will now cope with an arbitrary length section as it no longer truncates it to a multiple of 4 bytes, and will use the .byte directive for trailing bytes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44685 llvm-svn: 328553	2018-03-26 17:06:33 +00:00
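Below is a minimal Python sketch of the fallback described above. It assumes that every instruction is a single 32-bit little-endian word, which is a simplification because real AMDGPU encodings can be longer. `disassemble`, the toy decode table, and the `insn_a` encoding are hypothetical. Unrecognised words are emitted as `.long` and the cursor always advances 4 bytes. Trailing bytes that do not fill a word are emitted as `.byte`.

```python
def disassemble(section: bytes, decode) -> list:
    # decode(word) returns assembly text, or None if the word is unrecognised.
    out = []
    i = 0
    while i + 4 <= len(section):
        word = int.from_bytes(section[i:i + 4], "little")
        text = decode(word)
        # Emit unknown words as data instead of failing.
        out.append(text if text is not None else f".long 0x{word:08x}")
        i += 4  # advance a full word either way, never a single byte
    # The section need not be a multiple of 4: trailing bytes become .byte.
    for b in section[i:]:
        out.append(f".byte 0x{b:02x}")
    return out


# Toy decoder recognising one hypothetical encoding.
toy_table = {0x11111111: "insn_a"}
data = (0x11111111).to_bytes(4, "little") + (0xDEADBEEF).to_bytes(4, "little") + b"\x01\x02"
for line in disassemble(data, toy_table.get):
    print(line)
```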
Simon Pilgrim	86ea53123d	[X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551	2018-03-26 17:02:02 +00:00
David Blaikie	535ca36e5e	Remove an unneeded (& mislayered) include from Target/TargetLoweringObjectFile on a CodeGen header llvm-svn: 328549	2018-03-26 16:57:31 +00:00
David Blaikie	a1b2bf4c71	Remove unneeded (& mislayered) include from TargetMachine.cpp on a CodeGen header llvm-svn: 328548	2018-03-26 16:52:10 +00:00
Krzysztof Parzyszek	a212204453	[Pipeliner] Use latency to compute RecMII The patch contains several changes needed to pipeline an example that was transformed so that a Phi with a subreg is converted to copies. The pipeliner wasn't working for a couple of reasons. - The RecMII was 3 instead of 2 due to the extra copies. - Copy instructions contained a latency of 1. - The node order algorithm was not choosing the best "bottom" node, which caused an instruction to be scheduled that had a predecessor and successor already scheduled. - Updated the Hexagon Machine Scheduler to check if the node is latency bound when adding the cost for a 0-latency dependence. The RecMII was 3 because the computation looks at the number of nodes in the recurrence. The extra copy is an extra node but it shouldn't increase the latency. The new RecMII computation looks at the latency of the instructions in the recurrence. We changed the latency of the dependence of a copy to 0. The latency computation for the copy also checks the use of the copy (similar to a reg_sequence). The node order algorithm was not choosing the last instruction in the recurrence for a bottom up traversal. This was when the last instruction is a copy. A check was added when choosing the instruction to check for NodeNum if the maxASAP is the same. This means that the scheduler will not end up with another node in the recurrence that has both a predecessor and successor already scheduled. The cost computation in Hexagon Machine Scheduler adds cost when an instruction can be packetized with a zero-latency instruction. We should only do this if the schedule is latency bound. Patch by Brendon Cahoon. llvm-svn: 328542	2018-03-26 16:33:16 +00:00
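This is a simplified Python model of the RecMII change, not the MachinePipeliner code, and the edge lists are hypothetical. For each recurrence, the latency-based RecMII is the total latency around the cycle divided by the total loop-carried distance, rounded up. The overall bound is the maximum over all recurrences. Counting nodes instead, as the commit describes the old computation, lets a zero-latency copy inflate the bound from 2 to 3.

```python
import math

# Each recurrence is a list of dependence edges (latency, distance), where
# distance is the number of loop iterations the edge crosses.


def rec_mii_by_nodes(recurrences):
    # Node-counting bound: every node in the cycle contributes one cycle.
    return max(math.ceil(len(r) / sum(d for _, d in r)) for r in recurrences)


def rec_mii_by_latency(recurrences):
    # Latency-based bound: total latency over total distance, rounded up.
    return max(
        math.ceil(sum(lat for lat, _ in r) / sum(d for _, d in r)) for r in recurrences
    )


# Hypothetical recurrence: two 1-cycle instructions feed a copy, and the
# copy's result is used by the first instruction in the next iteration.
cycle = [(1, 0), (1, 0), (0, 1)]  # the copy edge has latency 0
print(rec_mii_by_nodes([cycle]), rec_mii_by_latency([cycle]))  # 3 2
```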
Simon Pilgrim	8815105cd5	[X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs llvm-svn: 328541	2018-03-26 16:24:13 +00:00
Simon Pilgrim	aa40148cae	[X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of JALU0 for GPR PRF write) llvm-svn: 328536	2018-03-26 16:10:08 +00:00
Krzysztof Parzyszek	56f0fc4716	[Hexagon] Give priority to post-incrementing memory accesses in LSR llvm-svn: 328506	2018-03-26 15:32:03 +00:00
Simon Pilgrim	0b73b29388	[X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) This also adds missing vcvttss2si tests llvm-svn: 328505	2018-03-26 15:30:47 +00:00
Simon Pilgrim	3aa9344605	[X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKPS instruction costs These should match the YMM MOVDUP/PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501	2018-03-26 14:44:24 +00:00
Simon Pilgrim	67df1cf597	[X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps llvm-svn: 328497	2018-03-26 14:03:40 +00:00
Nicolai Haehnle	4f850eabb6	AMDGPU: Introduce common SOP_Pseudo and VOP_Pseudo TableGen base classes Differential revision: https://reviews.llvm.org/D44820 Change-Id: I732979e2964006aa15d78a333d8886e6855f319a llvm-svn: 328496	2018-03-26 13:56:53 +00:00
Simon Pilgrim	caa203aed5	[X86][Btver2] Double the AGU and schedule pipe resources for YMM Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model. llvm-svn: 328491	2018-03-26 13:15:20 +00:00
Hans Wennborg	311b63f13b	Revert r328386 "[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32" This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up patch at D44876 fixes this, but let's revert back to green for now until that's ready to land. (Also reverts r328443.) > Both GCC and MSVC only look at the low byte of a boolean when it is > passed. llvm-svn: 328482	2018-03-26 10:07:51 +00:00
Martin Storsjo	439824622a	[ARM] Simplify constructing the ARMArchFeature string. NFC. Differential Revision: https://reviews.llvm.org/D44819 llvm-svn: 328478	2018-03-26 08:41:10 +00:00
Craig Topper	6f28d3c954	[X86] Fix the SchedRW for intrinsic register form of SQRT/RCP/RSQRT. llvm-svn: 328474	2018-03-26 05:05:12 +00:00
Craig Topper	cdfcf8ecda	[X86] Merge the SSE and AVX versions of fp divs and sqrts in the SandyBridge/Haswell/Broadwell/Skylake scheduler models. I've used Agner's data as best I could to get the values to converge on. llvm-svn: 328473	2018-03-26 05:05:10 +00:00
Craig Topper	fbf2d850e3	[X86] Add itinerary to intrinsic version of sqrtss, rcpss, and rsqrtss instructions. llvm-svn: 328472	2018-03-26 04:20:36 +00:00
Craig Topper	c049cb7823	[X86] Correct the itineraries for the dot product instructions. llvm-svn: 328471	2018-03-26 02:17:15 +00:00
Craig Topper	4367874bc5	[X86] Use the same itinerary for VCVTDQ2PD as the SSE version so that the generated scheduler classes will merge. llvm-svn: 328470	2018-03-26 02:17:14 +00:00
Craig Topper	659f85af14	[X86] Swap the itineraries on the memory and register forms of CVTDQ2PD. They were backwards. llvm-svn: 328469	2018-03-26 02:17:13 +00:00
Craig Topper	4bf23eddaf	[X86] Give VMOVSX/ZX the same itinerary as the SSE version so they'll reuse the same generated scheduler class. llvm-svn: 328468	2018-03-26 02:17:12 +00:00
Craig Topper	6e8d99bbea	[X86] Give vpmsadbw the same itinerary as the SSE version so they'll be able to share the same generated scheduler class. llvm-svn: 328466	2018-03-25 23:52:06 +00:00
Craig Topper	15fef89ad9	[X86] Move (v)movss to port 5 only for Skylake. Move (v)movups/d to port 015 for Skylake. This matches Agner's data and is consistent with what the EVEX instructions were doing on SKX. llvm-svn: 328465	2018-03-25 23:40:56 +00:00
Simon Pilgrim	68a8fbc102	[X86] Use WriteResPair for WriteIDiv to cleanup sched defs. NFCI. llvm-svn: 328460	2018-03-25 20:16:53 +00:00
Simon Pilgrim	fecb0b7874	[X86][SkylakeClient] Fix missing comma llvm-svn: 328458	2018-03-25 19:17:17 +00:00
Simon Pilgrim	351e4fa0e2	[ARM] Remove sched model instregex entries that don't match any instructions (D44687) Reviewed by @javed.absar llvm-svn: 328457	2018-03-25 19:07:17 +00:00
Simon Pilgrim	854ac7490d	[X86] Add missing full stop to comment. NFCI. llvm-svn: 328456	2018-03-25 18:49:48 +00:00
Craig Topper	972bdbd415	[X86][SkylakeClient] Fix a set of regular expressions that were checking for optionally starting with 'Y' instead of 'V'. These bad regexes were introduced by r328435 llvm-svn: 328454	2018-03-25 17:33:14 +00:00
Simon Pilgrim	562e8b4eae	[X86][MMX] MOVQ2DQ/MOVDQ2Q are better described as WriteVecMove than WriteMove Not that it makes a difference to current cost values, but will when we try to better model GPR-SIMD transfer costs llvm-svn: 328453	2018-03-25 17:28:06 +00:00
Simon Pilgrim	25acc0a79b	[X86][SkylakeServer] Merge multiple instregex. NFCI llvm-svn: 328452	2018-03-25 17:25:37 +00:00
Craig Topper	a985919d3e	[X86] Update cost model for Goldmont. Add fsqrt costs for Silvermont Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44644 llvm-svn: 328451	2018-03-25 15:58:12 +00:00
Simon Pilgrim	e3547af7be	[X86] Add the ability to override memory folding latency to schedules and add 1uop for memory folds for Intel models The Intel models need an extra 1uop for memory folded instructions, plus a lot of instructions take a non-default memory latency which should allow us to use the multiclass a lot more to tidy things up. Differential Revision: https://reviews.llvm.org/D44840 llvm-svn: 328446	2018-03-25 10:21:19 +00:00
Craig Topper	e8f4e747bf	[X86] Consistently prefix all defs in X86ScheduleSLM.td with 'SLM'. llvm-svn: 328444	2018-03-25 01:28:43 +00:00
Martin Storsjo	98720156b9	[X86] Update a partially stale comment, since SVN r328386. NFC. llvm-svn: 328443	2018-03-24 23:00:00 +00:00
Simon Pilgrim	31a9633724	[X86][SkylakeClient] Merge xmm/ymm instructions instregex entries to reduce regex matches to reduce compile time llvm-svn: 328435	2018-03-24 20:40:14 +00:00
Simon Pilgrim	c21deec37b	[X86][Broadwell] Merge xmm/ymm instructions instregex entries to reduce regex matches to reduce compile time llvm-svn: 328434	2018-03-24 19:37:28 +00:00
Mandeep Singh Grang	98bc25a0f2	[RISCV] Use init_array instead of ctors for RISCV target, by default Summary: LLVM defaults to the newer .init_array/.fini_array scheme for static constructors rather than the less desirable .ctors/.dtors (the UseCtors flag defaults to false). This wasn't being respected in the RISC-V backend because it fails to call TargetLoweringObjectFileELF::InitializeELF with the appropriate flag for UseInitArray. This patch fixes this by implementing RISCVELFTargetObjectFile and overriding its Initialize method to call InitializeELF(TM.Options.UseInitArray). Reviewers: asb, apazos Reviewed By: asb Subscribers: mgorny, rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, llvm-commits Differential Revision: https://reviews.llvm.org/D44750 llvm-svn: 328433	2018-03-24 18:37:19 +00:00
Simon Pilgrim	2b5967f510	[X86][Haswell] Merge xmm/ymm instructions instregex entries to reduce regex matches to reduce compile time llvm-svn: 328432	2018-03-24 18:36:01 +00:00
Simon Pilgrim	efcf1d85b3	[X86][SandyBridge] Merge xmm/ymm instructions instregex entries to reduce regex matches to reduce compile time llvm-svn: 328431	2018-03-24 18:12:59 +00:00
Mandeep Singh Grang	db00e2e20f	[Hexagon] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort with llvm::sort. Refer to the comments section in D44363 for a list of all the required patches. Reviewers: kparzysz Reviewed By: kparzysz Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44857 llvm-svn: 328430	2018-03-24 17:34:37 +00:00
Mandeep Singh Grang	860adef9e6	[AMDGPU] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Reviewers: tstellar, RKSimon, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44856 llvm-svn: 328429	2018-03-24 17:15:04 +00:00
Craig Topper	097b47a0fc	[X86] Add a new disassembler opcode map for 3DNow. Stop treating 3DNow as an attribute. This reduces the size of llvm-mc by at least 150k since we no longer have to multiply the attribute across 7 tables. llvm-svn: 328416	2018-03-24 07:48:54 +00:00
Craig Topper	e865641aea	[X86] Merge the Has3DNow0F0FOpcode TSFlag into the OpMap encoding. NFC The 3DNow instructions are encoded a little weird, but we can still represent it as an opcode map. llvm-svn: 328410	2018-03-24 06:04:12 +00:00
Craig Topper	2c0a62ab9a	[X86] Add a DAG combine to simplify PMULDQ/PMULUDQ nodes These nodes only use the lower 32 bits of their inputs so we can use SimplifyDemandedBits to simplify them. Differential Revision: https://reviews.llvm.org/D44375 llvm-svn: 328405	2018-03-24 01:52:01 +00:00
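This Python sketch models the per-lane semantics that make this combine possible. It covers one 64-bit lane only and is not the X86 DAG code; the helper names are hypothetical. PMULUDQ multiplies the zero-extended low 32 bits of each lane, and PMULDQ multiplies the sign-extended low 32 bits. Because the upper 32 input bits never affect the result, SimplifyDemandedBits is free to simplify whatever computes them.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sext32(x: int) -> int:
    # Interpret the low 32 bits of x as a signed integer.
    x &= MASK32
    return x - (1 << 32) if x & (1 << 31) else x


def pmuludq_lane(a: int, b: int) -> int:
    # Unsigned 32x32 -> 64-bit multiply of the low halves of one 64-bit lane.
    return (a & MASK32) * (b & MASK32)


def pmuldq_lane(a: int, b: int) -> int:
    # Signed 32x32 -> 64-bit multiply, returned as a 64-bit two's-complement value.
    return (sext32(a) * sext32(b)) & MASK64


# Changing the upper 32 bits of an input does not change the result,
# so those bits are not demanded.
assert pmuludq_lane(0xDEAD_BEEF_0000_0003, 5) == pmuludq_lane(3, 5) == 15
assert pmuldq_lane(0x1234_5678_FFFF_FFFF, 2) == pmuldq_lane(0xFFFF_FFFF, 2)
print(hex(pmuldq_lane(0xFFFF_FFFF, 2)))  # 0xfffffffffffffffe
```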
Craig Topper	bc6d2ec8ce	[X86] Correct the value AdSizeX in X86II enum. NFC Should be NFC since nothing used the enum value. The instruction descriptions are generated from tablegen which had the correct value. llvm-svn: 328398	2018-03-24 00:02:46 +00:00
David Blaikie	36a0f226b1	Fix layering by moving ValueTypes.h from CodeGen to IR ValueTypes.h is implemented in IR already. llvm-svn: 328397	2018-03-23 23:58:31 +00:00
David Blaikie	13e77db2df	Fix layering of MachineValueType.h by moving it from CodeGen to Support This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395	2018-03-23 23:58:25 +00:00
David Blaikie	bf121cf44a	Fix layering by moving Support/CodeGenCWrappers.h to Target This includes llvm-c/TargetMachine.h which is logically part of libTarget (since libTarget implements llvm-c/TargetMachine.h's functions). llvm-svn: 328394	2018-03-23 23:58:21 +00:00
David Blaikie	ab7f17f4ec	Fix layering by moving X86DisassemblerDecoderCommon to Support This is used from llvm tblgen and the X86Disassembler - the only common library (apart from TableGen, which probably doesn't make sense to have as a dependency from a release tool (rather than a use-while-building-llvm tool) of LLVM) llvm-svn: 328393	2018-03-23 23:58:20 +00:00
David Blaikie	6054e650ff	Move TargetLoweringObjectFile from CodeGen to Target to fix layering It's implemented in Target & included from other Target headers, so the header should be in Target. llvm-svn: 328392	2018-03-23 23:58:19 +00:00
Reid Kleckner	e27b410661	[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32 Both GCC and MSVC only look at the low byte of a boolean when it is passed. llvm-svn: 328386	2018-03-23 23:38:53 +00:00
Krzysztof Parzyszek	998df2ca4f	[Hexagon] Make findLoopInstr member of HexagonInstrInfo llvm-svn: 328367	2018-03-23 20:43:02 +00:00
Krzysztof Parzyszek	8038dad7db	[Hexagon] Correct update of instruction offset in HW loop fixup llvm-svn: 328366	2018-03-23 20:41:44 +00:00
Krzysztof Parzyszek	bcf0a96f9e	[Hexagon] Boost profit for word-mask immediates, reduce for others This avoids unnecessary splitting due to uninteresting immediates. llvm-svn: 328364	2018-03-23 20:11:00 +00:00
Krzysztof Parzyszek	ca93f5e605	[Hexagon] Assume all extendable branches to be of size 8 in relaxation The branch relaxation pass collects sizes of all instructions at the beginning, before any changes have been made. It then performs one pass over all branches to see which ones need to be extended. It does not account for the case when a previously valid branch becomes out-of-range due to relaxing other branches. This approach fixes this problem by assuming from the beginning that all extendable branches have been extended. This may cause unneeded relaxation in some cases, but avoids iteration and recomputing instruction sizes. llvm-svn: 328360	2018-03-23 19:47:13 +00:00
Krzysztof Parzyszek	6f503b96fb	[Hexagon] Incorrectly removing dead flag and adding kill flag The HexagonExpandCondsets pass is incorrectly removing the dead flag on a definition that is really dead, and adding a kill flag to a use that is tied to a definition. This causes an assert later during the machine scheduler when querying the live interval information. Patch by Brendon Cahoon. llvm-svn: 328357	2018-03-23 19:39:37 +00:00
Benjamin Kramer	faa9b438ce	[Hexagon] Silence unused variable warning in Release builds llvm-svn: 328356	2018-03-23 19:39:16 +00:00
Krzysztof Parzyszek	e247526cc9	[Hexagon] Fold offset in base+immediate loads/stores Optimize Ry = add(Rx,#n); memw(Ry+#0) = Rz => memw(Rx,#n) = Rz. Patch by Jyotsna Verma. llvm-svn: 328355	2018-03-23 19:30:34 +00:00
Craig Topper	4529d3abcb	[X86] Add itinerary to RCPSS*_Int and similar instructions. llvm-svn: 328353	2018-03-23 19:15:05 +00:00
Craig Topper	02fb3907f1	[X86] Add itineraries to ADD.*_DB instructions to match their normal counterparts. llvm-svn: 328352	2018-03-23 19:15:03 +00:00
Tony Tye	7a893d4e34	[AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU - Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target. - Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS. Differential Revision: https://reviews.llvm.org/D43736 llvm-svn: 328349	2018-03-23 18:45:18 +00:00
Krzysztof Parzyszek	5f7ba9a74c	[Hexagon] Always generate mux out of predicated transfers if possible HexagonGenMux would collapse pairs of predicated transfers if it assumed that the predicated .new forms cannot be created. Turns out that generating mux is preferable in almost all cases. Introduce an option -hexagon-gen-mux-threshold that controls the minimum distance between the instruction defining the predicate and the later of the two transfers. If the distance is closer than the threshold, mux will not be generated. Set the threshold to 0 by default. llvm-svn: 328346	2018-03-23 18:43:09 +00:00
Krzysztof Parzyszek	80f10e4fe5	[Hexagon] Avoid early if-conversion for one sided branches Patch by Anand Kodnani. llvm-svn: 328344	2018-03-23 18:00:18 +00:00
Simon Pilgrim	6c63e6c222	[X86][Btver2] Cleanup TEST instructions to use JFPA (+JFPX on ymms) function unit llvm-svn: 328343	2018-03-23 17:59:22 +00:00
Ana Pazos	41573804f2	[ARM] Fix "Constant pool entry out of range!" in Thumb1 mode This patch fixes PR36658, "Constant pool entry out of range!" in Thumb1 mode. In ARMConstantIslands::optimizeThumb2JumpTables() in Thumb1 mode, adjustBBOffsetsAfter() does not calculate postOffset correctly because it does not account for the padding that is required for the constant pool that immediately follows the jump table branch instruction. Reviewers: t.p.northover, eli.friedman Reviewed By: t.p.northover Subscribers: chrib, tstellar, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D44709 llvm-svn: 328341	2018-03-23 17:53:27 +00:00
Krzysztof Parzyszek	570c6440cd	[Hexagon] Two fixes in early if-conversion - Fix checking for vector predicate registers. - Avoid speculating llvm.lifetime.end intrinsic. Patch by Harsha Jagasia and Brendon Cahoon. llvm-svn: 328339	2018-03-23 17:46:09 +00:00
Simon Pilgrim	e5c0a041ff	[X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unit Add missing non-VEX and (V)PMOVMSKB instructions to the pattern llvm-svn: 328338	2018-03-23 17:38:59 +00:00
Krzysztof Parzyszek	c98802de09	[Hexagon] Copy subregisters in HexagonStoreWiden When converting an instruction to the wider version, copy any subregisters if the original operand has a subregister. Patch by Brendon Cahoon. llvm-svn: 328333	2018-03-23 17:22:55 +00:00
Simon Pilgrim	256f149bf0	[X86][Btver2] Vector permutes use a JFPU01 scheduler pipe and JFPX/JVALU function unit llvm-svn: 328331	2018-03-23 16:17:56 +00:00
Simon Pilgrim	ee282b3160	[X86][Btver2] Vector store instructions use a JFPU1 scheduler pipe and JSAGU/JSTC function units llvm-svn: 328328	2018-03-23 15:35:13 +00:00
Zaara Syeda	6535993625	Re-commit: [MachineLICM] Add functions to MachineLICM to hoist invariant stores This patch adds functions to allow MachineLICM to hoist invariant stores. Currently, MachineLICM does not hoist any store instructions, however when storing the same value to a constant spot on the stack, the store instruction should be considered invariant and be hoisted. The function isInvariantStore iterates each operand of the store instruction and checks that each register operand satisfies isCallerPreservedPhysReg. The store may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore. This patch also adds the PowerPC changes needed to consider the stack register as caller preserved. Differential Revision: https://reviews.llvm.org/D40196 llvm-svn: 328326	2018-03-23 15:28:15 +00:00
Simon Pilgrim	1335b9c0ca	[X86][Btver2] Cleanup DPPS/DPPD instructions to use JFPA/JFPM function units llvm-svn: 328324	2018-03-23 15:17:50 +00:00
John Brawn	e3b44f9de6	[AArch64] Don't reduce the width of loads if it prevents combining a shift Loads and stores can only shift the offset register by the size of the value being loaded, but currently the DAGCombiner will reduce the width of the load if it's followed by a trunc making it impossible to later combine the shift. Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and make it prevent the width reduction if this is what would happen, though do allow it if reducing the load width will let us eliminate a later sign or zero extend. Differential Revision: https://reviews.llvm.org/D44794 llvm-svn: 328321	2018-03-23 14:47:07 +00:00
Simon Pilgrim	5792e10ffb	[X86][Btver2] Fix MicroOps counts for DPPS/YMM memory folded instructions This was due to a misunderstanding over what llvm calls a micro-op (retirement unit) is actually called a macro-op on the AMD/Jaguar target. Folded loads don't affect num macro ops. llvm-svn: 328320	2018-03-23 14:45:03 +00:00
Simon Pilgrim	8619962c73	[X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units Fixes throughput to match Agner/Fam16h-SoG as well. llvm-svn: 328318	2018-03-23 14:27:26 +00:00
Christof Douma	4a025cc79d	[ARM] Support float literals under XO When targeting execute-only and fp-armv8, float constants in a compare resulted in instruction selection failures. This is now fixed by using vmov.f32 where possible, otherwise the floating point constant is lowered into an integer constant that is moved into a floating point register. This patch also restores using fpcmp with immediate 0 under fp-armv8. Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443 llvm-svn: 328313	2018-03-23 13:02:03 +00:00
Simon Pilgrim	9ea14bbbb0	[X86][Znver1] Fix instregex entries that don't match any instructions (D44687) Reviewed by @GGanesh and @craig.topper llvm-svn: 328309	2018-03-23 12:08:23 +00:00
Simon Pilgrim	2755893834	[X86][SandyBridge] Fix missing comma that was causing string concatenation of 2 instregex entries Found while updating D44687 llvm-svn: 328308	2018-03-23 11:56:38 +00:00
Simon Pilgrim	a1e3ea01ef	[X86][Btver2] Vector move/load/store instructions use a JFPU01 scheduler pipe and JFPX/JVALU function unit as well as the AGUs llvm-svn: 328304	2018-03-23 11:27:31 +00:00
Florian Hahn	588e640ea1	[AArch64] Clean-up a few over-eager regexps in models. Patch by Simon Pilgrim <llvm-dev@redking.me.uk> That is a slightly modified version of the AArch64 changes from Simon's D44687 . llvm-svn: 328303	2018-03-23 11:00:42 +00:00
Martin Storsjo	e1a64fe95c	[ARM] Error out on .arm assembler directives on windows Windows on arm is thumb only. Differential Revision: https://reviews.llvm.org/D43005 llvm-svn: 328298	2018-03-23 09:10:03 +00:00
Craig Topper	dfeea84d63	[X86] Give VPCMPEQQ the same itinerary as its SSE counterpart. llvm-svn: 328296	2018-03-23 06:58:55 +00:00
Craig Topper	4787b7f434	[X86] Correct the latencies of SNB integer vector multiplies based on Agner's data. Add missing MMX multiplies. llvm-svn: 328295	2018-03-23 06:41:43 +00:00
Craig Topper	659c66dfc1	[X86] Match vpblendvb/vblendvps/vblendvpd itineraries to the SSE equivalent. Change pblendvb/blendvps/blendvpd to use WriteFVarBlend llvm-svn: 328294	2018-03-23 06:41:41 +00:00
Craig Topper	7580a7997d	[X86] Change VPSADBW itinerary to SSE_INTALU_ITINS_P to match the SSE version. llvm-svn: 328293	2018-03-23 06:41:40 +00:00
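
The two `std::sort` to `llvm::sort` commits (r328429, r328430) explain that the r327219 wrappers shuffle a range before sorting it. As a result, code that silently depends on the order of equal-key elements produces visibly different results. The sketch below is a minimal Python model of that idea under stated assumptions. It is not LLVM's C++ implementation, and `shuffle_then_sort`, the sample items, and the seed range are all hypothetical. Python's `sort` is stable, so tied elements keep whatever order the shuffle left them in. This mimics the unspecified tie order of `std::sort`.

```python
import random
from typing import Callable, List, Tuple

Item = Tuple[int, str]  # (key, payload)


def shuffle_then_sort(items: List[Item], key: Callable[[Item], object], seed: int) -> List[Item]:
    """Hypothetical model of a shuffle-before-sort wrapper (not LLVM code).

    Shuffles a copy of `items` with a seeded RNG, then sorts it by `key`.
    Elements whose keys tie keep their shuffled relative order, so a caller
    that relied on the original order of ties sees it change between seeds.
    """
    out = list(items)
    random.Random(seed).shuffle(out)
    out.sort(key=key)
    return out


items = [(2, "b"), (1, "x"), (2, "a"), (1, "y"), (3, "z"), (2, "c")]

# Key-only ordering: ties on the key end up in an arbitrary (shuffled) order.
loose = {tuple(shuffle_then_sort(items, key=lambda it: it[0], seed=s)) for s in range(20)}
# Total ordering: ties are broken by the payload, so every seed agrees.
strict = {tuple(shuffle_then_sort(items, key=lambda it: it, seed=s)) for s in range(20)}

print("distinct key-only results:", len(loose))
print("distinct total-order results:", len(strict))
```

A comparator that breaks every tie (here, by payload) yields a single result for all seeds. This is one common way to make such a sort deterministic once a shuffle has exposed the dependence.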

47199 Commits