llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	56ea488d8b	[AMDGPU] Allow SDWA in instructions with immediates and SGPRs The encoding does not allow SDWA in an instruction with scalar operands, either literals or SGPRs. It is, however, possible to copy these operands into a VGPR first. Several copies of the value are produced if multiple SDWA conversions were done. To clean up, MachineLICM (to hoist copies out of loops), MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace SGPR to VGPR copy with immediate copy right to the VGPR) runs are added after the SDWA pass. Differential Revision: https://reviews.llvm.org/D33583 llvm-svn: 304219	2017-05-30 16:49:24 +00:00
Mark Searles	00ce96f6ee	[AMDGPU] Require waitcnt before barrier for all targets; adjust tests. Differential Revision: https://reviews.llvm.org/D33576 llvm-svn: 304217	2017-05-30 16:22:43 +00:00
Craig Topper	f6d4dc5b4a	[SelectionDAG] Set ISD::FPOWI to Expand by default Summary: Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie". This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default. Reviewers: spatel, RKSimon, efriedma Reviewed By: RKSimon Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits Differential Revision: https://reviews.llvm.org/D33530 llvm-svn: 304215	2017-05-30 15:27:55 +00:00
Andrew V. Tischenko	8b04826663	This patch closes PR28513: an optimization of multiplication by different constants. It's implemented on DAG combiner level. llvm-svn: 304209	2017-05-30 13:00:44 +00:00
Ulrich Weigand	3f484e68cc	[SystemZ] Add decimal floating-point instructions This adds assembler / disassembler support for the decimal floating-point instructions. Since LLVM does not yet have support for decimal float types, these cannot be used for codegen at this point. llvm-svn: 304203	2017-05-30 10:15:16 +00:00
Ulrich Weigand	f32adf6944	[SystemZ] Add hexadecimal floating-point instructions This adds assembler / disassembler support for the hexadecimal floating-point instructions. Since the Linux ABI does not use any hex float data types, these are not useful for codegen. llvm-svn: 304202	2017-05-30 10:13:23 +00:00
Zoran Jovanovic	375b60de74	[mips] Expansion of LI.S and LI.D Author: smaksimovic Reviewers: dsanders sdardis Introduces LI.S and LI.D pseudo instructions with floating point operands. Differential Revision: https://reviews.llvm.org/D14390 llvm-svn: 304198	2017-05-30 09:33:43 +00:00
Kristof Beyls	2af1e90eb2	Fix PR33031: correct the estimate of maximum offset for instructions spilling/filling the stack. llvm-svn: 304196	2017-05-30 06:58:41 +00:00
Jonas Paulsson	fe0c0935c8	[SystemZ] Improve buildVector() in SystemZISelLowering.cpp. Use VLREP when inserting one or more loads into a vector. This is more efficient than to first load and then use a VLVGP. Review: Ulrich Weigand llvm-svn: 304152	2017-05-29 13:22:23 +00:00
Nikolai Bozhenov	82f0801c1b	[Nios2] Target registration Reviewers: craig.topper, hfinkel, joerg, lattner, zvi Reviewed By: craig.topper Subscribers: oren_ben_simhon, igorb, belickim, tvvikram, mgorny, llvm-commits, pavel.v.chupin, DavidKreitzer Differential Revision: https://reviews.llvm.org/D32669 Patch by AndreiGrischenko <andrei.l.grischenko@intel.com> llvm-svn: 304144	2017-05-29 09:48:30 +00:00
Diana Picus	0c05cce4e0	[ARM] GlobalISel: Extract helper. NFCI. Create a helper to deal with the common code for merging incoming values together after they've been split during call lowering. There's likely more stuff that can be commoned up here, but we'll leave that for later. llvm-svn: 304143	2017-05-29 09:09:54 +00:00
Diana Picus	bf4aed2c38	[ARM] GlobalISel: Support array returns These are a bit rare in practice, but they don't require anything special compared to array parameters, so support them as well. llvm-svn: 304137	2017-05-29 08:19:19 +00:00
Hiroshi Inoue	e3c14ebbfa	[PPC] Fix assertion failure during binary encoding with -mcpu=pwr9 Summary clang -c -mcpu=pwr9 test/CodeGen/PowerPC/build-vector-tests.ll causes an assertion failure during the binary encoding. The failure occurs when a D-form load instruction takes two register operands instead of a register + an immediate. This patch fixes the problem and also adds an assertion to catch this failure earlier before the binary encoding (i.e. during lit test). The fix is from Nemanja Ivanovic @nemanjai. Differential Revision: https://reviews.llvm.org/D33482 llvm-svn: 304133	2017-05-29 07:12:39 +00:00
Diana Picus	8cca8cb0ce	[ARM] GlobalISel: Support array parameters/arguments Clang coerces structs into arrays, so it's a good idea to support them. Most of the support boils down to getting the splitToValueTypes helper to actually split types. We then use G_INSERT/G_EXTRACT to deal with the parts. llvm-svn: 304132	2017-05-29 07:01:52 +00:00
Zachary Turner	df1832cf86	Resubmit "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables." This was reverted due to buildbot breakages and I was not familiar with this code to investigate it. But while trying to get a useful backtrace for the author, it turns out the fix was very obvious. Resubmitting this patch as is, and will submit the fix in a followup so that the fix is not hidden in the larger CL. llvm-svn: 304122	2017-05-29 02:19:37 +00:00
Zachary Turner	5b199be769	Revert "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables." This reverts commit 28cb1003507f287726f43c771024a1dc102c45fe as well as all subsequent followups. llvm-tblgen currently segfaults with this change, and it seems it has been broken on the bots all day with no fixes in preparation. See, for example: http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/ llvm-svn: 304121	2017-05-29 01:48:53 +00:00
Dylan McKay	74fc1ce0c2	[AVR] Remove SREG from CPI's Uses; authored by Florian Zeitz Summary: CPI does not read the status register, but only writes it. Reviewers: dylanmckay Reviewed By: dylanmckay Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33223 llvm-svn: 304116	2017-05-29 00:10:14 +00:00
Geoff Berry	2739ebafb6	[AArch64][Falkor] Combine sched details files into one. NFC. llvm-svn: 304109	2017-05-28 22:20:44 +00:00
Geoff Berry	b542fb3817	[AArch64][Falkor] Fix some sched details. - Remove all uses of base sched model entries and set them all to Unsupported so all the opcodes are described in AArch64SchedFalkorDetails.td. - Remove entries for unsupported half-float opcodes. - Remove entries for unsupported LSE extension opcodes. - Add entry for MOVbaseTLS (and set Sched in base td file entry to WriteSys) and a few other pseudo ops. - Fix a few FP load/store with reg offset entries to use the LSLfast predicates. - Add Q size BIF/BIT/BSL entries. - Fix swapped Q/D sized CLS/CLZ/CNT/RBIT entries. - Fix pre/post increment address register latency (this operand is always dest 0). - Fix swapped FCVTHD/FCVTHS/FCVTDH/FCVTDS entries. - Fix XYZ resource over usage on LD[1-4] opcodes. llvm-svn: 304108	2017-05-28 21:48:31 +00:00
Ayman Musa	d9f1fe43a8	[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables. X86 backend holds huge tables in order to map between the register and memory forms of each instruction. This TableGen Backend automatically generated all these tables with the appropriate flags for each entry. Differential Revision: https://reviews.llvm.org/D32684 llvm-svn: 304088	2017-05-28 12:55:36 +00:00
Ayman Musa	0b4f97d5e9	[X86] Adding FoldGenRegForm helper field (for memory folding tables tableGen backend) to X86Inst class and set its value for the relevant instructions. Some register-register instructions can be encoded in 2 different ways, this happens when 2 register operands can be folded (separately). For example if we look at the MOV8rr and MOV8rr_REV, both instructions perform exactly the same operation, but are encoded differently. Here is the relevant information about these instructions from Intel's 64-ia-32-architectures-software-developer-manual: Opcode Instruction Op/En 64-Bit Mode Compat/Leg Mode Description 8A /r MOV r8,r/m8 RM Valid Valid Move r/m8 to r8. 88 /r MOV r/m8,r8 MR Valid Valid Move r8 to r/m8. Here we can see that in order to enable the folding of the output and input registers, we had to define 2 "encodings", and as a result we got 2 move 8-bit register-register instructions. In the X86 backend, we define both of these instructions, usually one has a regular name (MOV8rr) while the other has "_REV" suffix (MOV8rr_REV), must be marked with isCodeGenOnly flag and is not emitted from CodeGen. Automatically generating the memory folding tables relies on matching encodings of instructions, but in these cases where we want to map both memory forms of the mov 8-bit (MOV8rm & MOV8mr) to MOV8rr (not to MOV8rr_REV) we have to somehow point from the MOV8rr_REV to the "regular" appropriate instruction which in this case is MOV8rr. This field enable this "pointing" mechanism - which is used in the TableGen backend for generating memory folding tables. Differential Revision: https://reviews.llvm.org/D32683 llvm-svn: 304087	2017-05-28 12:39:37 +00:00
Matthias Braun	88c8c9847d	AArch64/PEI: Do not add reserved regs to liveins We do not track liveness for reserved registers. It is unnecessary to add them to block livein lists. llvm-svn: 304059	2017-05-27 03:38:02 +00:00
Quentin Colombet	7a43eddf28	[AArch64][GlobalISel] Add the Localizer pass for the O0 pipeline This should fix most of the issue we have right now with constants being spilled all over the place. llvm-svn: 304052	2017-05-27 01:34:07 +00:00
Matthias Braun	b4f74224ff	AArch64: Fix cmpxchg O0 expansion - Rewrite livein calculation to use the computeLiveIns() helper function. This is slightly less efficient but easier to reason about and doesn't unnecessarily add pristine and reserved registers[1] - Zero the status register at the beginning of the loop to make sure it has a defined value. - Remove kill flags of values that need to stay alive throughout the loop. [1] An upcoming commit of mine will tighten the MachineVerifier to catch these. llvm-svn: 304048	2017-05-26 23:48:59 +00:00
Alexei Starovoitov	3c585d3a8f	[bpf] disallow global_addr+off folding Wrong assembly code is generated for a simple program with clang. If clang only produces IR and llc is used for IR lowering and optimization, correct assembly code is generated. The main reason is that clang feeds default Reloc::Static to llvm and llc feeds no RelocMode to llvm, where for llc case, BPF backend picks up Reloc::PIC_ mode. This leads to different IR lowering behavior and clang permits global_addr+off folding while llc doesn't. This patch introduces isOffsetFoldingLegal function into BPF backend and the function always returns false. This will make clang and llc behave the same for the lowering. Bug https://bugs.llvm.org//show_bug.cgi?id=33183 has more detailed explanation. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 304043	2017-05-26 22:32:41 +00:00
Davide Italiano	ef9bfe9531	[Mips] Placate GCC's -Wmisleading-indentation. NFCI. llvm-svn: 304041	2017-05-26 21:56:19 +00:00
Matthias Braun	ac4307c41e	LivePhysRegs: Rework constructor + documentation; NFC - Take reference instead of pointer to a TRI that cannot be nullptr. - Improve documentation comments. llvm-svn: 304038	2017-05-26 21:51:00 +00:00
Sumanth Gundapaneni	a6cf2fd5ec	[Hexagon] Cleanup of unused function isCalleeSaveReg (NFC) llvm-svn: 304034	2017-05-26 21:09:54 +00:00
Konstantin Zhuravlyov	b2ff8dfea0	Resubmit r303859 with test fixed. [AMDGPU] add intrinsic for s_getpc Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc. Patch by Tim Corringham llvm-svn: 304031	2017-05-26 20:38:26 +00:00
Benjamin Kramer	debb3c35e0	Make helper functions static. NFC. llvm-svn: 304029	2017-05-26 20:09:00 +00:00
Dmitry Preobrazhensky	6a2431df0b	[AMDGPU][MC][GFX9] Corrected encoding of flat_scratch* for SDWA opcodes See bug 33171: https://bugs.llvm.org/show_bug.cgi?id=33171 Reviewers: Sam Kolton Differential Revision: https://reviews.llvm.org/D33553 llvm-svn: 304015	2017-05-26 18:01:29 +00:00
Tom Stellard	dde28a8c92	AMDGPU/GlobalISel: Mark 32-bit float constants as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33212 llvm-svn: 304003	2017-05-26 16:40:03 +00:00
Sam Kolton	363f47a2c7	[AMDGPU] SDWA: add disassembler support for GFX9 Summary: Added decoder methods and tests Reviewers: vpykhtin, artem.tamazov, dp Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D33545 llvm-svn: 303999	2017-05-26 15:52:00 +00:00
John Brawn	9009d2905d	[ARM] Fix lowering of misaligned memcpy/memset Currently getOptimalMemOpType returns i32 for large enough sizes without checking for alignment, leading to poor code generation when misaligned accesses aren't permitted as we generate a word store then later split it up into byte stores. This means we inadvertently go over the MaxStoresPerMemcpy limit and for memset we splat the memset value into a word then immediately split it up again. Fix this by leaving it up to FindOptimalMemOpLowering to figure out which type to use, but also fix a bug there where it wasn't correctly checking if misaligned memory accesses are allowed. Differential Revision: https://reviews.llvm.org/D33442 llvm-svn: 303990	2017-05-26 13:59:12 +00:00
Andrew V. Tischenko	fdb264e263	The fix for PR22004: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands." llvm-svn: 303985	2017-05-26 13:23:34 +00:00
Nirav Dave	6ff50bf242	Fix signedness of constant. NFC. llvm-svn: 303980	2017-05-26 12:53:10 +00:00
Tim Shen	a76f20c364	[PPC] Add text for assert. llvm-svn: 303940	2017-05-25 23:40:46 +00:00
Tim Shen	a2b85da879	[PPC] Fix atomics lowering in DAG lowering. I forgot to forward the chain, causing some missing instruction dependencies. The test crashes the compiler without this patch. Inspired by the test case, D33519 also tries to remove the extra sync. Differential Revision: https://reviews.llvm.org/D33573 llvm-svn: 303931	2017-05-25 22:58:35 +00:00
Kyle Butt	13379d7c99	PPC: Correct Size for GETtlsADDR PPC::GETtlsADDR is lowered to a branch and a nop, by the assembly printer. Its size was incorrectly marked as 4, correct it to 8. The incorrect size can cause incorrect branch relaxation in PPCBranchSelector under the right conditions. llvm-svn: 303904	2017-05-25 19:37:41 +00:00
Nico Weber	b3d83a092a	Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots. llvm-svn: 303902	2017-05-25 19:19:29 +00:00
Manoj Gupta	d536180fdc	[AArch64]: add 'a' inline asm operand modifier. Summary: This is used in the Linux kernel, and effectively just means "print an address". This brings back r193593. Reviewed by: Renato Golin Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls Subscribers: aemerson, javed.absar, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D33558 llvm-svn: 303901	2017-05-25 19:07:57 +00:00
Tim Corringham	32d0d38679	[AMDGPU] add intrinsic for s_getpc Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32862 llvm-svn: 303859	2017-05-25 14:04:14 +00:00
Oren Ben Simhon	7bf27f03f2	[X86] Adding vpopcntd and vpopcntq instructions AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq). Differential Revision: https://reviews.llvm.org/D33169 llvm-svn: 303858	2017-05-25 13:45:23 +00:00
Tony Jiang	0a429f040e	[PowerPC] Fix a performance bug for PPC::XXSLDWI. There are some VectorShuffle Nodes in SDAG which can be selected to XXSLDWI instruction, this patch recognizes them and does the selection to improve the PPC performance. llvm-svn: 303822	2017-05-24 23:48:29 +00:00
Nirav Dave	bb20b5d5c3	[AArch64] Prevent nested ADDs from address calc in splitStoreSplat. NFC In preparation for late-stage store merging. llvm-svn: 303800	2017-05-24 19:55:49 +00:00
Zaara Syeda	932978315b	P9: D-form vector load/store. Differential Revision: https://reviews.llvm.org/D33248 llvm-svn: 303780	2017-05-24 17:50:37 +00:00
Matthew Simpson	6349380fa4	Revert r291254: [AArch64] Reduce vector insert/extract cost for Falkor The default vector insert/extract cost is more profitable on Falkor than the reduced cost. llvm-svn: 303771	2017-05-24 16:48:39 +00:00
Nirav Dave	d20066cbad	[AMDGPU] Prevent too large store merges in AMDGPU Subtargets. NFCI. Various address spaces on the SI and R600 subtargets have stricter limits on memory access size than other address spaces. Use canMergeStoresTo predicate to prevent the DAGCombiner from creating these stores as they will be split up during legalization. llvm-svn: 303767	2017-05-24 15:59:09 +00:00
Vadzim Dambrouski	b07351f4f8	[MSP430] Fix PR33050: Don't use ADD16ri to lower FrameIndex. Use ADDframe pseudo instruction instead. This will fix machine verifier error, and will help to fix PR32146. Differential Revision: https://reviews.llvm.org/D33452 llvm-svn: 303758	2017-05-24 15:08:30 +00:00
Marek Olsak	8973a0a22c	Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns" This reverts commit e065977c4b5f68ab845400b256f6a3822b1325fa. It doesn't work. S_LOAD_DWORD_IMM_ci and friends aren't selected by any of the patterns, so it was putting 32-bit literals into the 8-bit field. llvm-svn: 303754	2017-05-24 14:53:50 +00:00
Krzysztof Parzyszek	e3ec97b031	[Hexagon] Fix comment in HexagonPacketizer::runOnMachineFunction Patch by Wei-Ren Chen. Differential Revision: https://reviews.llvm.org/D33439 llvm-svn: 303745	2017-05-24 13:43:42 +00:00
Jonas Paulsson	8624b7e1ce	[LoopVectorizer] Let target prefer scalar addressing computations. The loop vectorizer usually vectorizes any instruction it can and then extracts the elements for a scalarized use. On SystemZ, all elements containing addresses must be extracted into address registers (GRs). Since this extraction is not free, it is better to have the address in a suitable register to begin with. By forcing address arithmetic instructions and loads of addresses to be scalar after vectorization, two benefits result: * No need to extract the register * LSR optimizations trigger (LSR isn't handling vector addresses currently) Benchmarking shows improvements on SystemZ with this new behaviour. Any other target could try this by returning false in the new hook prefersVectorizedAddressing(). Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand https://reviews.llvm.org/D32422 llvm-svn: 303744	2017-05-24 13:42:56 +00:00
Jonas Paulsson	081b5a1e9d	[SystemZ] Fix register modelling in expandLoadStackGuard() EXPENSIVE_CHECKS found this bug (https://bugs.llvm.org/show_bug.cgi?id=33047), which this patch fixes. The EAR instruction defines a GR32, not a GR64. Review: Ulrich Weigand llvm-svn: 303743	2017-05-24 13:15:48 +00:00
Simon Pilgrim	9f46d1d479	Strip trailing whitespace. NFCI. llvm-svn: 303736	2017-05-24 11:02:27 +00:00
Florian Hahn	d211fe7c26	[ARM] Remove ThumbTargetMachines. (NFC) Summary: Thumb code generation is controlled by ARMSubtarget and the concrete ThumbLETargetMachine and ThumbBETargetMachine are not needed. Eric Christopher suggested removing the unneeded target machines in https://reviews.llvm.org/D33287. I think it still makes sense to keep separate TargetMachines for big and little endian as we probably do not want to have different endianness for different functions in a single compilation unit. The MIPS backend has two separate TargetMachines for big and little endian as well. Reviewers: echristo, rengolin, kristof.beyls, t.p.northover Reviewed By: echristo Subscribers: aemerson, javed.absar, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D33318 llvm-svn: 303733	2017-05-24 10:18:57 +00:00
Javed Absar	a32e3a1acf	[ARM] Add VLDx/VSTx sched defs for machine-schedulers. NFCI This patch adds missing scheds for Neon VLDx/VSTx instructions. This will make it easier and faster to write schedulers for ARM sub-targets in the future. Existing models will not be affected by this patch. Reviewed by: Renato Golin, Diana Picus Differential Revision: https://reviews.llvm.org/D33120 llvm-svn: 303717	2017-05-24 05:32:48 +00:00
Vadzim Dambrouski	49dd6e68c2	[MSP430] Add subtarget features for hardware multiplier. Also add more processors to make -mcpu option behave similar to gcc. Differential Revision: https://reviews.llvm.org/D33335 llvm-svn: 303695	2017-05-23 21:49:42 +00:00
Simon Pilgrim	c910a70b21	[AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045) This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045 Differential Revision: https://reviews.llvm.org/D33451 llvm-svn: 303691	2017-05-23 21:27:15 +00:00
Changpeng Fang	1dbace195d	AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca Summary: Promoting Alloca to Vector and Promoting Alloca to LDS are two independent ways of handling Alloca and should not affect each other. As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage related checking and places it after the calling convention checking. Reviewer: arsenm Differential Revision: http://reviews.llvm.org/D33139 llvm-svn: 303684	2017-05-23 20:25:41 +00:00
Geoff Berry	d6ac96f953	[AArch64][Falkor] Refine sched details for LSLfast/ASRfast. llvm-svn: 303682	2017-05-23 19:57:45 +00:00
Stanislav Mekhanoshin	53a21292f8	[AMDGPU] Combine and (srl) into shl (bfe) Perform DAG combine: and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb Where nb is the number of trailing zeroes in mask. It replaces two instructions with two, and BFE is generally a more expensive one. However this is only done if we are selecting a byte or word at an aligned boundary which results in a proper SDWA operand pattern. It is only done if SDWA is supported. TODO: improve SDWA pass to actually convert this pattern. It is not done now because we have an immediate in the instruction, which has to be moved into a VGPR. Differential Revision: https://reviews.llvm.org/D33455 llvm-svn: 303681	2017-05-23 19:54:48 +00:00
Geoff Berry	e6366f505f	[AArch64][Falkor] Fix sched details for FMOV of WZR/XZR. llvm-svn: 303680	2017-05-23 19:54:28 +00:00
Oleg Ranevskyy	09df0020fc	[ARM] Temporarily disable globals promotion to constant pools to prevent miscompilation Summary: A temporary workaround for PR32780 - rematerialized instructions accessing the same promoted global through different constant pool entries. The patch turns off the globals promotion optimization leaving all its code in place, so that it can be easily turned on once PR32780 is fixed. Since this is a miscompilation issue causing generation of misbehaving code, and the problem is very subtle, the patch might be valuable enough to get into 4.0.1. Reviewers: efriedma, jmolloy Reviewed By: efriedma Subscribers: aemerson, javed.absar, llvm-commits, rengolin, asl, tstellar Differential Revision: https://reviews.llvm.org/D33446 llvm-svn: 303679	2017-05-23 19:38:37 +00:00
Nirav Dave	6c910c0dd8	[DAG] Add AddressSpace parameter to canMergeStoresTo. NFC. llvm-svn: 303673	2017-05-23 18:53:02 +00:00
Marek Olsak	7dadd86a35	AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns This is just a cleanup. Also, it adds checking that ByteCount is aligned to 4. Reviewers: arsenm, nhaehnle, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28994 llvm-svn: 303658	2017-05-23 17:14:34 +00:00
Stanislav Mekhanoshin	a96ec3f360	[AMDGPU] Convert shl (add) into add (shl) shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1) This allows folding a constant into an address in some cases as well as eliminating the second shift if the expression is used as an address and the second shift is a result of a GEP. Differential Revision: https://reviews.llvm.org/D33432 llvm-svn: 303641	2017-05-23 15:59:58 +00:00
Simon Atanasyan	57253043a4	[mips] Remove unused class field. NFC llvm-svn: 303639	2017-05-23 15:00:30 +00:00
Simon Atanasyan	039b02ec78	[mips] Change type of MipsSubtarget ctor arguments s/std::string/StringRef/. NFC llvm-svn: 303638	2017-05-23 15:00:26 +00:00
Sam Kolton	f7659d71eb	[AMDGPU] SDWA: Add assembler support for GFX9 Summary: Added separate pseudo and real instruction for GFX9 SDWA instructions. Currently supports only in assembler. Depends D32493 Reviewers: vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D33132 llvm-svn: 303620	2017-05-23 10:08:55 +00:00
Florian Hahn	abb4218b98	[AArch64] Make instruction fusion more aggressive. Summary: This patch makes instruction fusion more aggressive by * adding artificial edges between the successors of FirstSU and SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps. * updating PostGenericScheduler::tryCandidate to keep clusters together, similar to GenericScheduler::tryCandidate. This change increases the number of AES instruction pairs generated on Cortex-A57 and Cortex-A72. This doesn't change code at all in most benchmarks or general code, but we've seen improvement on kernels using AESE/AESMC and AESD/AESIMC. Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB Reviewed By: evandro Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33230 llvm-svn: 303618	2017-05-23 09:33:34 +00:00
Igor Breger	617be6e475	[GlobalISel][X86] G_LOAD/G_STORE vec256/512 support Summary: mark G_LOAD/G_STORE vec256/512 legal for AVX/AVX512. Implement instruction selection. Reviewers: zvi, guyblank Reviewed By: zvi Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33268 llvm-svn: 303617	2017-05-23 08:23:51 +00:00
Akira Hatanaka	e8ae3346a3	[AArch64] Fix PR33100. This commit fixes a bug introduced in r301019 where optimizeLogicalImm would replace a logical node's immediate operand that was CSE'd and was also an operand of another node. This commit fixes the bug by replacing the logical node instead of its immediate operand. rdar://problem/32295276 llvm-svn: 303607	2017-05-23 06:08:37 +00:00
Krzysztof Parzyszek	9a23d40ee8	[Hexagon] Fix definitions of vector predicate loads and stores This fixes http://llvm.org/PR33048. llvm-svn: 303572	2017-05-22 20:02:53 +00:00
Stanislav Mekhanoshin	5fa289f0d8	[AMDGPU] Narrow lshl from 64 to 32 bit if possible Turn expensive 64 bit shift into 32 bit if shift does not overflow int: shl (ext x) => zext (shl x) Differential Revision: https://reviews.llvm.org/D33367 llvm-svn: 303569	2017-05-22 16:58:10 +00:00
Valery Pykhtin	74cb9c8831	[AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker Differential revision: https://reviews.llvm.org/D33289 llvm-svn: 303548	2017-05-22 13:09:40 +00:00
Simon Atanasyan	e0b726f2fa	[mips] Support micromips attribute passed by front-end This patch adds handling of the `micromips` and `nomicromips` attributes passed by front-end. The patch depends on D33363. Differential revision: https://reviews.llvm.org/D33364 llvm-svn: 303545	2017-05-22 12:47:41 +00:00
James Molloy	6110be9759	Re-apply r302416: [ARM] Clear the constant pool cache on explicit .ltorg directives Re-applying now that PR32825 which was raised on the commit this fixed up is now known to have also been fixed by this commit. Original commit message: Multiple ldr pseudoinstructions with the same constant value will reuse the same constant pool entry. However, if the constant pool is explicitly flushed with a .ltorg directive, we should not try to reference constants in the previous pool any longer, since they may be out of range. This fixes assembling hand-written assembler source which repeatedly loads the same constant value, across a binary size larger than the pc-relative fixup range for ldr instructions (4096 bytes). Such assembler source already uses explicit .ltorg instructions to emit constant pools with regular intervals. However if we try to reuse constants emitted in earlier pools, they end up out of range. This makes the output of the testcase match what binutils gas does (prior to this patch, it would fail to assemble). Differential Revision: https://reviews.llvm.org/D32847 llvm-svn: 303540	2017-05-22 09:42:07 +00:00
Strahinja Petrovic	ab9573f37c	[MIPS] Add support to match more patterns for DINS instruction This patch adds support for recognizing patterns to match DINS instruction. Differential Revision: https://reviews.llvm.org/D31465 llvm-svn: 303537	2017-05-22 09:06:44 +00:00
James Molloy	5cc75ae8f9	Revert "[ARM] Clear the constant pool cache on explicit .ltorg directives" This reverts commit r302416. This was a fixup for r286006, which has now been reverted so this doesn't apply (either in concept or in code). This commit itself has no problems, but the underlying issue it was fixing has now disappeared from the codebase. llvm-svn: 303536	2017-05-22 08:49:28 +00:00
Igor Breger	014fc566e7	[GlobalISel][X86] Fix G_TRUNC instruction selection. Updated tests with -verify-machineinstrs flag. It fixes 3 tests failed with machine verifier enabled and listed in PR27481 llvm-svn: 303502	2017-05-21 11:13:56 +00:00
Hiroshi Inoue	37e63b1b21	Summary PPC backend eliminates compare instructions by using record-form instructions in PPCInstrInfo::optimizeCompareInstr, which is called from peephole optimization pass. This patch improves this optimization to eliminate more compare instructions in two types of common cases. - comparison against a constant 1 or -1 The record-form instructions set CR bit based on signed comparison against 0. So, the current implementation does not exploit the record-form instruction for comparison against a non-zero constant. This patch enables record-form optimization for constant of 1 or -1 if possible; it changes the condition "greater than -1" into "greater than or equal to 0" and "less than 1" into "less than or equal to 0". With this patch, compare can be eliminated in the following code sequence, as an example. uint64_t a, b; if ((a | b) & 0x8000000000000000ull) { ... } else { ... } - andi for 32-bit comparison on PPC64 Since record-form instructions execute 64-bit signed comparison, we are limited in eliminating 32-bit comparison, i.e. with cmplwi, using the record-form. The original implementation already has such checks but andi. is not recognized as an instruction which executes implicit zero extension and is hence safe to convert into record-form if used for equality check. %1 = and i32 %a, 10 %2 = icmp ne i32 %1, 0 br i1 %2, label %foo, label %bar In this simple example, LLVM generates andi. + cmplwi + beq on PPC64. This patch makes it possible to eliminate the cmplwi for this case. I added andi. for optimization targets if it is safe to do so. Differential Revision: https://reviews.llvm.org/D30081 llvm-svn: 303500	2017-05-21 06:00:05 +00:00
Dmitry Preobrazhensky	ce941c9c38	[AMDGPU][MC] Corrected disassembler to decode instructions with 2 literals See bug 32922: https://bugs.llvm.org//show_bug.cgi?id=32922 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D32912 llvm-svn: 303428	2017-05-19 14:27:52 +00:00
Dmitry Preobrazhensky	9321e8fcec	[AMDGPU][MC] Fixed bugs in export instruction See Bugs 33019, 33056: https://bugs.llvm.org//show_bug.cgi?id=33019 https://bugs.llvm.org//show_bug.cgi?id=33056 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33288 llvm-svn: 303423	2017-05-19 13:36:09 +00:00
Guy Blank	548e22a1a7	[X86][AVX512] Make i1 illegal in the CodeGen This patch defines the i1 type as illegal in the X86 backend for AVX512. For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended. This should produce better scalar code for i1 types since GPRs will be used instead of mask registers. Differential Revision: https://reviews.llvm.org/D32273 llvm-svn: 303421	2017-05-19 12:35:15 +00:00
Daniel Sanders	a1b2db7919	[globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to per-function predicates. Summary: This causes them to be re-computed more often than necessary but resolves objections that were raised post-commit on r301750. Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls Reviewed By: qcolombet Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32861 llvm-svn: 303418	2017-05-19 11:08:33 +00:00
Hans Wennborg	b00ffd8cb7	Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB." This also reverts follow-ups r303292 and r303298. It broke some Chromium tests under MSan, and apparently also internal tests at Google. llvm-svn: 303369	2017-05-18 18:50:05 +00:00
Reid Kleckner	96ab8726a3	[IR] De-virtualize ~Value to save a vptr Summary: Implements PR889 Removing the virtual table pointer from Value saves 1% of RSS when doing LTO of llc on Linux. The impact on time was positive, but too noisy to conclusively say that performance improved. Here is a link to the spreadsheet with the original data: https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing This change makes it invalid to directly delete a Value, User, or Instruction pointer. Instead, such code can be rewritten to a null check and a call Value::deleteValue(). Value objects tend to have their lifetimes managed through iplist, so for the most part, this isn't a big deal. However, there are some places where LLVM deletes values, and those places had to be migrated to deleteValue. I have also created llvm::unique_value, which has a custom deleter, so it can be used in place of std::unique_ptr<Value>. I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which derives from User outside of lib/IR. Code in IR cannot include MemorySSA headers or call the MemoryAccess object destructors without introducing a circular dependency, so we need some level of indirection. Unfortunately, no class derived from User may have any virtual methods, because adding a virtual method would break User::getHungOffOperands(), which assumes that it can find the use list immediately prior to the User object. I've added a static_assert to the appropriate OperandTraits templates to help people avoid this trap. Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv Reviewed By: chandlerc Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D31261 llvm-svn: 303362	2017-05-18 17:24:10 +00:00
Francis Visoiu Mistrih	8b61764cbb	[LegacyPassManager] Remove TargetMachine constructors This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency. The patterns replaced here are: * Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`. * Passes not handling a null TargetMachine `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`. * MachineFunctionPasses now use MF.getTarget(). * Remove all the TargetMachine constructors. * Remove INITIALIZE_TM_PASS. This fixes a crash when running `llc -start-before prologepilog`. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults. Related to PR30324. Differential Revision: https://reviews.llvm.org/D33222 llvm-svn: 303360	2017-05-18 17:21:13 +00:00
Sam Kolton	ebfdaf7394	[AMDGPU] SDWA operands should not intersect with potential MIs Summary: There should be no intersection between SDWA operands and potential MIs. E.g.: ``` v_and_b32 v0, 0xff, v1 -> src:v1 sel:BYTE_0 v_and_b32 v2, 0xff, v0 -> src:v0 sel:BYTE_0 v_add_u32 v3, v4, v2 ``` In that example it is possible that we would fold 2nd instruction into 3rd (v_add_u32_sdwa) and then try to fold 1st instruction into 2nd (that was already destroyed). So if SDWAOperand is also a potential MI then do not apply it. Reviewers: vpykhtin, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32804 llvm-svn: 303347	2017-05-18 12:12:03 +00:00
Igor Breger	842b5b36ba	[GlobalISel][X86] G_ADD/G_SUB vector legalizer/selector support. Summary: G_ADD/G_SUB vector legalizer/selector support. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33232 llvm-svn: 303345	2017-05-18 11:10:56 +00:00
Simon Pilgrim	6bba6068be	[X86][AVX512] Add 512-bit vector ctpop costs + tests llvm-svn: 303342	2017-05-18 10:42:34 +00:00
Lama Saba	2ea271b54a	[X86] Replace slow LEA instructions in X86 According to Intel's Optimization Reference Manual for SNB+: " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1: - LEA that has all three source operands: base, index, and offset - LEA that uses base and index registers where the base is EBP, RBP,or R13 - LEA that uses RIP relative addressing mode - LEA that uses 16-bit addressing mode " This patch currently handles the first 2 cases only. Differential Revision: https://reviews.llvm.org/D32277 llvm-svn: 303333	2017-05-18 08:11:50 +00:00
Davide Italiano	9ae69a75ec	[Target/X86] Remove unneeded return. NFCI. llvm-svn: 303323	2017-05-18 02:36:42 +00:00
Matt Arsenault	2b1f9aa577	AMDGPU: Start defining a calling convention Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values don't either. llvm-svn: 303308	2017-05-17 21:56:25 +00:00
Kyle Butt	f6c61ef64d	CodeGen: Power: Add lowering for shifts of v1i128. When legalizing vector operations on vNi128, they will be split to v1i128 because that is a legal type on ppc64, but then the compiler will crash in selection dag because it fails to select for these operations. This patch fixes shift operations. Logical shift right and left shift can be performed in the vector unit, but algebraic shift right requires being split. Differential Revision: https://reviews.llvm.org/D32774 llvm-svn: 303307	2017-05-17 21:54:41 +00:00
Michael Liao	ab12984634	Fix PR33028 - '-verify-machineinstrs' starts to complain about allocatable live-in physical registers on non-entry or non-landing-pad basic blocks. - Refactor the XBEGIN translation to define EAX on a dedicated fallback code path due to XABORT. Add a pseudo instruction to define EAX explicitly to avoid adding a physical register live-in. Differential Revision: https://reviews.llvm.org/D33168 llvm-svn: 303306	2017-05-17 21:48:00 +00:00
Matt Arsenault	2525e4e4c2	AMDGPU: Expand frame indexes to be relative to scratch wave offset In order for an arbitrary callee to access an object in a caller's stack frame, the 32-bit offset used as the private pointer needs to be relative to the kernel's scratch wave offset register. Convert to this by finding the difference from the current stack frame and scaling by the wavefront size. llvm-svn: 303303	2017-05-17 21:23:14 +00:00
Matt Arsenault	156d3ae0b6	AMDGPU: Change mubuf soffset register when SP relative Check the MachinePointerInfo for whether the access is supposed to be relative to the stack pointer. No tests because this is used in later commits implementing calls. llvm-svn: 303301	2017-05-17 21:02:58 +00:00
Simon Pilgrim	23ef26728a	[X86][AVX512] Add 512-bit vector ctlz costs + tests llvm-svn: 303300	2017-05-17 21:02:18 +00:00
Matt Arsenault	98f2946ab3	AMDGPU: Make better use of op_sel with high components Handle more general swizzles. llvm-svn: 303296	2017-05-17 20:30:58 +00:00
Simon Pilgrim	d0365967c4	[X86][AVX512] Add 512-bit vector cttz costs + tests llvm-svn: 303293	2017-05-17 20:22:54 +00:00
Dehao Chen	02828a93e8	Only enable LiveRangeShrink for x86. Summary: Moving LiveRangeShrink to x86 as this pass is mostly useful for architectures with great register pressure. Reviewers: MatzeB, qcolombet Reviewed By: qcolombet Subscribers: jholewinski, jyknight, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33294 llvm-svn: 303292	2017-05-17 20:18:13 +00:00
Matt Arsenault	786eeea23e	AMDGPU: Try to use op_sel when selecting packed instructions Avoids instructions to pack a vector when the source is really a scalar being broadcast. Also be smarter and look for per-component fneg. Doesn't yet handle scalar from upper half of register or other swizzles. llvm-svn: 303291	2017-05-17 20:00:00 +00:00
Jacob Gravelle	c63fb00f13	[WebAssembly][NFC] Update expected testsuite failures for newly passing tests Summary: r303050 fixes crashes when calling scalarizeMaskedMemIntrin pass from WebAssembly backend. This updates expected test failures for that. Reviewers: sbc100 Subscribers: jfb, llvm-commits, dschuff Differential Revision: https://reviews.llvm.org/D33295 llvm-svn: 303288	2017-05-17 19:45:22 +00:00
Matt Arsenault	ea8a4ed588	AMDGPU: Use appropriate soffset for spilling This needs to be the frame offset register, and not the global scratch wave offset register. For kernels, these are the same. llvm-svn: 303287	2017-05-17 19:37:57 +00:00
Matt Arsenault	ee324ffc1f	AMDGPU: Fix min3/max3 combines for f16/i16 Fix missing instruction definitions for min3/max3. llvm-svn: 303284	2017-05-17 19:25:06 +00:00
Simon Pilgrim	a9a92a1a6a	[X86][AVX512] Add 512-bit vector bitreverse costs + tests llvm-svn: 303283	2017-05-17 19:20:20 +00:00
Krzysztof Parzyszek	2b0533126e	[PPC] Properly update register save area offsets The variables MinGPR/MinG8R were not updated properly when resetting the offsets, which in the included testcase led to saving the CR register in the same location as R30. This fixes another issue reported in PR26519. Differential Revision: https://reviews.llvm.org/D33017 llvm-svn: 303257	2017-05-17 13:25:09 +00:00
Igor Breger	28f290fab8	[GlobalISel][X86] Support add i64 in IA32. Summary: support G_UADDE instruction selection. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D33096 llvm-svn: 303255	2017-05-17 12:48:08 +00:00
Jonas Paulsson	8722ade770	[SystemZ] Modelling of costs of divisions with a constant power of 2. Such divisions will eventually be implemented with shifts which should be reflected in the cost function. Review: Ulrich Weigand llvm-svn: 303254	2017-05-17 12:46:26 +00:00
Diana Picus	eafa4aa910	Reland r303247: [ARM] GlobalISel: Remove dead instruction selection code It only failed on llvm-clang-x86_64-expensive-checks-win, probably because the TableGen stuff hasn't been regenerated. Requires a clean build. llvm-svn: 303252	2017-05-17 12:42:52 +00:00
Diana Picus	36e4ba0f6e	Revert "[ARM] GlobalISel: Remove dead instruction selection code" This reverts commit r303247 because the tests are failing on some bots. Sorry! llvm-svn: 303249	2017-05-17 11:56:07 +00:00
Diana Picus	68d21c864e	[ARM] GlobalISel: Remove dead instruction selection code We can now generate code for selecting G_ADD, G_SUB and G_MUL. Remove the hand-written versions. llvm-svn: 303247	2017-05-17 11:39:26 +00:00
Daniel Cederman	4af795b499	[Sparc] Remove execute permissions from non-executable text files Reviewers: jyknight, lero_chris, venkatra Reviewed By: jyknight Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27127 llvm-svn: 303245	2017-05-17 11:05:20 +00:00
Francis Visoiu Mistrih	b52e036600	BitVector: add iterators for set bits Differential revision: https://reviews.llvm.org/D32060 llvm-svn: 303227	2017-05-17 01:07:53 +00:00
Amara Emerson	c9916d7e97	Re-commit r302678, fixing PR33053. The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions which didn't have a lowering. llvm-svn: 303211	2017-05-16 21:29:22 +00:00
Tim Shen	3bef27cc6f	[PPC] Lower load acquire/seq_cst trailing fence to cmp + bne + isync. Summary: This fixes pr32392. The lowering pipeline is: llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in expandPostRAPseudo. The reason why expandPostRAPseudo is chosen is because previous passes are likely eliminating instructions like cmpw 3, 3 (early CSE) and bne- 7, .+4 (some branch pass(s)). Differential Revision: https://reviews.llvm.org/D32763 llvm-svn: 303205	2017-05-16 20:18:06 +00:00
Reid Kleckner	0ad69fc89f	Revert "[X86] Replace slow LEA instructions in X86" This reverts commit r303183, it broke various buildbots and introduced sanitizer errors. llvm-svn: 303199	2017-05-16 19:55:03 +00:00
Renato Golin	d69570e017	Revert "[ARM] Mark LEApcrel instructions as isAsCheapAsAMove" Revert "[ARM] Mark LEApcrel as not having side effects" This reverts commit r303054 and r303053, as they broke the ARM self-hosting buildbots: http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/1550 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/1349 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/1845 Offline investigation on course. llvm-svn: 303193	2017-05-16 17:59:07 +00:00
Stanislav Mekhanoshin	acca0f5c02	[AMDGPU] Use GCNRPTracker dumper methods in scheduler Differential Revision: https://reviews.llvm.org/D33244 llvm-svn: 303186	2017-05-16 16:31:45 +00:00
Stanislav Mekhanoshin	b10860788f	[AMDGPU] Cache live-ins and register pressure in scheduler Using LIS can be quite expensive, so caching of calculated region live-ins and pressure is implemented. It does two things: 1. Caches the info for the second stage when we schedule with decreased target occupancy. 2. Tracks the basic block from top to bottom thus eliminating the need to scan whole register file liveness at every region split in the middle of the block. The scheduling is now done in 3 stages instead of two, with the first one being really a no-op and only used to collect scheduling regions as sent by the scheduler driver. There is no functional change to the current behavior, only compilation speed is affected. In general computeBlockPressure() could be simplified if we switch to backward RP tracker, because scheduler sends regions within a block starting from the last upward. We could use a natural order of upward tracker to seamlessly change between regions of the same block, since live reg set of a previous tracked region would become a live-out of the next region. That however requires fixing upward tracker to properly account defs and uses of the same instruction as both are contributing to the current pressure. When we converge on the produced pressure we should be able to switch between them back and forth. In addition, backward tracker is less expensive as it uses LIS in recede less often than forward uses it in advance. At the moment the worst known case compilation time has improved from 26 minutes to 8.5. Differential Revision: https://reviews.llvm.org/D33117 llvm-svn: 303184	2017-05-16 16:11:26 +00:00
Lama Saba	52e892577d	[X86] Replace slow LEA instructions in X86 According to Intel's Optimization Reference Manual for SNB+: " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1: - LEA that has all three source operands: base, index, and offset - LEA that uses base and index registers where the base is EBP, RBP,or R13 - LEA that uses RIP relative addressing mode - LEA that uses 16-bit addressing mode " This patch currently handles the first 2 cases only. Differential Revision: https://reviews.llvm.org/D32277 llvm-svn: 303183	2017-05-16 16:01:36 +00:00
Stanislav Mekhanoshin	464cecf81e	[AMDGPU] Turn register pressure estimation into forward tracker This factors register pressure estimation mechanism from the GCNSchedStrategy into the forward tracker to unify interface with other strategies and expose it to other interested phases. Differential Revision: https://reviews.llvm.org/D33105 llvm-svn: 303179	2017-05-16 15:43:52 +00:00
Chad Rosier	8b12a03215	Fix an improperly placed curly bracket. NFC. llvm-svn: 303165	2017-05-16 12:43:23 +00:00
NAKAMURA Takumi	994a43d27a	AMDGPUCodeGen: Fix warnings in r303111. [-Wunused-variable] llvm-svn: 303137	2017-05-16 04:01:23 +00:00
Peter Collingbourne	6f0ecca3b5	IR: Give function GlobalValue::getRealLinkageName() a less misleading name: dropLLVMManglingEscape(). This function gives the wrong answer on some non-ELF platforms in some cases. The function that does the right thing lives in Mangler.h. To try to discourage people from using this function, give it a different name. Differential Revision: https://reviews.llvm.org/D33162 llvm-svn: 303134	2017-05-16 00:39:01 +00:00
Davide Italiano	60d36c7506	[AMDGPU] Kill now unused phiInfoElementGetDebugLoc(). NFCI. llvm-svn: 303122	2017-05-15 22:10:15 +00:00
Tim Northover	203c6f055d	AArch64: use linker-private symbols for globals in MachO. We don't use section-relative relocations on AArch64, so all symbols must be at least visible to the linker (i.e. properly global or l_whatever, but not L_whatever). llvm-svn: 303118	2017-05-15 21:51:38 +00:00
Adam Nemet	e29686e5c1	[SLP] Enable 64-bit wide vectorization on AArch64 ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. * Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116	2017-05-15 21:15:01 +00:00
Hans Wennborg	bd6e9e77a7	Revert r302678 "[AArch64] Enable use of reduction intrinsics." This caused PR33053. Original commit message: > The new experimental reduction intrinsics can now be used, so I'm enabling this > for AArch64. We will need this for SVE anyway, so it makes sense to do this for > NEON reductions as well. > > The existing code to match shufflevector patterns are replaced with a direct > lowering of the reductions to AArch64-specific nodes. Tests updated with the > new, simpler, representation. > > Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 303115	2017-05-15 20:59:32 +00:00
Jan Sjodin	a06bfe054e	Re-submit AMDGPUMachineCFGStructurizer. Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303111	2017-05-15 20:18:37 +00:00
Tim Northover	8b96c7e9b5	AArch64: diagnose unrecognized features in .cpu directive. We were silently ignoring any features we couldn't match up, which led to errors in an inline asm block missing the conventional "\n\t". llvm-svn: 303108	2017-05-15 19:42:15 +00:00
Geoff Berry	e369653bf3	[AArch64][Falkor] Fix sched details for FMOV llvm-svn: 303099	2017-05-15 18:50:22 +00:00
Jan Sjodin	0e289822fa	Revert 303091. llvm-svn: 303098	2017-05-15 18:39:47 +00:00
Jan Sjodin	e9d2ddc9dd	Add AMDGPUMachineCFGStructurizer. Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303091	2017-05-15 18:13:56 +00:00
Simon Pilgrim	55ff57861a	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146) Follow up to D33147 NVPTXTargetLowering::LowerCall was trusting the default argument values. Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33189 llvm-svn: 303082	2017-05-15 17:17:44 +00:00
Florian Hahn	af91e7e6d2	[AArch64] Enable FeatureFuseAES on Cortex-A72. This patch enables fusing dependent AESE/AESMC and AESD/AESIMC instruction pairs on Cortex-A72, as recommended in the Software Optimization Guide, section 4.10. llvm-svn: 303073	2017-05-15 15:15:22 +00:00
Dmitry Preobrazhensky	167f8b69e3	[AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64 See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33123 llvm-svn: 303070	2017-05-15 14:28:23 +00:00
Dmitry Preobrazhensky	03852a9dca	[AMDGPU][MC] Removed V_MQSAD_U16_U8 This instruction does not really exist See Bug 33018: https://bugs.llvm.org//show_bug.cgi?id=33018 Reviewers: vpykhtin, artem.tamazov Differential Revision: https://reviews.llvm.org/D33126 llvm-svn: 303055	2017-05-15 12:37:03 +00:00
John Brawn	9486becf09	[ARM] Mark LEApcrel instructions as isAsCheapAsAMove Doing this means that if an LEApcrel is used in two places we will rematerialize instead of generating two MOVs. This is particularly useful for printfs using the same format string, where we want to generate an address into a register that's going to get corrupted by the call. Differential Revision: https://reviews.llvm.org/D32858 llvm-svn: 303054	2017-05-15 11:57:54 +00:00
John Brawn	43132c46a6	[ARM] Mark LEApcrel as not having side effects Doing this lets us hoist it out of loops, and I've also marked it as rematerializable the same as the thumb1 and thumb2 counterparts. It looks like it being marked as such was just a mistake, as the commit that made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the LEApcrelJT instructions were marked as having side-effects, so it looks like the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was accidentally marked as such also. Differential Revision: https://reviews.llvm.org/D32857 llvm-svn: 303053	2017-05-15 11:50:21 +00:00
Simon Pilgrim	f8389656e3	[NVPTX] Don't rely on default arguments to SelectionDAG::getMemIntrinsicNode. NFC. NFC followup to D33147, this explicitly sets all the arguments (instead of relying on the defaults) to SelectionDAG::getMemIntrinsicNode to help identify -verify-machineinstrs issues. llvm-svn: 303047	2017-05-15 10:47:48 +00:00
Zvi Rackover	e6b278bc65	[X86] Utilize SelectionDAG::getSelect(). NFC. Replace SelectionDAG::getNode(ISD::SELECT, ...) and SelectionDAG::getNode(ISD::VSELECT, ...) with SelectionDAG::getSelect(...) Saves a few lines of code and in some cases saves the need to explicitly check the type of the desired node. llvm-svn: 303024	2017-05-14 21:30:38 +00:00
Simon Pilgrim	d0ef9d8e93	[X86][AVX1] Account for cost of extract/insert of 256-bit shifts llvm-svn: 303023	2017-05-14 20:52:11 +00:00
Simon Pilgrim	f96b4ab92d	[X86][AVX2] Fix costs for v4i64 ashr by splat llvm-svn: 303022	2017-05-14 20:25:42 +00:00
Simon Pilgrim	de4467b182	[X86][AVX1] Account for cost of extract/insert of 256-bit shifts by splat llvm-svn: 303021	2017-05-14 20:02:34 +00:00
Craig Topper	ceea1a76a1	[X86] Remove unused value from IntrinsicType enum. NFC llvm-svn: 303018	2017-05-14 19:38:06 +00:00
Simon Pilgrim	d3f0d03cc5	[X86][AVX1] Account for cost of extract/insert of 256-bit SDIV/UDIV by mul sequences llvm-svn: 303017	2017-05-14 18:52:15 +00:00
Simon Pilgrim	5bef9c627e	[X86][XOP] XOP's general v16i8 shifts will be used instead of v8i16 shift + mask. Tweak cost model to match what lowering actually does. llvm-svn: 303013	2017-05-14 17:59:46 +00:00
Simon Pilgrim	aa8dffb69b	[X86][SSE] Account for cost of extract/insert of v32i8 vector shifts llvm-svn: 303012	2017-05-14 17:36:07 +00:00
Simon Pilgrim	4599eaa09a	[X86][XOP] Account for cost of extract/insert of 256-bit vector shifts llvm-svn: 303010	2017-05-14 13:38:53 +00:00
Simon Pilgrim	f3ee9c6997	[X86][AVX] Allow 32-bit targets to peek through subvectors to extract constant splats for vXi64 shifts. llvm-svn: 303009	2017-05-14 11:46:26 +00:00
Simon Pilgrim	ef46c2762a	[x86, SSE] AVX1 PR28129 (256-bit all-ones rematerialization) Further perf tests on Jaguar indicate that: vxorps %ymm0, %ymm0, %ymm0 vcmpps $15, %ymm0, %ymm0, %ymm0 is consistently faster (by about 9%) than: vpcmpeqd %xmm0, %xmm0, %xmm0 vinsertf128 $1, %xmm0, %ymm0, %ymm0 Testing equivalent code on a SandyBridge (E5-2640) puts it slightly (~3%) faster as well. Committed on behalf of @dtemirbulatov Differential Revision: https://reviews.llvm.org/D32416 llvm-svn: 302989	2017-05-13 13:42:35 +00:00
Dylan McKay	0c4debc123	[AVR] When lowering Select8/Select16, put newly generated MBBs in the same spot Contributed by Dr. Gergő Érdi. Fixes a bug. Raised from (https://github.com/avr-rust/rust/issues/49). llvm-svn: 302973	2017-05-13 00:22:34 +00:00
Dylan McKay	0c707da6ac	[AVR] Remove an unused variable llvm-svn: 302970	2017-05-13 00:00:26 +00:00
Changpeng Fang	161e8c39af	AMDGPU/SI: Don't promote to vector if the load/store is volatile. Summary: We should not change volatile loads/stores in promoting alloca to vector. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D33107 llvm-svn: 302943	2017-05-12 20:31:12 +00:00
Simon Pilgrim	a1978aaefd	[NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146) This fixes 47 of the 75 NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33147 llvm-svn: 302942	2017-05-12 19:56:43 +00:00
Tim Shen	10c64e6aea	[PPC] Move the combine "a << (b % (sizeof(a) * 8)) -> (PPCshl a, b)" to the backend. NFC. Summary: Eli pointed out that it's unsafe to combine the shifts to ISD::SHL etc., because those are not defined for b > sizeof(a) * 8, even after some of the combiners run. However, PPCISD::SHL defines that behavior (as the instructions themselves). Move the combination to the backend. The tests in shift_mask.ll still pass. Reviewers: echristo, hfinkel, efriedma, iteratee Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D33076 llvm-svn: 302937	2017-05-12 19:25:37 +00:00
Geoff Berry	ddbbf6416c	[AArch64][Falkor] Refine modeling of multiply accumulate forwarding. llvm-svn: 302933	2017-05-12 18:57:10 +00:00
Simon Pilgrim	b146e61828	Strip trailing whitespace. NFCI. llvm-svn: 302927	2017-05-12 17:42:36 +00:00
Craig Topper	8df66c602a	[KnownBits] Add bit counting methods to KnownBits struct and use them where possible This patch adds min/max population count, leading/trailing zero/one bit counting methods. The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give. Differential Revision: https://reviews.llvm.org/D32931 llvm-svn: 302925	2017-05-12 17:20:30 +00:00
Tom Stellard	a0d67c748a	AMDGPU/GlobalISel: Mark 32-bit integer constants as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33115 llvm-svn: 302919	2017-05-12 16:46:46 +00:00
James Y Knight	d4e1b00e7c	[SPARC] Support 'f' and 'e' inline asm constraints. Based on patch by Patrick Boettcher and Chris Dewhurst. Differential Revision: https://reviews.llvm.org/D29116 llvm-svn: 302911	2017-05-12 15:59:10 +00:00
Simon Pilgrim	7f03231cc6	Use SDValue::getOperand() helper. NFCI. llvm-svn: 302894	2017-05-12 13:08:45 +00:00
Leslie Zhai	a1149e01d2	[AVR] Migrate to new StructType::get owing to Suppress all uses of LLVM_END_WITH_NULL Reviewers: dylanmckay, jroelofs, RKSimon, serge-sans-paille Reviewed By: serge-sans-paille Differential Revision: https://reviews.llvm.org/D33119 llvm-svn: 302885	2017-05-12 09:08:03 +00:00
Reid Kleckner	43bbeb4c9f	Issue diagnostics when returning FP values on x86_64 without SSE1/2 Avoid using report_fatal_error, because it will ask the user to file a bug. If the user attempts to disable SSE on x86_64 and then use floating point, that's a bug in their code, not a bug in the compiler. This is just a start. There are other ways to crash the backend in this configuration, but they should be updated to follow this pattern. Differential Revision: https://reviews.llvm.org/D27522 llvm-svn: 302835	2017-05-11 22:43:02 +00:00
Guozhi Wei	22e7da9597	[PPC] Change the register constraint of the first source operand of instruction mtvsrdd to g8rc_nox0 According to Power ISA V3.0 document, the first source operand of mtvsrdd is constant 0 if r0 is specified. So the corresponding register constraint should be g8rc_nox0. This bug caused wrong output generated by 401.bzip2 when -mcpu=power9 and fdo are specified. Differential Revision: https://reviews.llvm.org/D32880 llvm-svn: 302834	2017-05-11 22:17:35 +00:00
Chad Rosier	aeffffdb44	[AArch64][MachineCombine] Fold FNMUL+FSUB -> FNMADD. Differential Revision: http://reviews.llvm.org/D33101. llvm-svn: 302822	2017-05-11 20:07:24 +00:00
Davide Italiano	0dcc015a81	[AMDGPU] Placate unused variable warning in release builds. llvm-svn: 302821	2017-05-11 19:58:52 +00:00
Vadzim Dambrouski	38e30197c3	[MSP430] Generate EABI-compliant libcalls Updates the MSP430 target to generate EABI-compatible libcall names. As a byproduct, adjusts the hardware multiplier options available in the MSP430 target, adds support for promotion of the ISD::MUL operation for 8-bit integers, and correctly marks R11 as used by call instructions. Patch by Andrew Wygle. Differential Revision: https://reviews.llvm.org/D32676 llvm-svn: 302820	2017-05-11 19:56:14 +00:00
Matt Arsenault	47ccafe787	AMDGPU: Remove tfe bit from flat instruction definitions We don't use it and it was removed in gfx9, and the encoding bit repurposed. Additionally actually using it requires changing the output register class, which wasn't done anyway. llvm-svn: 302814	2017-05-11 17:38:33 +00:00
Matt Arsenault	bf5482e4bb	AMDGPU: Pull fneg out of extract_vector_elt This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. llvm-svn: 302813	2017-05-11 17:26:25 +00:00
Stanislav Mekhanoshin	33a97ec4ed	[AMDGPU] Fix incorrect register pressure calculation Earlier fix D32572 introduced a bug where live-ins were calculated for basic block instead of scheduling region. This change fixes it. Differential Revision: https://reviews.llvm.org/D33086 llvm-svn: 302812	2017-05-11 17:16:55 +00:00
Nemanja Ivanovic	96c3d626a2	[PowerPC] Eliminate integer compare instructions - vol. 1 This patch is the first in a series of patches to provide code gen for doing compares in GPRs when the compare result is required in a GPR. It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64 extensions. This first patch handles equality comparison on i32 operands with the result sign or zero extended. Differential Revision: https://reviews.llvm.org/D31847 llvm-svn: 302810	2017-05-11 16:54:23 +00:00
Igor Breger	a44fc83d9f	[GlobalISel][X86] Remove hand-written G_FADD/G_FSUB selection. It is now handled by TableGen. llvm-svn: 302793	2017-05-11 12:15:03 +00:00
Chandler Carruth	97500a9918	[x86] Fix a failure to select with AVX-512 when the type legalizer manages to form a VSELECT with a non-i1 element type condition. Those are technically allowed in SDAG (at least, the generic type legalization logic will form them and I wouldn't want to try to audit everything to preclude forming them) so we need to be able to lower them. This isn't too hard to implement. We mark VSELECT as custom so we get a chance in C++, add a fast path for i1 conditions to get directly handled by the patterns, and a fallback when we need to manually force the condition to be an i1 that uses the vptestm instruction to turn a non-mask into a mask. This, unsurprisingly, generates awful code. But it at least doesn't crash. This was actually impacting open source packages built with LLVM for AVX-512 in the wild, so quickly landing a patch that at least stops the immediate bleeding. I think I've found where to fix the codegen quality issue, but less confident of that change so separating it out from the thing that doesn't change the result of any existing test case but causes mine to not crash. llvm-svn: 302785	2017-05-11 10:52:16 +00:00
Simon Pilgrim	a4a13a0da0	Strip trailing whitespace. NFCI. llvm-svn: 302784	2017-05-11 10:03:05 +00:00
Diana Picus	9cfbc6d94f	[ARM][GlobalISel] Legalize narrow scalar ops by widening This is the same as r292827 for AArch64: we widen 8- and 16-bit ADD, SUB and MUL to 32 bits since we only have TableGen patterns for 32 bits. See the commit message for r292827 for more details. At this point we could just remove some of the tests for regbankselect and instruction-select, since we're not going to see any narrow operations at those levels anymore. Instead I decided to update them with G_ANYEXT/G_TRUNC operations, so we can validate the full sequences generated by the legalizer. llvm-svn: 302782	2017-05-11 09:45:57 +00:00
Serge Guelton	f4dc59ba8e	Remove spurious cast of nullptr. NFC. Conversion rules allow automatic casting of nullptr to any pointer type. llvm-svn: 302780	2017-05-11 08:53:00 +00:00
Serge Guelton	1b421c259f	Remove now useless trailing nullptr in StructType::get llvm-svn: 302779	2017-05-11 08:46:02 +00:00
Diana Picus	657bfd3302	[ARM][GlobalISel] Support for G_ANYEXT G_ANYEXT can be introduced by the legalizer when widening scalars. Add support for it in the register bank info (same mapping as everything else) and in the instruction selector. When selecting it, we treat it as a COPY, just like G_TRUNC. On this occasion we get rid of some assertions in selectCopy so we can reuse it. This shouldn't be a problem at the moment since we're not supporting any complicated cases (e.g. FPR, different register banks). We might want to separate the paths when we do. llvm-svn: 302778	2017-05-11 08:28:31 +00:00
Igor Breger	c7b5977bb1	[GlobalISel][X86] G_ICMP support. Summary: support G_ICMP for scalar types i8/i16/i64. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits, krytarowski Differential Revision: https://reviews.llvm.org/D32995 llvm-svn: 302774	2017-05-11 07:17:40 +00:00
Igor Breger	db75455990	[X86] Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp. NFC Summary: Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp so it can be used by the GlobalISel instruction selector. This is a pre-commit for a patch I'm working on to support G_ICMP. NFC. Reviewers: zvi, guyblank, delena Reviewed By: guyblank, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33038 llvm-svn: 302767	2017-05-11 06:36:37 +00:00
Matt Arsenault	3c5e4237c6	AMDGPU: Make some packed shuffles free VOP3P instructions can encode access to either half of the register. llvm-svn: 302730	2017-05-10 21:29:33 +00:00
Matt Arsenault	acdc7659cc	AMDGPU: Add new subtarget features for gfx9 flat instructions Flat instructions gain an immediate offset, and 2 new sets of segment specific flat instructions are added. llvm-svn: 302729	2017-05-10 21:19:05 +00:00
Quentin Colombet	307e29124c	[AArch64][RegisterBankInfo] Change the default mapping of fp stores. For stores, check if the stored value is defined by a floating point instruction and if yes, we return a default mapping with FPR instead of GPR. llvm-svn: 302679	2017-05-10 15:19:41 +00:00
Amara Emerson	816542ceb3	[AArch64] Enable use of reduction intrinsics. The new experimental reduction intrinsics can now be used, so I'm enabling this for AArch64. We will need this for SVE anyway, so it makes sense to do this for NEON reductions as well. The existing code to match shufflevector patterns are replaced with a direct lowering of the reductions to AArch64-specific nodes. Tests updated with the new, simpler, representation. Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 302678	2017-05-10 15:15:38 +00:00
Ulrich Weigand	93b369ed11	[SystemZ] Add miscellaneous instructions This adds a few missing instructions for the assembler and disassembler. Those should be the last missing general-purpose (Chapter 7) instructions for the z10 ISA. llvm-svn: 302667	2017-05-10 14:20:15 +00:00
Ulrich Weigand	d3604dc72c	[SystemZ] Add missing arithmetic instructions This adds the remaining general arithmetic instructions for assembler / disassembler use. Most of these are not useful for codegen; a few might be, and those are listed in the README.txt for future improvements. llvm-svn: 302665	2017-05-10 14:18:47 +00:00
Jonas Paulsson	11d251c05c	[SystemZ] Implement getRepRegClassFor() This method must return a valid register class, or the list-ilp isel scheduler will crash. For MVT::Untyped nullptr was previously returned, but now ADDR128BitRegClass is returned instead. This is needed just as long as list-ilp (and probably also list-hybrid) is still there. Review: Ulrich Weigand, A Trick https://reviews.llvm.org/D32802 llvm-svn: 302649	2017-05-10 13:03:25 +00:00
Dmitry Preobrazhensky	da61a7f9ef	[AMDGPU][MC] Corrected v_madak/madmk to avoid printing "_e32" in disassembler output See bug 32927: https://bugs.llvm.org//show_bug.cgi?id=32927 Reviewers: vpykhtin, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D32913 llvm-svn: 302648	2017-05-10 13:00:28 +00:00
Ulrich Weigand	c7eb5a95b2	[SystemZ] Add decimal integer instructions This adds the set of decimal integer (BCD) instructions for assembler / disassembler use. llvm-svn: 302646	2017-05-10 12:42:45 +00:00
Ulrich Weigand	33a441adf9	[SystemZ] Add crypto instructions This adds the set of message-security assist instructions for assembler / disassembler use. llvm-svn: 302645	2017-05-10 12:42:00 +00:00
Ulrich Weigand	435cd1a3e4	[SystemZ] Add translate/convert instructions This adds the set of character-set translate and convert instructions for assembler / disassembler use. llvm-svn: 302644	2017-05-10 12:41:12 +00:00
Ulrich Weigand	eb17909536	[SystemZ] Add missing memory/string instructions This adds a number of missing memory and string instructions for assembler / disassembler use. llvm-svn: 302643	2017-05-10 12:40:15 +00:00
Martin Storsjo	605b0466ea	[AArch64] Fix a comment to match the code. NFC. For the ELF case, the default/preferred form is the generic one, not the short one as used for Apple - fix the comment to say so. Currently it is a copy-paste typo. Make the comments on the darwin default a bit more verbose. Use enum names instead of literal 0/1 to further increase readability and reduce fragility. Differential Revision: https://reviews.llvm.org/D32963 llvm-svn: 302634	2017-05-10 10:51:32 +00:00
Amara Emerson	836b0f48c1	Add a late IR expansion pass for the experimental reduction intrinsics. This pass uses a new target hook to decide whether or not to expand a particular intrinsic to the shufflevector sequence. Differential Revision: https://reviews.llvm.org/D32245 llvm-svn: 302631	2017-05-10 09:42:49 +00:00
Igor Breger	fda31e64e0	[GlobalISel][X86] G_ZEXT i1 to i32/i64 support. Summary: Support G_ZEXT i1 to i32/i64 instruction selection. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32965 llvm-svn: 302623	2017-05-10 06:52:58 +00:00
Stanislav Mekhanoshin	7e3794d5c3	[AMDGPU] Fixed typo in GCNRegPressure, NFC VGRP -> VGPR, SGRP -> SGPR llvm-svn: 302586	2017-05-09 20:50:04 +00:00
Matthew Simpson	78fd46b230	[AArch64] Consider widening instructions in cost calculations The AArch64 instruction set has a few "widening" instructions (e.g., uaddl, saddl, uaddw, etc.) that take one or more doubleword operands and produce quadword results. The operands are automatically sign- or zero-extended as appropriate. However, in LLVM IR, these extends are explicit. This patch updates TTI to consider these widening instructions as single operations whose cost is attached to the arithmetic instruction. It marks extends that are part of a widening operation "free" and applies a sub-target specified overhead (zero by default) to the arithmetic instructions. Differential Revision: https://reviews.llvm.org/D32706 llvm-svn: 302582	2017-05-09 20:18:12 +00:00
Serge Guelton	e38003f839	Suppress all uses of LLVM_END_WITH_NULL. NFC. Use variadic templates instead of relying on <cstdarg> + sentinel. This enforces better type checking and makes code more readable. Differential Revision: https://reviews.llvm.org/D32541 llvm-svn: 302571	2017-05-09 19:31:13 +00:00
Jacques Pienaar	0dbcc34f6b	[lanai] Add computeKnownBitsForTargetNode for Lanai. Summary: computeKnownBitsForTargetNode was not defined for Lanai which resulted in additional AND's with 0x1 for the output of SETCC instructions. Reviewers: eliben, majnemer Reviewed By: majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29605 llvm-svn: 302568	2017-05-09 18:35:26 +00:00
Craig Topper	f893d49f0c	[X86] Add more patterns for BZHI isel This patch adds more patterns that a reasonable person might write that can be compiled to BZHI. This adds support for (~0U >> (32 - b)) & a; and a << (32 - b) >> (32 - b); This was inspired by the code in APInt::clearUnusedBits. This can pass an index of 32 to the bzhi instruction which a quick test of Haswell hardware shows will not mask any bits. Though the description text in the Intel manual says the "index is saturated to OperandSize-1". The pseudocode in the same manual indicates no bits will be zeroed for this case. I think this is still missing cases where the subtract portion is an 8-bit operation. Differential Revision: https://reviews.llvm.org/D32616 llvm-svn: 302549	2017-05-09 16:32:11 +00:00
Guy Blank	0c42d8c35b	[AVX512] Only look at lower bit in constant scalar masks For scalar masked instructions only the lower bit of the mask is relevant, so for constant masks we should either do an unmasked operation or no operation, depending on the value of the lower bit. This patch handles cases where the lower bit is '1'. Differential Revision: https://reviews.llvm.org/D32805 llvm-svn: 302546	2017-05-09 16:16:48 +00:00
Tim Shen	04de70d3a7	[Atomic] Remove IsStore/IsLoad in the interface, and pass the instruction instead. NFC. Now both emitLeadingFence and emitTrailingFence take the instruction itself, instead of taking IsLoad/IsStore pairs. Instruction::mayReadFromMemory and Instruction::mayWriteToMemory are used for determining those two booleans. The instruction argument is also useful for later D32763, in emitTrailingFence. For emitLeadingFence, it seems to have cleaner interface with the proposed change. Differential Revision: https://reviews.llvm.org/D32762 llvm-svn: 302539	2017-05-09 15:27:17 +00:00
Aaron Ballman	3234647df6	Amend r302535; ifndef and ifdef are different, as it turns out. llvm-svn: 302537	2017-05-09 15:12:03 +00:00
Aaron Ballman	06297e839a	ARMRegisterBankInfo.h requires LLVM_BUILD_GLOBAL_ISEL to be defined. If it is not defined, then ARMGenRegisterBank.inc is not table generated and the inclusion of this header causes the build to fail. llvm-svn: 302535	2017-05-09 14:59:48 +00:00
Serge Pavlov	d526b13e61	Add extra operand to CALLSEQ_START to keep frame part set up previously Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors. This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties are implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it. The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481. Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527	2017-05-09 13:35:13 +00:00
Simon Dardis	659c43f11a	Revert "[MIPS] Add support to match more patterns for DINS instruction" This reverts commit rL302512. This broke the mips buildbots. llvm-svn: 302526	2017-05-09 13:18:48 +00:00
Simon Pilgrim	ca3a63a849	[X86][SSE42] Lower v2i64/v4i64 ASHR(X, 63) as PCMPGTQ(0, X) Similar to what we do for vXi8 ASHR(X, 7), use SSE42's PCMPGTQ to splat the sign instead of using the PSRAD+PSHUFD. Avoiding bitcasts this improves combines that utilize computeNumSignBits, permits memory folding and reduces pipe pressure. Although it does require a second register, given that this is a (cheap) zero register the impact is minimal. Differential Revision: https://reviews.llvm.org/D32973 llvm-svn: 302525	2017-05-09 13:14:40 +00:00
Nikolai Bozhenov	b7bf386e80	[X86] Clang option -fuse-init-array has no effect when generating for MCU target Reviewers: Eugene.Zelenko, dschuff, craig.topper Reviewed By: craig.topper Subscribers: ahatanak, aaboud, DavidKreitzer, llvm-commits, cfe-commits Differential Revision: https://reviews.llvm.org/D32543 Patch by AndreiGrischenko <andrei.l.grischenko@intel.com> llvm-svn: 302513	2017-05-09 10:14:03 +00:00
Strahinja Petrovic	27ae4c3259	[MIPS] Add support to match more patterns for DINS instruction This patch adds support for recognizing patterns to match DINS instruction. Differential Revision: https://reviews.llvm.org/D31465 llvm-svn: 302512	2017-05-09 10:02:00 +00:00
Diana Picus	95640a1c4d	[ARM GlobalISel] Remove hand-written G_FADD selection Remove the code selecting G_FADD - now that TableGen can handle more opcodes, it's not needed anymore. llvm-svn: 302511	2017-05-09 08:32:42 +00:00
Tim Northover	c48c993b75	ARM: use divmod libcalls on embedded MachO platforms too. The separated libcalls are implemented in terms of __divmodsi4 and __udivmodsi4 anyway, so we should always use them if possible. llvm-svn: 302462	2017-05-08 20:00:14 +00:00
Quentin Colombet	55a72b3b05	[AArch64][RegisterBankInfo] Change the default mapping of fp loads. This fixes PR32550, in a way that does not imply running the greedy mode at O0. The fix consists in checking if a load is used by any floating point instruction and if yes, we return a default mapping with FPR instead of GPR. llvm-svn: 302453	2017-05-08 18:16:31 +00:00
Quentin Colombet	0e41a41b87	[AArch64][RegisterBankInfo] Fix mapping cost for GPR. In r292478, we changed the order of the enum that is referenced by PMI_FirstXXX. This had the side effect of changing the cost of the mapping of all the loads, instead of just the FPRs ones. Reinstate the higher cost for all but GPR loads. Note: This did not have any external visible effects: - For Fast mode, the cost would have been higher, but we don't care because we don't try to use alternative mappings. - For Greedy mode, the higher cost of the GPR loads, would have triggered the use of the supposedly alternative mapping, that would be in fact the same GPR mapping but with a lower cost. llvm-svn: 302452	2017-05-08 18:16:23 +00:00
Craig Topper	8297e52285	[ARM] Use a Changed flag to avoid making a pass's return value dependent on a compare with a Statistic object. Statistics compile to always be 0 in release builds so this compare would always return false. And in debug builds Statistics are global variables and remember their values across pass runs. So this compare returns true anytime the pass runs after the first time it modifies something. This was found after reviewing all usages of comparison operators on a Statistic object. We had some internal code that did a compare with a statistic that caused a mismatch in output between debug and release builds. So we did an audit out of paranoia. llvm-svn: 302450	2017-05-08 18:02:51 +00:00
Simon Pilgrim	df39b03f29	[X86][SSE] Improve combineLogicBlendIntoPBLENDV to use general masks. Currently combineLogicBlendIntoPBLENDV can only match ASHR to detect sign splatting of a bit mask, this patch generalises this to use computeNumSignBits instead. This is a first step in several things we can do to improve PBLENDV support: * Better matching of X86ISD::ANDNP patterns. * Handle floating point cases. * Better vector and bitcast support in computeNumSignBits. * Recognise that PBLENDV only uses the sign bit of the mask, we should be able strip away sign splats (ASHR, PCMPGT isNeg tests etc.). Differential Revision: https://reviews.llvm.org/D32953 llvm-svn: 302424	2017-05-08 14:16:39 +00:00
Simon Pilgrim	f5ca255d18	[ARM][NEON] Add support for ISD::ABS lowering Update NEON int_arm_neon_vabs intrinsic to use the ISD::ABS opcode directly Added constant folding tests. Differential Revision: https://reviews.llvm.org/D32938 llvm-svn: 302417	2017-05-08 10:37:34 +00:00
Martin Storsjo	fd4c158a84	[ARM] Clear the constant pool cache on explicit .ltorg directives Multiple ldr pseudoinstructions with the same constant value will reuse the same constant pool entry. However, if the constant pool is explicitly flushed with a .ltorg directive, we should not try to reference constants in the previous pool any longer, since they may be out of range. This fixes assembling hand-written assembler source which repeatedly loads the same constant value, across a binary size larger than the pc-relative fixup range for ldr instructions (4096 bytes). Such assembler source already uses explicit .ltorg instructions to emit constant pools with regular intervals. However if we try to reuse constants emitted in earlier pools, they end up out of range. This makes the output of the testcase match what binutils gas does (prior to this patch, it would fail to assemble). Differential Revision: https://reviews.llvm.org/D32847 llvm-svn: 302416	2017-05-08 10:26:24 +00:00
Simon Pilgrim	7a28a3ac78	[AARCH64][NEON] Add support for ISD::ABS lowering Update int_aarch64_neon_abs intrinsic to use the ISD::ABS opcode directly Differential Revision: https://reviews.llvm.org/D32940 llvm-svn: 302415	2017-05-08 10:25:18 +00:00
Igor Breger	810c6257f1	[GlobalISel][X86] G_GEP selection support. Summary: [GlobalISel][X86] G_GEP selection support. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32396 llvm-svn: 302412	2017-05-08 09:40:43 +00:00
Igor Breger	605b965ae5	[GlobalISel][X86] G_MUL legalizer/selector support. Summary: G_MUL legalizer/selector/regbank support. Use only Tablegen-erated instruction selection. This patch deals with legal operations only. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: krytarowski, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D32698 llvm-svn: 302410	2017-05-08 09:03:37 +00:00
Dean Michael Berris	9bcaed867a	[XRay] Custom event logging intrinsic This patch introduces an LLVM intrinsic and a target opcode for custom event logging in XRay. Initially, its use case will be to allow users of XRay to log some type of string ("poor man's printf"). The target opcode compiles to a noop sled large enough to enable calling through to a runtime-determined relative function call. At runtime, when X-Ray is enabled, the sled is replaced by compiler-rt with a trampoline to the logic for creating the custom log entries. Future patches will implement the compiler-rt parts and clang-side support for emitting the IR corresponding to this intrinsic. Reviewers: timshen, dberris Subscribers: igorb, pelikan, rSerge, timshen, echristo, dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D27503 llvm-svn: 302405	2017-05-08 05:45:21 +00:00
Simon Pilgrim	2d1c6d6e8d	[X86][AVX1] Improve 256-bit vector costs for integer unary intrinsics. Account for subvector extraction/insertion, helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets. llvm-svn: 302378	2017-05-07 20:58:55 +00:00
Simon Pilgrim	33f7397cc0	[X86][AVX512] Relax assertion and just exit combine for unsupported types (PR32907) llvm-svn: 302361	2017-05-06 20:53:52 +00:00
Simon Pilgrim	fea153f341	[X86][AVX512] Move v2i64/v4i64 VPABS lowering to tablegen Extend NoVLX targets to use the 512-bit versions llvm-svn: 302359	2017-05-06 19:11:59 +00:00
Simon Pilgrim	f15a2f4d94	[X86] Reduce code for setting operations actions by merging into loops across multiple types/ops. NFCI. llvm-svn: 302357	2017-05-06 18:17:56 +00:00
Simon Pilgrim	98f1d02677	[NVPTX] Add support for ISD::ABS lowering Use the ISD::ABS opcode directly Differential Revision: https://reviews.llvm.org/D32944 llvm-svn: 302356	2017-05-06 17:42:09 +00:00
Simon Pilgrim	781cb10104	[X86][SSE] Break register dependencies on v16i8/v8i16 BUILD_VECTOR on SSE41 rL294581 broke unnecessary register dependencies on partial v16i8/v8i16 BUILD_VECTORs, but on SSE41 we (currently) use insertion for full BUILD_VECTORs as well. By allowing full insertion to occur on SSE41 targets we can break register dependencies here as well. llvm-svn: 302355	2017-05-06 17:30:39 +00:00
Quentin Colombet	245994d968	[RegisterBankInfo] Uniquely allocate instruction mapping. This is a step toward having statically allocated instruction mapping. We are going to tablegen them eventually, so let us reflect that in the API. NFC. llvm-svn: 302316	2017-05-05 22:48:22 +00:00
Krzysztof Parzyszek	ee93e009c8	[Hexagon] Disable predicated calls by default llvm-svn: 302307	2017-05-05 22:13:57 +00:00
Krzysztof Parzyszek	e260332838	[Hexagon] Remove C6 and C7 as separate registers These are M0 and M1. Removing duplicated registers reduces the number of explicit register aliasing. llvm-svn: 302306	2017-05-05 22:12:12 +00:00
Krzysztof Parzyszek	d0c71ef8ab	[RDF] Remove covered parts of reached uses for phi and use in same block llvm-svn: 302305	2017-05-05 22:10:32 +00:00
Matthias Braun	4682ac6c83	ARM: Compute MaxCallFrame size early This exposes a method in MachineFrameInfo that calculates MaxCallFrameSize and calls it after instruction selection in the ARM target. This avoids ARMBaseRegisterInfo::canRealignStack()/ARMFrameLowering::hasReservedCallFrame() giving different answers in early/late phases of codegen. The testcase shows a particular nasty example result of that where we would fail to properly align an alloca. Differential Revision: https://reviews.llvm.org/D32622 llvm-svn: 302303	2017-05-05 22:04:05 +00:00
Kannan Narayanan	5e73b04b84	[AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header. Differential Revision: https://reviews.llvm.org/D32831 llvm-svn: 302290	2017-05-05 21:10:17 +00:00
Simon Pilgrim	430a335b7b	[X86] Use SDValue::getConstantOperandVal helper. NFCI. llvm-svn: 302286	2017-05-05 20:53:52 +00:00
Konstantin Zhuravlyov	6ccb076aeb	AMDGPU/AMDHSA: Set COMPUTE_PGM_RSRC2:LDS_SIZE to 0 This field is populated by the CP Differential Revision: https://reviews.llvm.org/D32619 llvm-svn: 302277	2017-05-05 20:13:55 +00:00
Alexei Starovoitov	7bab73b1f8	[bpf] fix a bug which causes incorrect big endian reloc fixup o Add bpfeb support in BPF dwarfdump unit test case Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@fb.com> llvm-svn: 302265	2017-05-05 18:05:00 +00:00
Craig Topper	f0aeee01c3	[KnownBits] Add wrapper methods for setting and clearing all bits in the underlying APInts in KnownBits. This adds routines for resetting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown. Differential Revision: https://reviews.llvm.org/D32637 llvm-svn: 302262	2017-05-05 17:36:09 +00:00
Jun Bum Lim	94d42533eb	[AArch64] Remove AArch64AddressTypePromotion pass Summary: Remove the AArch64AddressTypePromotion pass as we migrated all transformations done in this pass into CGP in r299379. Reviewers: qcolombet, jmolloy, javed.absar, mcrosier Reviewed By: qcolombet Subscribers: aemerson, rengolin, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D31623 llvm-svn: 302245	2017-05-05 16:05:41 +00:00
Simon Pilgrim	ac3c4b6da4	[X86][AVX512] Improve support and testing for CTLZ of 512-bit vectors without CDI llvm-svn: 302233	2017-05-05 13:31:52 +00:00
Simon Pilgrim	e9c5d7b70b	[X86] Remove duplicate operation actions. NFCI. llvm-svn: 302230	2017-05-05 12:34:55 +00:00
Simon Pilgrim	c89aa0bee5	[X86][AVX512CDI] Move v2i64/v4i64 and v4i32/v8i32 VPLZCNT lowering to tablegen Extend NoVLX targets to use the 512-bit versions llvm-svn: 302229	2017-05-05 12:20:34 +00:00
Simon Pilgrim	73b88d5183	Remove unused variable llvm-svn: 302226	2017-05-05 11:55:38 +00:00
John Brawn	1b74f8c51f	[ARM] Add support for ORR and ORN instruction substitutions Recently support was added for substituting one instruction for another by negating or inverting the immediate, but ORR and ORN were missed so this patch adds them. This one is slightly different to the others in that ORN only exists in thumb, so we only do the substitution in thumb. Differential Revision: https://reviews.llvm.org/D32534 llvm-svn: 302224	2017-05-05 11:31:25 +00:00
Simon Pilgrim	1d47a15d89	[X86][AVX] Add LowerIntUnary helpers to split unary vector ops in half. NFCI. Same as LowerIntArith helpers but for unary ops instead of binary. llvm-svn: 302222	2017-05-05 10:59:24 +00:00
Andrew Ng	807ca72e66	[X86] Remove unused code from X86 optimize LEAs. NFC. This patch removes unused code which is no longer required because of changes to the DIExpression::prepend function. llvm-svn: 302219	2017-05-05 09:21:35 +00:00
Daniel Jasper	07a1771959	Initialize new member X86Operand::FrontendSize in all codepaths. This fixes MSAN-builds after r302179. llvm-svn: 302214	2017-05-05 07:31:40 +00:00
Marek Olsak	584d2c05d4	AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32645 llvm-svn: 302200	2017-05-04 22:25:20 +00:00
Simon Pilgrim	11a1637a10	Strip trailing whitespace. NFCI. llvm-svn: 302192	2017-05-04 20:55:16 +00:00
Krzysztof Parzyszek	038a0546db	[PPC] When restoring R30 (PIC base pointer), mark it as <def> This happened on the PPC32/SVR4 path and was discovered when building FreeBSD on PPC32. It was a typo-class error in the frame lowering code. This fixes PR26519. llvm-svn: 302183	2017-05-04 19:14:54 +00:00
Reid Kleckner	6d2ea6ec80	[ms-inline-asm] Use the frontend size only for ambiguous instructions This avoids problems on code like this: char buf[16]; __asm { movups xmm0, [buf] mov [buf], eax } The frontend size in this case (1) is wrong, and the register makes the instruction matching unambiguous. There are also enough bytes available that we shouldn't complain to the user that they are potentially using an incorrectly sized instruction to access the variable. Supersedes D32636 and D26586 and fixes PR28266 llvm-svn: 302179	2017-05-04 18:19:52 +00:00
Jonas Paulsson	4fd156261e	[SystemZ] Make copyPhysReg() add impl-use operands of super reg. When a 128 bit COPY is lowered into two instructions, an impl-use operand of the super-reg should be added to each new instruction in case one of the sub-regs is undefined. Review: Ulrich Weigand llvm-svn: 302146	2017-05-04 13:33:30 +00:00
Simon Dardis	080d478bd2	[mips][XRay] Use the base version of emitXRayTable Follow up rL290858 by removing the MIPS specific version of XRayTable emission in favour of the basic version. This resolves a buildbot failure where the ELF sections were malformed causing the linker to reject the object files with xray related sections. Reviewers: dberris, slthakur Differential Revision: https://reviews.llvm.org/D32808 llvm-svn: 302138	2017-05-04 11:03:50 +00:00
Igor Breger	70583606b1	[X86][AVX-512] Allow EVEX encoded instruction selection when available for mul v8i32. Differential Revision: https://reviews.llvm.org/D32679 llvm-svn: 302127	2017-05-04 07:34:58 +00:00
Sam Parker	df337704f0	[ARM] ACLE Chapter 9 intrinsics Added the integer data processing intrinsics from ACLE v2.1 Chapter 9 but I have missed out the saturation_occurred intrinsics for now. For the instructions that read and write the GE bits, a chain is included and the only instruction that reads these flags (sel) is only selectable via the implemented intrinsic. Differential Revision: https://reviews.llvm.org/D32281 llvm-svn: 302126	2017-05-04 07:31:28 +00:00
Oren Ben Simhon	51de0330eb	[X86] Disabling PLT in Regcall CC Functions According to psABI, PLT stub clobbers XMM8-XMM15. In Regcall calling convention those registers are used for passing parameters. Thus we need to prevent lazy binding in Regcall. Differential Revision: https://reviews.llvm.org/D32430 llvm-svn: 302124	2017-05-04 07:22:49 +00:00
Igor Breger	c6eccdd5c0	[AVX] Fix vpcmpeqq predicate. Summary: Fix vpcmpeqq predicate. AVX512 version of vpcmpeqq is not equivalent to AVX one. Split from https://reviews.llvm.org/D32679 Reviewers: craig.topper, zvi, aymanmus Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32786 llvm-svn: 302119	2017-05-04 06:24:52 +00:00
Reid Kleckner	5c0bdef5aa	Mark functions as not having CFI once we finalize an x86 stack frame We'll set it back to true in emitPrologue if it gets called. It doesn't get called for naked functions. Fixes PR32912 llvm-svn: 302092	2017-05-03 23:13:42 +00:00
Craig Topper	d938fd1397	[KnownBits] Add zext, sext, and trunc methods to KnownBits This patch adds zext, sext, and trunc methods to KnownBits and uses them where possible. Differential Revision: https://reviews.llvm.org/D32784 llvm-svn: 302088	2017-05-03 22:07:25 +00:00
Ahmed Bougacha	a1991bdde2	[AArch64] armv8-A doesn't have CRC. That's only a required extension as of v8.1a. Remove it from the "generic" CPU as well: it should only support the base ISA (and binutils agrees). Also unify the MC tests into crc.s and arm64-crc32.s llvm-svn: 302077	2017-05-03 20:33:52 +00:00
Krzysztof Parzyszek	2af5037d34	[Hexagon] Use automatically-generated scheduling information for HVX Patch by Jyotsna Verma. llvm-svn: 302073	2017-05-03 20:10:36 +00:00
Reid Kleckner	a0b45f4bfc	[IR] Abstract away ArgNo+1 attribute indexing as much as possible Summary: Do three things to help with that: - Add AttributeList::FirstArgIndex, which is an enumerator currently set to 1. It allows us to change the indexing scheme with fewer changes. - Add addParamAttr/removeParamAttr. This just shortens addAttribute call sites that would otherwise need to spell out FirstArgIndex. - Remove some attribute-specific getters and setters from Function that take attribute list indices. Most of these were only used from BuildLibCalls, and doesNotAlias was only used to test or set if the return value is malloc-like. I'm happy to split the patch, but I think they are probably easier to review when taken together. This patch should be NFC, but it sets the stage to change the indexing scheme to this, which is more convenient when indexing into an array: 0: func attrs 1: retattrs 2...: arg attrs Reviewers: chandlerc, pete, javed.absar Subscribers: david2050, llvm-commits Differential Revision: https://reviews.llvm.org/D32811 llvm-svn: 302060	2017-05-03 18:17:31 +00:00
Simon Pilgrim	03ccf91d85	[X86][LWP] Add stack folding mappings and tests for LWPINS/LWPVAL instructions llvm-svn: 302049	2017-05-03 16:46:30 +00:00
Simon Pilgrim	eada39d050	Silence a 'enum and non-enum used in conditional' warning. llvm-svn: 302048	2017-05-03 16:43:57 +00:00
Simon Pilgrim	99b925bdf3	[X86][LWP] Add llvm support for LWP instructions (reapplied). This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Reapplied - this time without changing line endings of existing files. Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302041	2017-05-03 15:51:39 +00:00
Simon Pilgrim	a271c54324	Revert rL302028 due to accidental line ending changes. llvm-svn: 302038	2017-05-03 15:42:29 +00:00
Krzysztof Parzyszek	d10df49c90	[Hexagon] Handle S2_storerf_io in HexagonInstrInfo llvm-svn: 302036	2017-05-03 15:36:51 +00:00
Krzysztof Parzyszek	700a5f99c7	[Hexagon] Misc fixes in HexagonInstrInfo, NFC Formatting changes + remove unused function. llvm-svn: 302035	2017-05-03 15:34:52 +00:00
Krzysztof Parzyszek	4763c2d999	[Hexagon] Adjust latency between allocframe and the first store on stack Allocframe and the following stores on the stack have a latency of 2 cycles when not in the same packet. This happens because R29 is needed early by the store instruction. Since one of such stores can be packetized along with allocframe and use old value of R29, we can assign it 0 cycle latency while leaving latency of other stores to the default value of 2 cycles. Patch by Jyotsna Verma. llvm-svn: 302034	2017-05-03 15:33:09 +00:00
Krzysztof Parzyszek	19635bdcbb	[Hexagon] Handle J2_jumptpt and J2_jumpfpt in HexagonInstrInfo llvm-svn: 302033	2017-05-03 15:30:46 +00:00
Krzysztof Parzyszek	0a8043e1b3	[Hexagon] Implement undoing .cur instructions in packetizer The packetizer needs to convert .cur instruction to its regular form if the use is not in the same packet as the .cur. The code in the packetizer handles one type of .cur, which is the vector load case. This patch updates the packetizer so that it can undo all the .cur instructions. In the test case, the .cur is the 128B version, but there are also the post-increment versions. Patch by Brendon Cahoon. llvm-svn: 302032	2017-05-03 15:28:56 +00:00
Krzysztof Parzyszek	4be9d92b69	[Hexagon] Add memory operands to a rewritten load llvm-svn: 302030	2017-05-03 15:26:13 +00:00
Krzysztof Parzyszek	781324fc7e	[Hexagon] Reset spill alignment when variable-sized objects are present llvm-svn: 302029	2017-05-03 15:23:53 +00:00
Simon Pilgrim	b2e0464fde	[X86][LWP] Add llvm support for LWP instructions. This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302028	2017-05-03 15:18:34 +00:00
Guy Blank	d0baa524d0	[X86][AVX512] remove unnecessary case. NFC VFPCLASS is for vector types and not scalar, so it cannot get here. Differential Revision: https://reviews.llvm.org/D32694 llvm-svn: 302023	2017-05-03 13:34:05 +00:00
Jonas Paulsson	f40eac5088	[SystemZ] Properly check number of operands in getCmpOpsType() It is needed to check that the number of operands are 2 when finding the case of a logic combination, e.g. 'and' of two compares. Review: Ulrich Weigand llvm-svn: 302022	2017-05-03 13:33:45 +00:00
Oren Ben Simhon	dbd4bba1ec	[X86] Support of no_caller_saved_registers attribute This patch implements the LLVM part for no_caller_saved_registers attribute as appears here: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=5ed3cc7b66af4758f7849ed6f65f4365be8223be. In order to implement the attribute, we use the dynamic CSR mechanism to remove returned/passed arguments from the function regmask/CSR list. Differential Revision: https://reviews.llvm.org/D31876 llvm-svn: 302020	2017-05-03 13:07:19 +00:00
Dylan McKay	4aedb8a6b7	[AVR] Reserve the Y register in all functions llvm-svn: 302017	2017-05-03 11:56:01 +00:00
Dylan McKay	c30d85bd8a	Revert "[AVR] Enable the frame pointer for all functions" This reverts commit 358ad02d999e88853d2cfc954bd2f668308a51f7. llvm-svn: 302014	2017-05-03 11:36:42 +00:00
Simon Pilgrim	05cfa83843	[X86] Refactored LowerINTRINSIC_W_CHAIN to use a switch statement. NFCI. Pre-commit as requested in D32769. llvm-svn: 302010	2017-05-03 10:40:18 +00:00
Tim Shen	e59d06fe78	[PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a single instruction Summary: This is the corresponding llvm change to D28037 to ensure no performance regression. Reviewers: bogner, kbarton, hfinkel, iteratee, echristo Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D28329 llvm-svn: 301990	2017-05-03 00:07:02 +00:00
Tim Northover	4a01ffbd6a	ARM: avoid handing a deleted node back to TableGen during ISel. When we replaced the multiplicand the destination node might already exist. When that happens the original gets CSEd and deleted. However, it's actually used as the offset so nonsense is produced. Should fix PR32726. llvm-svn: 301983	2017-05-02 22:45:19 +00:00
Reid Kleckner	ee4930b688	Re-land r301697 "[IR] Make add/remove Attributes use AttrBuilder instead of AttributeList" This time, I fixed, built, and tested clang. This reverts r301712. llvm-svn: 301981	2017-05-02 22:07:37 +00:00
Joel Jones	6513405735	[AArch64] ILP32 Backend Relocation Support Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and TLSDESC_ADD_LO12 relocations Rearrange ordering in AArch64.def to follow relocation encoding Fix name: R_AARCH64_P32_LD64_GOT_LO12_NC => R_AARCH64_P32_LD32_GOT_LO12_NC Add support for several "TLS", "TLSGD", and "TLSLD" relocations for ILP32 Fix return values from isNonILP32reloc Add implementations for R_AARCH64_ADR_PREL_PG_HI21_NC, R_AARCH64_P32_LD32_GOT_LO12_NC, R_AARCH64_P32_TLSIE_LD32_GOTTPREL_LO12_NC, R_AARCH64_P32_TLSDESC_LD32_LO12, R_AARCH64_LD64_GOT_LO12_NC, TLSLD_LDST128_DTPREL_LO12, TLSLD_LDST128_DTPREL_LO12_NC, TLSLE_LDST128_TPREL_LO12, TLSLE_LDST128_TPREL_LO12_NC Modify error messages to give name of equivalent relocation in the ABI not being used, along with better checking for non-existent requested relocations. Added assembler support for "pg_hi21_nc" Relocation definitions added without implementations: R_AARCH64_P32_TLSDESC_ADR_PREL21, R_AARCH64_P32_TLSGD_ADR_PREL21, R_AARCH64_P32_TLSGD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_ADR_PREL21, R_AARCH64_P32_TLSLD_ADR_PAGE21, R_AARCH64_P32_TLSLD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_LD_PREL19, R_AARCH64_P32_TLSDESC_LD_PREL19, R_AARCH64_P32_TLSGD_ADR_PAGE21, R_AARCH64_P32_TLS_DTPREL, R_AARCH64_P32_TLS_DTPMOD, R_AARCH64_P32_TLS_TPREL, R_AARCH64_P32_TLSDESC Fix encoding: R_AARCH64_P32_TLSDESC_ADR_PAGE21 Reviewers: Peter Smith Patch by: Joel Jones (jjones@cavium.com) Differential Revision: https://reviews.llvm.org/D32072 llvm-svn: 301980	2017-05-02 22:01:48 +00:00
Tim Northover	f9d8eee3db	ARM: add arm1176j-f processor I doubt anyone actually uses it, and I'm not even entirely convinced it exists myself; but it is our default for "clang -arch armv6". Functionally, if it does exist it's identical to the arm1176jz-f from LLVM's point of view (the difference is apparently in the "Security Extensions"). llvm-svn: 301962	2017-05-02 19:06:13 +00:00
Matt Arsenault	5c80618fb7	AMDGPU: Don't promote alloca to LDS for leaf functions LDS use in leaf functions not currently handled. llvm-svn: 301958	2017-05-02 18:33:18 +00:00
Krzysztof Parzyszek	fca6fae463	[Hexagon] Fix uninitialized value caught with valgrind Patch by Colin LeMahieu. llvm-svn: 301957	2017-05-02 18:29:49 +00:00
Krzysztof Parzyszek	57a8bb4343	[Hexagon] Change iconst to emit 27bit relocation Patch by Colin LeMahieu. llvm-svn: 301956	2017-05-02 18:19:11 +00:00
Krzysztof Parzyszek	a750383d0f	[Hexagon] Add extenders for GD_PLT_B22_PCREL and LD_PLT_B22_PCREL Patch by Sid Manning. llvm-svn: 301955	2017-05-02 18:15:33 +00:00
Krzysztof Parzyszek	9aaf923376	[Hexagon] Don't ignore multi-cycle latency information The compiler was generating code that ends up ignoring a multiple latency dependence between two instructions by scheduling the instructions in back-to-back packets. The packetizer needs to end a packet if the latency of the current instruction and the source in the previous packet is greater than 1 cycle. This case occurs when there is still room in the current packet, but scheduling the instruction causes a stall. Instead, the packetizer should start a new packet. Also, if the current packet already contains a stall, then it is okay to add another instruction to the packet that also causes a stall. This occurs when there are no instructions that can be scheduled in between the producer and consumer instructions. This patch changes the latency for loads to 2 cycles from 3 cycles. This change reflects that a load only needs to be separated by one extra packet to eliminate the stall. Patch by Ikhlas Ajbar. llvm-svn: 301954	2017-05-02 18:12:19 +00:00
Krzysztof Parzyszek	32e20b80c6	[Hexagon] Formatting changes, NFC llvm-svn: 301953	2017-05-02 18:09:07 +00:00
Krzysztof Parzyszek	188ab98f67	[Hexagon] Remove unused validSubtarget TSFlags Patch by Colin LeMahieu. llvm-svn: 301952	2017-05-02 18:05:36 +00:00
Krzysztof Parzyszek	b0af1ef741	[Hexagon] Make sure duplexed dealloc_returns are checked for double jumps Patch by Colin LeMahieu. llvm-svn: 301951	2017-05-02 18:03:08 +00:00
Krzysztof Parzyszek	49f7e0a98b	[Hexagon] Move checking AXOK to checker Patch by Colin LeMahieu. llvm-svn: 301949	2017-05-02 18:00:37 +00:00
Krzysztof Parzyszek	57f5046b4a	[Hexagon] Remove unneeded code from HexagonShuffler Patch by Colin LeMahieu. llvm-svn: 301947	2017-05-02 17:58:52 +00:00
Krzysztof Parzyszek	c15f8d2a08	[Hexagon] Extract function that checks endloops with other branches Change location number to point to conflicting branch instruction. Patch by Colin LeMahieu. llvm-svn: 301946	2017-05-02 17:56:11 +00:00
Krzysztof Parzyszek	1cc6bfbc83	[Hexagon] Add new packet iterator which will iterate through duplexes Patch by Colin LeMahieu. llvm-svn: 301945	2017-05-02 17:53:51 +00:00
Zachary Turner	a0aae2757d	Revert "Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and" This reverts commit c08155afc5d3230792da2ad30a046a8617735a73. This is causing undefined symbol errors with some of the constants. llvm-svn: 301944	2017-05-02 17:51:27 +00:00
Krzysztof Parzyszek	107f82d128	[Hexagon] Check for .cur def without use without using a map data structure Patch by Colin LeMahieu. llvm-svn: 301943	2017-05-02 17:51:14 +00:00
Joel Jones	705103e523	Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and TLSDESC_ADD_LO12 relocations Rearrange ordering in AArch64.def to follow relocation encoding Fix name: R_AARCH64_P32_LD64_GOT_LO12_NC => R_AARCH64_P32_LD32_GOT_LO12_NC Add support for several "TLS", "TLSGD", and "TLSLD" relocations for ILP32 Fix return values from isNonILP32reloc Add implementations for R_AARCH64_ADR_PREL_PG_HI21_NC, R_AARCH64_P32_LD32_GOT_LO12_NC, R_AARCH64_P32_TLSIE_LD32_GOTTPREL_LO12_NC, R_AARCH64_P32_TLSDESC_LD32_LO12, R_AARCH64_LD64_GOT_LO12_NC, TLSLD_LDST128_DTPREL_LO12, TLSLD_LDST128_DTPREL_LO12_NC, TLSLE_LDST128_TPREL_LO12, TLSLE_LDST128_TPREL_LO12_NC Modify error messages to give name of equivalent relocation in the ABI not being used, along with better checking for non-existent requested relocations. Added assembler support for "pg_hi21_nc" Relocation definitions added without implementations: R_AARCH64_P32_TLSDESC_ADR_PREL21, R_AARCH64_P32_TLSGD_ADR_PREL21, R_AARCH64_P32_TLSGD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_ADR_PREL21, R_AARCH64_P32_TLSLD_ADR_PAGE21, R_AARCH64_P32_TLSLD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_LD_PREL19, R_AARCH64_P32_TLSDESC_LD_PREL19, R_AARCH64_P32_TLSGD_ADR_PAGE21, R_AARCH64_P32_TLS_DTPREL, R_AARCH64_P32_TLS_DTPMOD, R_AARCH64_P32_TLS_TPREL, R_AARCH64_P32_TLSDESC Fix encoding: R_AARCH64_P32_TLSDESC_ADR_PAGE21 Reviewers: Peter Smith Patch by: Joel Jones (jjones@cavium.com) Differential Revision: https://reviews.llvm.org/D32072 llvm-svn: 301939	2017-05-02 17:14:31 +00:00
Matt Arsenault	b03dd8daae	AMDGPU: Refactor AsmPrinter Avoid analyzing functions multiple times. This allows asserting that each function is only analyzed once. llvm-svn: 301938	2017-05-02 17:14:00 +00:00
Matt Arsenault	7b82b4bddb	AMDGPU: Make intrinsics speculatable llvm-svn: 301937	2017-05-02 16:57:44 +00:00
Marek Olsak	a302a736ec	AMDGPU: Add AMDGPU_HS calling convention Reviewers: arsenm, nhaehnle Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32644 llvm-svn: 301930	2017-05-02 15:41:10 +00:00
Simon Pilgrim	24d361f7bf	[X86] Tidyup subvector insert/extract helpers. NFCI. Use getConstantOperandVal where possible. llvm-svn: 301912	2017-05-02 11:08:15 +00:00
Simon Pilgrim	7aca5218b0	Fix typo in comment. NFCI. llvm-svn: 301911	2017-05-02 10:43:33 +00:00
Diana Picus	8abcbbb24b	[ARM] GlobalISel: Use TableGen instruction selector Emit and use the TableGen instruction selector for ARM. At the moment, this allows us to remove the hand-written code for selecting G_SDIV and G_UDIV. Future commits will focus on increasing the code coverage for it and removing more dead code from the current instruction selector. llvm-svn: 301905	2017-05-02 09:40:49 +00:00
Dylan McKay	28355efdad	[AVR] Save/restore the frame pointer for all functions A recent commit I made made it so that we only did this for signal or interrupt handlers. This broke normal functions. llvm-svn: 301893	2017-05-02 01:57:48 +00:00
Nemanja Ivanovic	b89c27f515	[PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE Fixes PR30730. This is a re-commit of a pulled commit. The commit was pulled because some software projects contained uses of Altivec vectors that violated alignment requirements. Known issues have now been fixed. Committing on behalf of Lei Huang. Differential Revision: https://reviews.llvm.org/D26861 llvm-svn: 301892	2017-05-02 01:47:34 +00:00
Dylan McKay	634339ab40	[AVR] Fix a bug where the frame pointer is clobbered Because it was a callee-saved register, we automatically generated code to spill and unspill its original value so that it is restored after the function returns. The problem is that this code was being generated before the epilogue. The epilogue itself uses the Y register, which could be prematurely restored by the CSR restoration process. This removes R29R28 from the CSR list and changes the prologue/epilogue code to handle it explicitly. llvm-svn: 301887	2017-05-02 00:11:34 +00:00
Dylan McKay	3bb6eb238e	[AVR] Enable the frame pointer for all functions This is a temporary measure while we figure out a way to get the frame pointer working correctly. llvm-svn: 301881	2017-05-01 23:16:59 +00:00
Simon Pilgrim	8d196c88a6	[X86] Reduce code for setting operations actions by merging into loops across multiple types/ops. NFCI. llvm-svn: 301879	2017-05-01 23:09:01 +00:00
Quentin Colombet	cdf8c81127	[AArch64] Move GISel accessor initialization from TargetMachine to Subtarget. NFC llvm-svn: 301841	2017-05-01 21:53:19 +00:00
Simon Pilgrim	ab1a82764f	[X86][AVX] Rename LowerVectorBroadcast to lowerBuildVectorAsBroadcast. NFCI. Since the shuffle refactor, this is only used during BUILD_VECTOR lowering. llvm-svn: 301834	2017-05-01 20:56:35 +00:00
Krzysztof Parzyszek	4a1c3f0aaa	[Hexagon] Replace CVI_VM_CUR_LD type with CVI_VM_LD A .cur instruction can be identified by checking isCVINew() && mayLoad(). Patch by Colin LeMahieu. llvm-svn: 301829	2017-05-01 20:16:35 +00:00
Krzysztof Parzyszek	55db483a46	[Hexagon] Improving error reporting for writing to read only registers Patch by Colin LeMahieu. llvm-svn: 301828	2017-05-01 20:10:41 +00:00
Krzysztof Parzyszek	e96d27a997	[Hexagon] Give better error messages for solo instruction errors Patch by Colin LeMahieu. llvm-svn: 301827	2017-05-01 20:06:01 +00:00
Krzysztof Parzyszek	e12d1e70cb	[Hexagon] Improve shuffle error reporting Patch by Colin LeMahieu. llvm-svn: 301823	2017-05-01 19:41:43 +00:00
Tim Northover	9bb6931c25	X86: initialize a few subtarget variables. Otherwise an indeterminate value gets read, causing a bunch of UBSan failures. llvm-svn: 301819	2017-05-01 17:50:15 +00:00
Sanjoy Das	e6bca0eecb	Rename WeakVH to WeakTrackingVH; NFC This relands r301424. llvm-svn: 301812	2017-05-01 17:07:49 +00:00
Derek Schuff	2fa3604831	[WebAssembly] Fix use of SDNodeFlags after API change in r301803 llvm-svn: 301811	2017-05-01 16:49:39 +00:00
Gabor Horvath	43b72d538f	Remove unnecessary conditions as suggested by clang-tidy. NFC Patch by: Gergely Angeli! Differential Revision: https://reviews.llvm.org/D31936 llvm-svn: 301807	2017-05-01 16:18:42 +00:00
Amara Emerson	d28f0cd448	Generalize the specialized flag-carrying SDNodes by moving flags into SDNode. This removes BinaryWithFlagsSDNode, and flags are now all passed by value. Differential Revision: https://reviews.llvm.org/D32527 llvm-svn: 301803	2017-05-01 15:17:51 +00:00
Dylan McKay	59e7fe3da8	[AVR] Implement non-constant bit rotations This lets us do bit rotations of variable amount. llvm-svn: 301794	2017-05-01 09:48:55 +00:00
Igor Breger	2452ef0ea2	[GlobalISel][X86] Prioritize Tablegen-erated instruction selection. NFC Summary: Prioritizes Tablegen-erated instruction selection over C++ instruction selection. Remove G_ADD/G_SUB C++ selection - implemented by Tablegen. Reviewers: dsanders, zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32677 llvm-svn: 301792	2017-05-01 07:06:08 +00:00
Igor Breger	c08a783521	[GlobalISel][X86] G_SEXT/G_ZEXT support. Reviewers: zvi, guyblank Reviewed By: zvi Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32591 llvm-svn: 301790	2017-05-01 06:30:16 +00:00
Igor Breger	a9edb88d46	[GlobalISel][X86] G_LOAD/G_STORE pointer selection support. Summary: [GlobalISel][X86] G_LOAD/G_STORE pointer selection support. Reviewers: zvi, guyblank Reviewed By: zvi, guyblank Subscribers: dberris, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D32217 llvm-svn: 301788	2017-05-01 06:08:32 +00:00
Dylan McKay	2e8718bcbb	[AVR] Fix a bug so that we now emit R_AVR_16 fixups with the correct offset Before this, the LDS/STS instructions would have their opcodes overwritten while linking. llvm-svn: 301782	2017-04-30 23:33:52 +00:00
Amaury Sechet	8ac81f3924	Do not legalize large add with addc/adde, introduce addcarry and do it with uaddo/addcarry Summary: As per discussion on how to get better codegen on large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allows for more flexibility. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29872 llvm-svn: 301775	2017-04-30 19:24:09 +00:00
Craig Topper	778f57b4f1	[APInt] Replace calls to setBits with more specific calls to setBitsFrom and setLowBits where possible. llvm-svn: 301768	2017-04-30 07:44:58 +00:00
Craig Topper	d503644a4a	[X86] Clear KnownBits instead of reconstructing it. NFC llvm-svn: 301767	2017-04-30 07:44:55 +00:00
Simon Atanasyan	3979f43813	[mips] Emit R_MICROMIPS_TLS_GOTTPREL relocation for %gottprel in case of microMIPS In case of microMIPS mode %gottprel operator should emit microMIPS relocation R_MICROMIPS_TLS_GOTTPREL, not R_MIPS_TLS_GOTTPREL. Differential Revision: http://reviews.llvm.org/D32617 llvm-svn: 301763	2017-04-30 04:27:23 +00:00
Daniel Sanders	e9fdba39e0	[globalisel][tablegen] Compute available feature bits correctly. Summary: Predicate<> now has a field to indicate how often it must be recomputed. Currently, there are two frequencies, per-module (RecomputePerFunction==0) and per-function (RecomputePerFunction==1). Per-function predicates are currently recomputed more frequently than necessary since the only predicate in this category is cheap to test. Per-module predicates are now computed in getSubtargetImpl() while per-function predicates are computed in selectImpl(). Tablegen now manages the PredicateBitset internally. It should only be necessary to add the required includes. Also fixed a problem revealed by the test case where constrainSelectedInstRegOperands() would attempt to tie operands that BuildMI had already tied. Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: rovka Subscribers: kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32491 llvm-svn: 301750	2017-04-29 17:30:09 +00:00
Simon Dardis	9d580e8528	[mips][FastISel] Fix a nullptr dereference. r301392 introduced a potential nullptr dereference causing compilation failures. llvm-svn: 301746	2017-04-29 16:31:40 +00:00
Matt Arsenault	2a80369ae4	AMDGPU: Fix copies from physical registers in SIFixSGPRCopies This would assert when there were multiple defs of a physical register. We just need to move all of the users of it. llvm-svn: 301730	2017-04-29 01:26:34 +00:00
Hans Wennborg	0f88d863b4	Revert r301697 "[IR] Make add/remove Attributes use AttrBuilder instead of AttributeList" This broke the Clang build. (Clang-side patch missing?) Original commit message: > [IR] Make add/remove Attributes use AttrBuilder instead of > AttributeList > > This change cleans up call sites and avoids creating temporary > AttributeList objects. > > NFC llvm-svn: 301712	2017-04-28 23:01:32 +00:00
Krzysztof Parzyszek	072ddb383c	[RDF] Correctly calculate lane masks for defs llvm-svn: 301700	2017-04-28 21:57:53 +00:00
Krzysztof Parzyszek	0b3acbb1dd	[Hexagon] Do not move a block if it is on a fall-through path llvm-svn: 301698	2017-04-28 21:54:11 +00:00
Reid Kleckner	608c8b63b3	[IR] Make add/remove Attributes use AttrBuilder instead of AttributeList This change cleans up call sites and avoids creating temporary AttributeList objects. NFC llvm-svn: 301697	2017-04-28 21:48:28 +00:00
Reid Kleckner	859f8b544a	Make getParamAlignment use argument numbers The method is called "get Param Alignment", and is only used for return values exactly once, so it should take argument indices, not attribute indices. Avoids confusing code like: IsSwiftError = CS->paramHasAttr(ArgIdx, Attribute::SwiftError); Alignment = CS->getParamAlignment(ArgIdx + 1); Add getRetAlignment to handle the one case in Value.cpp that wants the return value alignment. This is a potentially breaking change for out-of-tree backends that do their own call lowering. llvm-svn: 301682	2017-04-28 20:34:27 +00:00
Matthias Braun	744c215e29	TargetLowering: Add finalizeLowering() function; NFC Adds a new method finalizeLowering to TargetLoweringBase. This is in preparation for an upcoming commit. This function is meant for target specific adjustments to MachineFrameInfo or register reservations. Move the freezeRegisters() and the hasCopyImplyingStackAdjustment() handling into the new function to prove the concept. As an added bonus GlobalISel no longer missed the hasCopyImplyingStackAdjustment() handling with this. Differential Revision: https://reviews.llvm.org/D32621 llvm-svn: 301679	2017-04-28 20:25:05 +00:00
Marek Olsak	2d82590f64	AMDGPU: Add new amdgcn.init.exec intrinsics v2: More tests, bug fixes, cosmetic changes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D31762 llvm-svn: 301677	2017-04-28 20:21:58 +00:00
Daniel Berlin	4d0fe64ae3	Kill off the old SimplifyInstruction API by converting remaining users. llvm-svn: 301673	2017-04-28 19:55:38 +00:00
Reid Kleckner	6652a52e2b	Use Argument::hasAttribute and AttributeList::ReturnIndex more This eliminates many extra 'Idx' induction variables in loops over arguments in CodeGen/ and Target/. It also reduces the number of places where we assume that ReturnIndex is 0 and that we should add one to argument numbers to get the corresponding attribute list index. NFC llvm-svn: 301666	2017-04-28 18:37:16 +00:00
Adrian Prantl	109b236850	Clean up DIExpression::prependDIExpr a little. (NFC) llvm-svn: 301662	2017-04-28 17:51:05 +00:00
Alexei Starovoitov	f7bd5ebd3b	[bpf] add bigendian support to disassembler . swap 4-bit register encoding, 16-bit offset and 32-bit imm to support big endian archs . add a test Reported-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 301653	2017-04-28 16:51:01 +00:00
Simon Pilgrim	cce5097ce4	Move variable local to where it's used. NFCI. llvm-svn: 301646	2017-04-28 14:42:15 +00:00
Diana Picus	6f975692e5	[ARM] GlobalISel: fixup r301632 Actually remove ARMInstructionSelector.h... Forgot to stage the removal in the previous commit. llvm-svn: 301633	2017-04-28 09:20:31 +00:00
Diana Picus	674888d84c	[ARM] GlobalISel: Get rid of ARMInstructionSelector.h. NFC. Declare the ARMInstructionSelector in an anonymous namespace, to make it more in line with the other targets which were migrated to this in r299637 in order to avoid TableGen'erated headers being included in non-GlobalISel builds. llvm-svn: 301632	2017-04-28 09:10:38 +00:00
Andrew Ng	03e35b6bc0	[DebugInfo][X86] Improve X86 Optimize LEAs handling of debug values. This is a follow up to the fix in r298360 to improve the handling of debug values when redundant LEAs are removed. The fix in r298360 effectively discarded the debug values. This patch now attempts to preserve the debug values by using the DWARF DW_OP_stack_value operation via prependDIExpr. Moved functions appendOffset and prependDIExpr from Local.cpp to DebugInfoMetadata.cpp and made them available as static member functions of DIExpression. Differential Revision: https://reviews.llvm.org/D31604 llvm-svn: 301630	2017-04-28 08:44:30 +00:00
Craig Topper	053cf4da9d	[WebAssembly] Update calls to computeKnownBits after the changes from r301620. I didn't realize WebAssembly wasn't a default build target so I missed that changes were needed. llvm-svn: 301629	2017-04-28 08:15:33 +00:00
Clement Courbet	5f0ab9e51d	[X86][NFC] Refactor RepMovsRepeats in preparation for D32481. Differential Revision: https://reviews.llvm.org/D32583 llvm-svn: 301628	2017-04-28 07:56:31 +00:00
Craig Topper	d0af7e8ab8	[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620	2017-04-28 05:31:46 +00:00
Craig Topper	0e03e74e95	[SelectionDAG] Use various APInt methods to reduce temporary APInt creation This patch uses various APInt methods to reduce the number of temporary APInts. These were all found while working through converting SelectionDAG's computeKnownBits to also use the KnownBits struct recently added to the ValueTracking version. llvm-svn: 301618	2017-04-28 04:57:59 +00:00
Craig Topper	24e71017aa	[APInt] Use inplace shift methods where possible. NFCI llvm-svn: 301612	2017-04-28 03:36:24 +00:00
Sam Kolton	5d99386b4d	[AMDGPU] DPP: add support for GFX9 Reviewers: artem.tamazov Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32588 llvm-svn: 301551	2017-04-27 15:42:38 +00:00
Krzysztof Parzyszek	14f10e03e0	Fix typo and place comment close to its target Patch by Wei-Ren Chen. Differential Revision: https://reviews.llvm.org/D32594 llvm-svn: 301546	2017-04-27 14:38:21 +00:00
Zoran Jovanovic	ffef3e3c6a	[mips][microMIPS] Adding code size reduction pass for MicroMIPS Author: milena.vujosevic.janicic Reviewers: sdardis The code implements size reduction pass for MicroMIPS. Load and store instructions are examined and transformed, if possible. lw32 instruction is transformed into 16-bit instruction lwsp sw32 instruction is transformed into 16-bit instruction swsp Arithmetic instructions are examined and transformed, if possible. addu32 instruction is transformed into 16-bit instruction addu16 subu32 instruction is transformed into 16-bit instruction subu16 Differential Revision: https://reviews.llvm.org/D15144 llvm-svn: 301540	2017-04-27 13:10:48 +00:00
Jonas Paulsson	ac4e022d72	[SystemZ] Remove incorrect assert in SystemZTTIImpl In getCmpSelInstrCost(), CondTy may actually be scalar while ValTy is a vector when LoopVectorizer is the caller. Therefore the assert that CondTy must be a vector type if ValTy is was wrong and is now removed. Review: Ulrich Weigand llvm-svn: 301533	2017-04-27 11:01:18 +00:00
Diana Picus	4f46be327c	[ARM] GlobalISel: Fix extended stack operands Fix a crash when trying to extend a value passed as a sign- or zero-extended stack parameter. The cause of the crash was that we were setting the size of the loaded value to 32 bits, and then trying to extend again to 32 bits. This patch addresses the issue by also introducing a G_TRUNC after the load. This will leave the unused bits to their original values set by the caller, while being consistent about the types. For values that are not extended, we just use a smaller load. llvm-svn: 301531	2017-04-27 10:23:30 +00:00
Igor Breger	360d0f23ee	[GlobalISel][X86] handle not symmetric G_COPY Summary: handle not symmetric G_COPY Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32420 llvm-svn: 301523	2017-04-27 08:02:03 +00:00
Clement Courbet	7b0ec39494	[CodeGen][NFC] Rename 'Src' to 'Val'. 'Src' looks like it was borrowed from memcpy, 'Val' makes more sense for memset and is consistent with naming within the function. Differential Revision: https://reviews.llvm.org/D32580 llvm-svn: 301521	2017-04-27 07:22:30 +00:00
Konstantin Zhuravlyov	97a663b6a2	AMDGPU: Fix assert in scheduler Assert is triggered if DBG_VALUE is first instruction in BB Differential Revision: https://reviews.llvm.org/D32572 llvm-svn: 301511	2017-04-27 03:22:44 +00:00
Matthias Braun	90834df0b4	Lanai: Remove unnecessary canRealignStack() override; NFC It was doing the same as the base implementation and was irritating me when I was searching for backends that have custom behavior for canRealignStack. llvm-svn: 301495	2017-04-26 23:37:01 +00:00
Dmitry Preobrazhensky	43d297eb45	[AMDGPU][MC] Added arg checks for vmcnt, expcnt, lgkmcnt helpers Summary of changes: - corrected vmcnt, expcnt, lgkmcnt helpers to check their arguments for truncation; - added saturated versions of these helpers. See bug 32711 for details: https://bugs.llvm.org//show_bug.cgi?id=32711 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D32546 llvm-svn: 301439	2017-04-26 17:55:50 +00:00
Craig Topper	b45eabcf82	[ValueTracking] Introduce a KnownBits struct to wrap the two APInts for computeKnownBits This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit. Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch. I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. One place default constructed the APInts so I added a default constructor for those cases. Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero\|One) so we don't write it out everywhere. Maybe a method for (Zero\|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with. Differential Revision: https://reviews.llvm.org/D32376 llvm-svn: 301432	2017-04-26 16:39:58 +00:00
Sanjoy Das	2cbeb00f38	Reverts commit r301424, r301425 and r301426 Commits were: "Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts" "Add a new WeakVH value handle; NFC" "Rename WeakVH to WeakTrackingVH; NFC" The changes assumed pointers are 8 byte aligned on all architectures. llvm-svn: 301429	2017-04-26 16:37:05 +00:00
Sanjoy Das	01de557738	Rename WeakVH to WeakTrackingVH; NFC Summary: I plan to use WeakVH to mean "nulls itself out on deletion, but does not track RAUW" in a subsequent commit. Reviewers: dblaikie, davide Reviewed By: davide Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D32266 llvm-svn: 301424	2017-04-26 16:20:52 +00:00
Dmitry Preobrazhensky	c7d35a0d6a	[AMDGPU][MC] Added check for truncation of SOPK imm operand See bug 30827: https://bugs.llvm.org//show_bug.cgi?id=30827 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D32535 llvm-svn: 301418	2017-04-26 15:34:19 +00:00
Dylan McKay	828bd6169c	[AVR] Remove an unused local variable llvm-svn: 301413	2017-04-26 14:47:27 +00:00
Sagar Thakur	b458b468a2	[mips] Fix test mips64fpldst.ll with machine verifier enabled Removed micro mips register classes for gp initialization because gp initialization uses pure mips64 instruction. Even when compiling for micro mips, gp initialization can be done with pure mips64 instructions. Reviewed by Simon Dardis Differential: D32286 llvm-svn: 301394	2017-04-26 11:40:12 +00:00
Ayman Musa	11966ab00b	[X86] Add missing mayLoad/mayStore attributes to some X86 instructions (Continue) Complete the patch committed in rL300190. Differential Revision: https://reviews.llvm.org/D32287 llvm-svn: 301393	2017-04-26 11:34:09 +00:00
Simon Dardis	70f79251bc	[mips] Rework a portion of MipsCC interface. (NFC) r299766 contained a "conditional move or jump depends on uninitialized value" fault, identified by valgrind. This occurred as MipsFastISel::finishCall(..) used CCState over MipsCCState. The latter is required for the TableGen'd calling convention logic due to reliance on pre-analyzing type information to lower call results/returns of vectors correctly. This change modifies the MipsCC AnalyzeCallResult to be useful with both the SelectionDAG and FastISel lowering logic. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D32004 llvm-svn: 301392	2017-04-26 11:10:38 +00:00
Andrew V. Tischenko	c3c6723ab5	Closes PR31007 and PR27884: constants like 0bH can now be compiled in MS asm. llvm-svn: 301390	2017-04-26 09:56:59 +00:00
Ayman Musa	d9fb157845	[X86][SSE2] Fix asm string for movq (Move Quadword) instruction. Replace "mov{d\|q}" with "movq". Differential Revision: https://reviews.llvm.org/D32220 llvm-svn: 301386	2017-04-26 07:08:44 +00:00
Davide Italiano	0316f7ae7b	[AMDGPU] Garbage collect dead code. NFCI. llvm-svn: 301375	2017-04-26 01:00:52 +00:00
Vadzim Dambrouski	d91fb8c367	[MSP430] Fix PR32769: Select8 and Select16 need to have SR in Uses. If the Select pseudo instruction doesn't have SR in its Uses, then CMP instructions are marked as dead and can later be removed by the MachineCSE pass. This leads to incorrect code generation. Differential Revision: https://reviews.llvm.org/D32473 llvm-svn: 301372	2017-04-26 00:33:59 +00:00
Dylan McKay	ff49a05565	[AVR] Do not kill the dest register for a pseudo instruction It caused the register to later be dead, which would trigger a verifier error. llvm-svn: 301368	2017-04-25 23:58:20 +00:00
Matt Arsenault	36c3122ecd	AMDGPU: Shift down reserved SP register like scratch wave offset llvm-svn: 301367	2017-04-25 23:40:57 +00:00
Matt Arsenault	df58e825ad	AMDGPU: Clean up VOP3NoMods pattern There is no need to copy the operands or inspect the sources. Also remove some unnecessary clamp/omod usage. llvm-svn: 301363	2017-04-25 21:17:38 +00:00
Konstantin Zhuravlyov	54ba4312a3	AMDGPU: Fix ValueKind code object metadata for images Differential Revision: https://reviews.llvm.org/D32504 llvm-svn: 301360	2017-04-25 20:38:26 +00:00
Krzysztof Parzyszek	9ebbe5bf2e	[Hexagon] Only increment debug counters if debug option is present llvm-svn: 301346	2017-04-25 18:56:14 +00:00
Simon Pilgrim	d68785803b	[SelectionDAG] Added getBuildVector(ArrayRef<SDUse>) helper. llvm-svn: 301322	2017-04-25 16:41:28 +00:00
Dylan McKay	8f515b1ef7	[AVR] Support the LDWRdPtr instruction with the same Src+Dst register llvm-svn: 301313	2017-04-25 15:09:04 +00:00
Matt Arsenault	e22184940b	AMDGPU: Slightly simplify prolog reserved register handling Rely on MachineRegisterInfo's knowledge of used physical registers. Move flat_scratch initialization earlier, so the uses are visible when making these decisions. This will make it easier to add another reserved register at the end for the stack pointer rather than handling another special case. llvm-svn: 301254	2017-04-24 21:08:32 +00:00
Krzysztof Parzyszek	c8e8e2a046	Move value type list from TargetRegisterClass to TargetRegisterInfo Differential Revision: https://reviews.llvm.org/D31937 llvm-svn: 301234	2017-04-24 19:51:12 +00:00
Krzysztof Parzyszek	98ab4c64c4	Revert r301231: Accidentally committed stale files I forgot to commit local changes before commit. llvm-svn: 301232	2017-04-24 19:48:51 +00:00
Krzysztof Parzyszek	c0197066d7	Move value type list from TargetRegisterClass to TargetRegisterInfo Differential Revision: https://reviews.llvm.org/D31937 llvm-svn: 301231	2017-04-24 19:43:45 +00:00
Matt Arsenault	0774ea267a	AMDGPU: Select scratch mubuf offsets when pointer is a constant In call sequence setups, there may not be a frame index base and the pointer is a constant offset from the frame pointer / scratch wave offset register. llvm-svn: 301230	2017-04-24 19:40:59 +00:00
Matt Arsenault	df6539f44b	AMDGPU: Set StackGrowsUp in MCAsmInfo Not sure what this does though. llvm-svn: 301229	2017-04-24 19:40:51 +00:00
Stanislav Mekhanoshin	bd5394be3d	[AMDGPU] Merge M0 initializations Merges equivalent initializations of M0 and hoists them into a common dominator block. Technically the same code can be used with any register, physical or virtual. Differential Revision: https://reviews.llvm.org/D32279 llvm-svn: 301228	2017-04-24 19:37:54 +00:00
Krzysztof Parzyszek	44e25f37ae	Move size and alignment information of regclass to TargetRegisterInfo 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221	2017-04-24 18:55:33 +00:00
Yaxun Liu	fd23a0c095	CodeGen: Add a hook for getFenceOperandTy Currently the operand type for ATOMIC_FENCE assumes value type of a pointer in address space 0. This is fine for most targets. However for amdgcn target, the size of pointer in address space 0 depends on triple environment. For amdgiz environment, it is 64 bit but for other environments it is 32 bit. On the other hand, amdgcn target expects 32 bit fence operands independent of the target triple environment. Therefore a hook is needed in target lowering for getting the fence operand type. This patch has no effect on targets other than amdgcn. Differential Revision: https://reviews.llvm.org/D32186 llvm-svn: 301215	2017-04-24 18:26:27 +00:00
Matthias Braun	f9796b76e9	X86RegisterInfo: eliminateFrameIndex: Avoid code duplication; NFC Re-Commit of r300922 and r300923 with less aggressive assert (see discussion at the end of https://reviews.llvm.org/D32205) X86RegisterInfo::eliminateFrameIndex() and X86FrameLowering::getFrameIndexReference() both had logic to compute the base register. This consolidates the code. Also use MachineInstr::isReturn instead of manually enumerating tail call instructions (return instructions were not included in the previous list because they never reference frame indexes). Differential Revision: https://reviews.llvm.org/D32206 llvm-svn: 301211	2017-04-24 18:15:00 +00:00
Matt Arsenault	1c0ae3972f	AMDGPU: Add StackPtr and FramePtr registers to MFI These will be necessary for setting up call sequences. llvm-svn: 301208	2017-04-24 18:05:16 +00:00
Matt Arsenault	3e02538a02	AMDGPU: Move trap lowering to DAG Fixes traps in any block besides the entry block, and fixes depending on a live-in physical register by using a virtual register copy. Also happens to stop emitting a nop in the case debug trap is not supported. llvm-svn: 301206	2017-04-24 17:49:13 +00:00
Nicolai Haehnle	5dea645138	AMDGPU: Move v_readlane lane select from VGPR to SGPR Summary: Fix a compiler bug when the lane select happens to end up in a VGPR. Clarify the semantic of the corresponding intrinsic to be that of the corresponding GLSL: the lane select must be uniform across a wave front, otherwise results are undefined. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32343 llvm-svn: 301197	2017-04-24 17:17:36 +00:00
Igor Breger	87aafa073f	[GlobalISel][X86] Lower FormalArgument/Ret using G_MERGE_VALUES/G_UNMERGE_VALUES. Summary: [GlobalISel][X86] Lower FormalArgument/Ret using G_MERGE_VALUES/G_UNMERGE_VALUES. Reviewers: zvi, t.p.northover, guyblank Reviewed By: t.p.northover Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32288 llvm-svn: 301194	2017-04-24 17:05:52 +00:00
Nicolai Haehnle	ef449787d8	AMDGPU: Fix crash when scheduling non-memory SMRD instructions Summary: Fixes piglit spec/arb_shader_clock/execution/* Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32345 llvm-svn: 301191	2017-04-24 16:53:52 +00:00
Jonas Paulsson	1e8648577c	[SystemZ] Update kill-flag in splitMove(). EarlierMI needs to clear the kill flag on the first operand in case of a store. Review: Ulrich Weigand llvm-svn: 301177	2017-04-24 12:40:28 +00:00
Diana Picus	f53865daa4	[ARM] GlobalISel: Legalize s8 and s16 G_(S\|U)DIV We have to widen the operands to 32 bits and then we can either use hardware division if it is available or lower to a libcall otherwise. At the moment it is not enough to set the Legalizer action to WidenScalar, since for libcalls it won't know what to do (it won't be able to find what size to widen to, because it will find Libcall and not Legal for 32 bits). To hack around this limitation, we request Custom lowering, and as part of that we widen first and then we run another legalizeInstrStep on the widened DIV. llvm-svn: 301166	2017-04-24 09:12:19 +00:00
Sjoerd Meijer	e5b8557d5b	[AArch64AsmParser] better diagnostic for isb Instruction isb takes as an operand either 'sy' or an immediate value. This improves the diagnostic when the string is not 'sy' and adds a test case for this which was missing. This also adds tests to check invalid inputs for dsb and dmb. Differential Revision: https://reviews.llvm.org/D32227 llvm-svn: 301165	2017-04-24 08:22:20 +00:00
Diana Picus	b70e88bdec	[ARM] GlobalISel: Support G_(S\|U)DIV for s32 Add support for both targets with hardware division and without. For hardware division we have to add support throughout the pipeline (legalizer, reg bank select, instruction select). For targets without hardware division, we only need to mark it as a libcall. llvm-svn: 301164	2017-04-24 08:20:05 +00:00
Diana Picus	95a8aa93e2	[ARM] GlobalISel: Select G_CONSTANT with CImm operands When selecting a G_CONSTANT to a MOVi, we need the value to be an Imm operand. We used to just leave the G_CONSTANT operand unchanged, which works in some cases (such as the GEP offsets that we create when referring to stack slots). However, in many other places the G_CONSTANTs are created with CImm operands. This patch makes sure to handle those as well, and to error out gracefully if in the end we don't end up with an Imm operand. Thanks to Oliver Stannard for reporting this issue. llvm-svn: 301162	2017-04-24 06:30:56 +00:00
Simon Pilgrim	06d6263309	[X86][SSE] Add scheduler class support for SSE42 (PCMPGT) instructions llvm-svn: 301142	2017-04-23 21:23:27 +00:00
Renato Golin	4abfb3d741	Revert "[APInt] Fix a few places that use APInt::getRawData to operate within the normal API." This reverts commit r301105, 4, 3 and 1, as a follow up of the previous revert, which broke even more bots. For reference: Revert "[APInt] Use operator<<= where possible. NFC" Revert "[APInt] Use operator<<= instead of shl where possible. NFC" Revert "[APInt] Use ashInPlace where possible." PR32754. llvm-svn: 301111	2017-04-23 12:15:30 +00:00
Ayman Musa	137c44fe64	[X86][MPX] Add load & store instructions of bnd values to getLoadStoreRegOpcode function. This is needed for a follow up patch that generates the memory folding tables. Differential Revision: https://reviews.llvm.org/D32232 llvm-svn: 301109	2017-04-23 08:28:42 +00:00
Craig Topper	474e5de72d	[APInt] Fix a few places that use APInt::getRawData to operate within the normal API. getRawData exposes the internal type of the APInt class directly to its users. Ideally we wouldn't expose such an implementation detail. This patch fixes a few of the easy cases by using truncate, extract, or a rotate. llvm-svn: 301105	2017-04-23 06:41:11 +00:00
Craig Topper	cdd5ae6676	[APInt] Use operator<<= where possible. NFC llvm-svn: 301104	2017-04-23 05:43:02 +00:00
Craig Topper	5f68af0806	[APInt] Use operator<<= instead of shl where possible. NFC llvm-svn: 301103	2017-04-23 05:18:31 +00:00
Craig Topper	ae9672c96d	[APInt] Use ashInPlace where possible. llvm-svn: 301101	2017-04-23 03:45:59 +00:00
Daniel Sanders	2deea1878e	[globalisel][tablegen] Revise API for ComplexPattern operands to improve flexibility. Summary: Some targets need to be able to do more complex rendering than just adding an operand or two to an instruction. For example, it may need to insert an instruction to extract a subreg first, or it may need to perform an operation on the operand. In SelectionDAG, targets would create SDNode's to achieve the desired effect during the complex pattern predicate. This worked because SelectionDAG had a form of garbage collection that would take care of SDNode's that were created but not used due to a later predicate rejecting a match. This doesn't translate well to GlobalISel and the churn was wasteful. The API changes in this patch enable GlobalISel to accomplish the same thing without the waste. The API is now: InstructionSelector::OptionalComplexRendererFn selectArithImmed(MachineOperand &Root) const; where Root is the root of the match. The return value can be omitted to indicate that the predicate failed to match, or a function with the signature ComplexRendererFn can be returned. For example: return OptionalComplexRendererFn( [=](MachineInstrBuilder &MIB) { MIB.addImm(Immed).addImm(ShVal); }); adds two immediate operands to the rendered instruction. Immed and ShVal are captured from the predicate function. As an added bonus, this also reduces the amount of information we need to provide to GIComplexOperandMatcher. Depends on D31418 Reviewers: aditya_nandakumar, t.p.northover, qcolombet, rovka, ab, javed.absar Reviewed By: ab Subscribers: dberris, kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D31761 llvm-svn: 301079	2017-04-22 15:11:04 +00:00
Matthias Braun	d78597ec08	AArch64FrameLowering: Check if the ExtraCSSpill register is actually unused The code assumed that when saving an additional CSR register (ExtraCSSpill==true) we would have a free register throughout the function. This was not true if this CSR register is also used to pass values as in the swiftself case. rdar://31451816 llvm-svn: 301057	2017-04-21 22:42:08 +00:00
Hans Wennborg	9b9a5358dd	Re-commit r301040 "X86: Don't emit zero-byte functions on Windows" In addition to the original commit, tighten the condition for when to pad empty functions to COFF Windows. This avoids running into problems when targeting e.g. Win32 AMDGPU, which caused test failures when this was committed initially. llvm-svn: 301047	2017-04-21 21:48:41 +00:00
Hans Wennborg	04593000d8	Revert r301040 "X86: Don't emit zero-byte functions on Windows" This broke almost all bots. Reverting while fixing. llvm-svn: 301041	2017-04-21 21:10:37 +00:00
Hans Wennborg	cb3e810714	X86: Don't emit zero-byte functions on Windows Empty functions can lead to duplicate entries in the Guard CF Function Table of a binary due to multiple functions sharing the same RVA, causing the kernel to refuse to load that binary. We had a terrific bug due to this in Chromium. It turns out we were already doing this for Mach-O in certain situations. This patch expands the code for that in AsmPrinter::EmitFunctionBody() and renames TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it seems it was used for not just Mach-O anyway. Differential Revision: https://reviews.llvm.org/D32330 llvm-svn: 301040	2017-04-21 20:58:12 +00:00
Tim Northover	e31cf3f824	ARM: make sure we use all entries in a vector before forming a vpaddl. Otherwise there's some mismatch, and we'll either form an illegal type or an illegal node. Thanks to Eli Friedman for pointing out the problem with my original solution. llvm-svn: 301036	2017-04-21 20:35:52 +00:00
Konstantin Zhuravlyov	f628406bbd	AMDGPU/GFX9: Enable FastFMAF32 Differential Revision: https://reviews.llvm.org/D32363 llvm-svn: 301029	2017-04-21 19:57:53 +00:00
Konstantin Zhuravlyov	3d1cc88c68	AMDGPU: Temporarily disable packed inlinable literals (v2f16, v2i16) Differential Revision: https://reviews.llvm.org/D32361 llvm-svn: 301028	2017-04-21 19:45:22 +00:00
Konstantin Zhuravlyov	88938d4e67	AMDGPU: Fix S_PACK_HH_B32_B16 - We really ought to zero out lower 16 bits Differential Revision: https://reviews.llvm.org/D32356 llvm-svn: 301026	2017-04-21 19:35:05 +00:00
Yaxun Liu	15a96b1dc8	[AMDGPU] Handle SI_MASKED_UNREACHABLE in instruction emitter SI_MASKED_UNREACHABLE does not have machine instruction encoding. It needs special handling in AMDGPUAsmPrinter::EmitInstruction like some other pseudo instructions. This patch fixes compilation failure of RadeonRays. Differential Revision: https://reviews.llvm.org/D32364 llvm-svn: 301025	2017-04-21 19:32:02 +00:00
Matthias Braun	1a9062408f	Revert "X86RegisterInfo: eliminateFrameIndex: Avoid code duplication; NFC" It seems we have one situation in a sanitizer-enabled bootstrap build where the return instruction has a frame index operand that does not point to a fixed object and fails the assert added here. This reverts commit r300923. This reverts commit r300922. llvm-svn: 301024	2017-04-21 19:26:45 +00:00
Konstantin Zhuravlyov	c4b18e7099	AMDGPU: Do not lower fast unsafe div for safe, f32, with fp32 denormals Differential Revision: https://reviews.llvm.org/D32085 llvm-svn: 301023	2017-04-21 19:25:33 +00:00
Akira Hatanaka	22e839f4b2	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300932 and r300930, which were causing dag-combine to loop forever. The problem was that optimizeLogicalImm was returning true even when there was no change to the immediate node (which happened when the immediate was all zeros or ones), which caused dag-combine to push and pop the same node to the work list over and over again without making any progress. This commit fixes the bug by returning false early in optimizeLogicalImm if the immediate is all zeros or ones. Also, it changes the code to compare the immediate with 0 or Mask rather than calling countPopulation. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 301019	2017-04-21 18:53:12 +00:00
Joel Jones	a7c4a52188	[AArch64] Refactor instruction selection lowering for addresses. NFCI Factor out the common code used for generating addresses into common templated functions that call overloaded versions of a new function, getTargetNode. Tested with make check-llvm with targets AArch64. Differential Revision: https://reviews.llvm.org/D32169 llvm-svn: 301005	2017-04-21 17:31:03 +00:00
Tim Northover	1061ccca8c	ARM: don't try to create an i8 -> i32 vpaddl. DAG combine was mistakenly assuming that the step-up it was looking at was always a doubling, but it can sometimes be a larger extension in which case we'd crash. llvm-svn: 301002	2017-04-21 17:21:59 +00:00
Daniel Sanders	e7b0d66080	[globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule. Summary: The SelectionDAG importer now imports rules with Predicate's attached via Requires, PredicateControl, etc. These predicates are implemented as bitset's to allow multiple predicates to be tested together. However, unlike the MC layer subtarget features, each target only pays for its own predicates (e.g. AArch64 doesn't have 192 feature bits just because X86 needs a lot). Both AArch64 and X86 derive at least one predicate from the MachineFunction or Function so they must re-initialize AvailableFeatures before each function. They also declare locals in <Target>InstructionSelector so that computeAvailableFeatures() can use the code from SelectionDAG without modification. Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab Reviewed By: rovka Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D31418 llvm-svn: 300993	2017-04-21 15:59:56 +00:00
Chad Rosier	428556c536	[AArch64][Falkor] Refine modeling of store-release exclusive instructions. llvm-svn: 300987	2017-04-21 14:58:32 +00:00
Joel Jones	97aaa23aec	[Mips] Document Mips Backend Relocation Principles This revision documents the combination of C++ and table-gen code that handles relocations and addresses. Thanks to Simon Dardis for the careful reviews. Differential Revision: https://reviews.llvm.org/D31628 llvm-svn: 300986	2017-04-21 14:49:27 +00:00
Chad Rosier	d631b9e500	[AArch64][Falkor] Refine resource needs of STRQ with register offset. llvm-svn: 300984	2017-04-21 14:33:13 +00:00
Daniel Sanders	419efdd55b	Revert r300964 + r300970 - [globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule. It's causing llvm-clang-x86_64-expensive-checks-win to fail to compile and I haven't worked out why. Reverting to make it green while I figure it out. llvm-svn: 300978	2017-04-21 14:09:20 +00:00
Chad Rosier	537defeeb5	[AArch64][Falkor] Refine loads/stores that require an extra LD pipe. llvm-svn: 300976	2017-04-21 13:55:41 +00:00
Chad Rosier	bbcc828833	[AArch64][Falkor] Fix number of microops for WriteSTIdx missed in r300892. llvm-svn: 300975	2017-04-21 13:37:01 +00:00
Chad Rosier	4f2e9e237f	[AArch64] Fix a few missed pre/post-inc in Falkor. llvm-svn: 300974	2017-04-21 13:36:57 +00:00
Diana Picus	64a33431eb	[ARM] GlobalISel: Add support for G_TRUNC Select them as copies. We only select if both the source and the destination are on the same register bank, so this shouldn't cause any trouble. llvm-svn: 300971	2017-04-21 13:16:50 +00:00
Diana Picus	f941ec0ecc	[ARM] GlobalISel: Make struct arguments fail elegantly The condition in isSupportedType didn't handle struct/array arguments properly. Fix the check and add a test to make sure we use the fallback path in this kind of situation. The test deals with some common cases where the call lowering should error out. There are still some issues here that need to be addressed (tail calls come to mind), but they can be addressed in other patches. llvm-svn: 300967	2017-04-21 11:53:01 +00:00
Daniel Sanders	279d03527e	[globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule. Summary: The SelectionDAG importer now imports rules with Predicate's attached via Requires, PredicateControl, etc. These predicates are implemented as bitset's to allow multiple predicates to be tested together. However, unlike the MC layer subtarget features, each target only pays for its own predicates (e.g. AArch64 doesn't have 192 feature bits just because X86 needs a lot). Both AArch64 and X86 derive at least one predicate from the MachineFunction or Function so they must re-initialize AvailableFeatures before each function. They also declare locals in <Target>InstructionSelector so that computeAvailableFeatures() can use the code from SelectionDAG without modification. Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab Reviewed By: rovka Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D31418 llvm-svn: 300964	2017-04-21 10:27:20 +00:00
Clement Courbet	41b4333066	typo llvm-svn: 300963	2017-04-21 09:21:05 +00:00
Clement Courbet	d5f6182bec	use repmovsb when optimizing for minsize llvm-svn: 300960	2017-04-21 09:20:55 +00:00
Clement Courbet	203fc17797	Rename FastString flag. llvm-svn: 300959	2017-04-21 09:20:50 +00:00
Clement Courbet	1ce3b82dea	X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copies when the subtarget has fast strings. This has two advantages: - Speed is improved. For example, on Haswell throughput improvements increase linearly with size from 256 to 512 bytes, after which they plateau (e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes). - Code is much smaller (no need to handle boundaries). llvm-svn: 300957	2017-04-21 09:20:39 +00:00
Clement Courbet	8177fee513	Delete dead code llvm-svn: 300952	2017-04-21 07:40:59 +00:00
Artyom Skrobov	8d9643009f	[Thumb1] The recently added tADCS and tSBCS pseudo-instructions were missing `Uses = [CPSR]` Summary: Thanks to Oliver Stannard for helping catch this. Reviewers: olista01, efriedma Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D31815 llvm-svn: 300951	2017-04-21 07:35:21 +00:00
Akira Hatanaka	78ccba6a20	Revert r300932 and r300930. It seems that r300930 was creating an infinite loop in dag-combine when compiling the following file: MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c llvm-svn: 300940	2017-04-21 01:31:50 +00:00
Akira Hatanaka	e52caddae8	[AArch64] Use suffix ULL to shift a 64-bit value. llvm-svn: 300932	2017-04-21 00:35:27 +00:00
Akira Hatanaka	19077aaee0	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300913, which broke bots because I didn't fix a call to ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of TargetLoweringOpt and TargetLowering. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300930	2017-04-21 00:05:16 +00:00
Matthias Braun	9610a26251	X86RegisterInfo: eliminateFrameIndex: Avoid code duplication; NFC X86RegisterInfo::eliminateFrameIndex() and X86FrameLowering::getFrameIndexReference() both had logic to compute the base register. This consolidates the code. Also use MachineInstr::isReturn instead of manually enumerating tail call instructions (return instructions were not included in the previous list because they never reference frame indexes). Differential Revision: https://reviews.llvm.org/D32206 llvm-svn: 300923	2017-04-20 23:34:50 +00:00
Matthias Braun	63e3e8ce72	X86RegisterInfo: eliminateFrameIndex: Force SP for AfterFPPop; NFC AfterFPPop is used for tailcall/tailjump instructions. We shouldn't ever have frame-pointer/base-pointer relative addressing for those. After all the frame/base pointer should already be restored to their previous values at the return. Make this fact explicit in preparation for an upcoming refactoring. Differential Revision: https://reviews.llvm.org/D32205 llvm-svn: 300922	2017-04-20 23:34:46 +00:00
Akira Hatanaka	7b06cebe73	Revert "[AArch64] Improve code generation for logical instructions taking" This reverts r300913. This broke bots. llvm-svn: 300916	2017-04-20 23:03:30 +00:00
Akira Hatanaka	e327f09832	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300913	2017-04-20 22:47:56 +00:00
Tim Northover	100b7f6eae	AArch64: lower "fence singlethread" to a pure compiler barrier. Single-threaded fences aren't required to provide any synchronization with other processing elements so there's no need for a DMB. They should still be a barrier for compiler optimizations though. llvm-svn: 300905	2017-04-20 21:57:45 +00:00
Tim Northover	46e58354da	ARM: lower "fence singlethread" to a pure compiler barrier. Single-threaded fences aren't required to provide any synchronization with other processing elements so there's no need for a DMB. They should still be a barrier for compiler optimizations though. llvm-svn: 300904	2017-04-20 21:56:52 +00:00
Chad Rosier	4279c58ec4	[AArch64] Whitespace/ordering fixes for Falkor machine description. NFC. llvm-svn: 300893	2017-04-20 21:11:17 +00:00
Chad Rosier	a56bdbe62d	[AArch64] Refine Falkor machine description for pre/post-inc and stores. llvm-svn: 300892	2017-04-20 21:11:09 +00:00
Tim Northover	8b1240b0f0	ARM: handle post-indexed NEON ops where the offset isn't the access width. Before, we assumed that any ConstantInt offset was precisely the access width, so we could use the "[rN]!" form. ISelLowering only ever created that kind, but further simplification during combining could lead to unexpected constants and incorrect codegen. Should fix PR32658. llvm-svn: 300878	2017-04-20 19:54:02 +00:00
Chad Rosier	9f25dd56a8	[AArch64] Improve scheduling of logical operations on Falkor. llvm-svn: 300871	2017-04-20 18:50:21 +00:00
Weiming Zhao	962c5a3aec	[Thumb-1] Fix corner cases for compressed jump tables Summary: When synthesized TBB/TBH is expanded, we need to avoid the case where BaseReg is redefined after the load of the branching target. E.g.: (%R2 = tLEApcrelJT <jt#1>; %R1 = tLDRr %R1, %R2; %R2 = tLDRspi %SP, 12; tBR_JTr %R1) ==> (%R2 = tLEApcrelJT <jt#1>; %R2 = tLDRspi %SP, 12; tTBB_JT %R2, %R1) Reviewers: jmolloy Reviewed By: jmolloy Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D32250 llvm-svn: 300870	2017-04-20 18:37:14 +00:00
Benjamin Kramer	58dadd59d9	Fix use-after-frees on memory allocated in a Recycler. This will become asan errors once the patch lands that poisons the memory after free. The x86 change is a hack, but I don't see how to solve this properly at the moment. llvm-svn: 300867	2017-04-20 18:29:14 +00:00
Sam Clegg	90d99413ac	[WebAssembly] Add known failures for wasm object file backend Subscribers: jfb, dschuff Differential Revision: https://reviews.llvm.org/D32300 llvm-svn: 300859	2017-04-20 17:18:15 +00:00
Craig Topper	bcfd2d1789	[APInt] Rename getSignBit to getSignMask getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask. Differential Revision: https://reviews.llvm.org/D32108 llvm-svn: 300856	2017-04-20 16:56:25 +00:00
Petar Jovanovic	2b6fe3ffa6	[mips][msa] Mask vectors holding shift amounts Mask vectors which hold shift amounts when creating the following nodes: ISD::SHL, ISD::SRL or ISD::SRA. Instructions that use said nodes, and which have had their arguments altered, are sll, srl, sra, bneg, bclr and bset. For said instructions, the shift amount or the bit position that is specified in the corresponding vector elements will be interpreted as the shift amount/bit position modulo the size of the element in bits. The problem lies in compiling with -O2 enabled, where the instructions for formats .w and .d are not generated, but are instead optimized away. In this case, having shift amounts that are either negative or greater than the element bit size results in generation of incorrect results when constant folding. We remedy this by masking the operands for the nodes mentioned above before actually creating them, so that the final result is correct before being placed into the constant pool. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D31331 llvm-svn: 300839	2017-04-20 13:26:46 +00:00
John Brawn	66719f63d0	[ARM] Fix handling of mapping symbols when changing sections ChangeSection incorrectly registers LastEMSInfo as belonging to the previous section, not the current section. This happens to work when changing sections using .section, as the previous section is set to the current section before the call to ChangeSection, but not when using .popsection. Differential Revision: https://reviews.llvm.org/D32225 llvm-svn: 300831	2017-04-20 10:18:13 +00:00
John Brawn	5ca5daa6b9	[AArch64] Fix handling of zero immediate in fmov instructions Currently fmov #0 with a vector destination is handled incorrectly and results in fmov #-1.9375 being emitted when it should instead give an error. This is due to the way we cope with fmov #0 with a scalar destination being an alias of fmov zr, so fix this by actually doing it through an alias. Differential Revision: https://reviews.llvm.org/D31949 llvm-svn: 300830	2017-04-20 10:13:54 +00:00
John Brawn	dcf037a6f0	[AArch64] Fix handling of integer fp immediates When an integer is used as an fp immediate we're failing to check the return value of getFP64Imm, so invalid values are silently permitted. Fix this by merging together the integer and real handling. llvm-svn: 300828	2017-04-20 10:10:10 +00:00
Diana Picus	7c6dee9f16	[ARM] Rename HW div feature to HW div Thumb. NFCI. The hardware div feature refers only to Thumb, but because of its name it is tempting to use it to check for hardware division in general, which may cause problems in ARM mode. See https://reviews.llvm.org/D32005. This patch adds "Thumb" to its name, to make its scope clear. One notable place where I haven't made the change is in the feature flag (used with -mattr), which is still hwdiv. Changing it would also require changes in a lot of tests, including clang tests, and it doesn't seem like it's worth the effort. Differential Revision: https://reviews.llvm.org/D32160 llvm-svn: 300827	2017-04-20 09:38:25 +00:00
Kannan Narayanan	2fb5960121	Revert earlier change. ds permute operations affect lgkm counter. Differential Revision: https://reviews.llvm.org/D32254 llvm-svn: 300791	2017-04-19 23:39:19 +00:00
Matthias Braun	372ee59766	X86FrameLowering: Fix getFrameIndexReference() for 'fixed' objects Debug information is calculated with getFrameIndexReference() which was missing some logic for the fixed object cases (= parameters on the stack). rdar://24557797 Differential Revision: https://reviews.llvm.org/D32204 llvm-svn: 300781	2017-04-19 23:10:43 +00:00
Matthias Braun	8aaa368d00	ARMFrameLowering: Reserve emergency spill slot for large arguments Re-commit after revert in r300668. Changed getMaxFPOffset() to a more conservative heuristic instead of trying to be clever and missing for some exotic calling conventions. We need to reserve an emergency spill slot in cases with large argument types that could overflow immediate offsets for FP relative address calculations. rdar://31317893 Differential Revision: https://reviews.llvm.org/D31643 llvm-svn: 300761	2017-04-19 21:11:44 +00:00
Matt Arsenault	4a48623e4f	AMDGPU: Custom lower illegal small select types Promote them to i32 vectors to avoid unpacking and re-packing the vectors. llvm-svn: 300754	2017-04-19 20:53:07 +00:00
Eli Friedman	70ad2751d5	[ARM] Remove redundant computeKnownBits helper. Move the BFI logic to computeKnownBitsForTargetNode, and delete the redundant CMOV logic. This is intended as a cleanup, but it's probably possible to construct a case where moving the BFI logic allows more combines. Differential Revision: https://reviews.llvm.org/D31795 llvm-svn: 300752	2017-04-19 20:50:57 +00:00
Aditya Nandakumar	75ad9ccbfa	[GISEL]: Move getConstantVReg to Utils NFCI llvm-svn: 300751	2017-04-19 20:48:50 +00:00
Eli Friedman	f281d490cc	[ARM] Use TableGen patterns to select vtbl. NFC. Differential Revision: https://reviews.llvm.org/D32103 llvm-svn: 300749	2017-04-19 20:39:39 +00:00
Dehao Chen	58601674d2	PR32710: Disable using PMADDWD for unsigned short. Summary: PMADDWD can only handle signed short. Reviewers: mkuper, wmi Reviewed By: mkuper Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D32236 llvm-svn: 300737	2017-04-19 19:50:34 +00:00
Matt Arsenault	021a218dd2	AMDGPU: Don't emit amd_kernel_code_t for callable functions This is inserted directly in the text section. The relocation for the function ends up resolving to the beginning of the amd_kernel_code_t header rather than the actual function entry point. Also skip some of the comments for initialization that only makes sense for kernels. llvm-svn: 300736	2017-04-19 19:38:10 +00:00
Tim Northover	ff168c68dc	ARM: TLS calling convention doesn't preserve r9 or r12 on Darwin. llvm-svn: 300726	2017-04-19 18:07:54 +00:00
Matt Arsenault	6cb7b8a42f	AMDGPU: Don't align callable functions to 256 llvm-svn: 300720	2017-04-19 17:42:39 +00:00
Matt Arsenault	4c1ecded63	AMDGPU: Change DivergenceAnalysis for function arguments Stop assuming all functions are kernels. llvm-svn: 300719	2017-04-19 17:42:34 +00:00
Krzysztof Parzyszek	333b2bf2ed	[Hexagon] Generate proper offset in opt-addr-mode Also, make a few changes to allow using the pass in .mir testcases. Among other things, change the abbreviation from opt-amode to amode-opt, because otherwise lit would expand the "opt" part to the full path to the opt binary. llvm-svn: 300707	2017-04-19 15:15:51 +00:00
Krzysztof Parzyszek	634f57e0bb	[Hexagon] Remove RDefMap, use Liveness:getNearestAliasedRef instead llvm-svn: 300706	2017-04-19 15:14:30 +00:00
Krzysztof Parzyszek	0de74f315d	[RDF] Switch NodeList to SmallVector from std::vector The list has a single element 75+% of the time, reservation of 4 elements is sufficient in 95% of cases. llvm-svn: 300705	2017-04-19 15:12:44 +00:00
Krzysztof Parzyszek	7c69a3b490	[RDF] Use faster version of findBlock llvm-svn: 300704	2017-04-19 15:11:23 +00:00
Krzysztof Parzyszek	6aa3a3f00b	[RDF] Cache register units for reg masks instead of recalculating them llvm-svn: 300702	2017-04-19 15:10:09 +00:00
Krzysztof Parzyszek	5bfaf56ee5	[Hexagon] Cache reached blocks in bit tracker instead of scanning list llvm-svn: 300701	2017-04-19 15:08:31 +00:00
Igor Breger	4fdf1e489c	[GlobalIsel][X86] support G_TRUNC selection. Summary: Add regbank-select and legalizer tests. Currently, legalization of trunc i64 on 32-bit platforms is not supported. Reviewers: ab, zvi, rovka Reviewed By: zvi Subscribers: dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D32115 llvm-svn: 300678	2017-04-19 11:34:59 +00:00
Renato Golin	742aed8683	Revert "ARMFrameLowering: Reserve emergency spill slot for large arguments" This reverts commit r300639, as it broke self-hosting on ARM. PR32709. llvm-svn: 300668	2017-04-19 09:02:52 +00:00
Diana Picus	49472ff1cf	[ARM] GlobalISel: Add support for G_MUL Support G_MUL, very similar to G_ADD and G_SUB. The only difference is in the instruction selector, where we have to select either MUL or MULv5 depending on the target. llvm-svn: 300665	2017-04-19 07:29:46 +00:00
Kristof Beyls	0f36e68f62	[GlobalISel] Support vector-of-pointers in LLT This fixes PR32471. As comment 10 on that bug report highlights (https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a few different defendable design tradeoffs that could be made, including not representing pointers at all in LLT. I decided to go for representing vector-of-pointer as a concept in LLT, while keeping the size of the LLT type 64 bits (this is an increase from 48 bits before). My rationale for keeping pointers explicit is that on some targets probably it's very handy to have the distinction between pointer and non-pointer (e.g. 68K has a different register bank for pointers IIRC). If we keep a scalar pointer, it probably is easiest to also have a vector-of-pointers to keep LLT relatively conceptually clean and orthogonal, while we don't have a very strong reason to break that orthogonality. Once we gain more experience on the use of LLT, we can of course reconsider this direction. Rejecting vector-of-pointer types in the IRTranslator is also an option to avoid the crash reported in PR32471, but that is only a very short-term solution; also needs quite a bit of code tweaks in places, and is probably fragile. Therefore I didn't consider this the best option. llvm-svn: 300664	2017-04-19 07:23:57 +00:00
Serge Pavlov	5943a96d81	ARM: Use methods to access data stored with frame instructions In r300196 several methods were added to TargetInstrInfo to access data stored with call frame setup/destroy instructions. This change replaces calls to getOperand with calls to such special methods in the ARM target. Differential Revision: https://reviews.llvm.org/D32127 llvm-svn: 300655	2017-04-19 03:12:05 +00:00
Leslie Zhai	b86e9a1c14	[AVR] Migrate to new MCAsmInfo CodePointerSize Reviewers: dylanmckay, rengolin, kzhuravl, jroelofs Reviewed By: kzhuravl, jroelofs Subscribers: kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D32154 llvm-svn: 300641	2017-04-19 01:20:43 +00:00
Matthias Braun	661d3d4b00	ARMFrameLowering: Reserve emergency spill slot for large arguments We need to reserve an emergency spill slot in cases with large argument types that could overflow immediate offsets for FP relative address calculations. rdar://31317893 Differential Revision: https://reviews.llvm.org/D31643 llvm-svn: 300639	2017-04-19 01:16:07 +00:00
Dylan McKay	eb24b850c5	[AVR] Fix the build 'PointerSize' was renamed to 'CodePointerSize'. llvm-svn: 300629	2017-04-18 23:53:10 +00:00
Sanjoy Das	f09c1e346e	Add a getPointerOperandType() helper to LoadInst and StoreInst; NFC I will use this in a later change. llvm-svn: 300613	2017-04-18 22:00:54 +00:00
Matt Arsenault	3138075dd4	DAG: Make mayBeEmittedAsTailCall parameter const llvm-svn: 300603	2017-04-18 21:16:46 +00:00
Matt Arsenault	aa31dce3c5	Fix typo llvm-svn: 300597	2017-04-18 20:59:46 +00:00
Matt Arsenault	161e2b4223	AMDGPU: Make MFI fields private llvm-svn: 300596	2017-04-18 20:59:40 +00:00
Simon Pilgrim	e8ad1da4e2	[X86] Use for-range loop. NFCI. llvm-svn: 300567	2017-04-18 17:18:54 +00:00
Craig Topper	fc947bcfba	[APInt] Use lshrInPlace to replace lshr where possible This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result. This adds an lshrInPlace(const APInt &) version as well. Differential Revision: https://reviews.llvm.org/D32155 llvm-svn: 300566	2017-04-18 17:14:21 +00:00
Oliver Stannard	7ad2e8aae1	[ARM] Add hardware build attributes in assembler In the assembler, we should emit build attributes based on the target selected with command-line options. This matches the GNU assembler's behaviour. We only do this for build attributes which describe the hardware that is expected to be available, not the ones that describe ABI compatibility. This is done by moving some of the attribute emission code to ARMTargetStreamer, so that it can be shared between the assembly and code-generation code paths. Since the assembler only creates a MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to check raw features, and not use the convenience functions in ARMSubtarget. If different attributes are later specified using the .eabi_attribute directive, then they will take precedence, as happens when the same .eabi_attribute is specified twice. This must be enabled by an option, because we don't want to do this when parsing inline assembly. The attributes would match the ones emitted at the start of the file, so wouldn't actually change the emitted object file, but the extra directives would be added to every inline assembly block when emitting assembly, which we'd like to avoid. The majority of the changes in the build-attributes.ll test are just re-ordering the directives, because the hardware attributes are now emitted before the ABI ones. However, I did fix one bug which I spotted: Tag_CPU_arch_profile was not being emitted for v6M. Differential revision: https://reviews.llvm.org/D31812 llvm-svn: 300547	2017-04-18 12:52:35 +00:00
Diana Picus	a3a0cccb2c	[ARM] GlobalISel: Add support for G_SUB Support G_SUB throughout the GlobalISel pipeline. It is exactly the same as G_ADD, nothing fancy. llvm-svn: 300546	2017-04-18 12:35:28 +00:00
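
The shift-amount masking described in r300839 ([mips][msa] Mask vectors holding shift amounts) can be illustrated with a small model. This is only a sketch of the arithmetic, not the LLVM implementation: the function names mask_shift_amounts and shift_left_elements are hypothetical, and the element width is assumed to be a power of two so that "modulo the element size" reduces to a bitwise AND.

```python
def mask_shift_amounts(amounts, element_bits):
    """Reduce each shift amount modulo the element width.

    Assumes element_bits is a power of two, so the modulo is a bitwise AND.
    Negative amounts are handled too, since Python's & works on
    two's-complement semantics for negative integers.
    """
    if element_bits <= 0 or element_bits & (element_bits - 1):
        raise ValueError("element_bits must be a positive power of two")
    mask = element_bits - 1
    return [amount & mask for amount in amounts]


def shift_left_elements(values, amounts, element_bits):
    """Element-wise left shift with masked amounts, truncated to element_bits."""
    width_mask = (1 << element_bits) - 1
    masked = mask_shift_amounts(amounts, element_bits)
    return [(value << shift) & width_mask for value, shift in zip(values, masked)]
```

With 32-bit elements, an out-of-range amount such as 33 is treated as 1 and an amount of -1 is treated as 31, which is the behaviour the commit message says the hardware instructions have and that constant folding needs to match.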

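The restriction in r300737 (PR32710: Disable using PMADDWD for unsigned short) follows from PMADDWD interpreting its 16-bit inputs as signed. The sketch below models that arithmetic on plain Python lists. The helper names to_signed and pmaddwd are hypothetical and model only the instruction's documented semantics, not any LLVM code.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pmaddwd(a, b):
    """Model PMADDWD: multiply signed 16-bit lanes, then add adjacent pairs.

    Each output lane is a 32-bit signed sum of two adjacent products.
    """
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("inputs must have equal, even length")
    result = []
    for i in range(0, len(a), 2):
        total = (to_signed(a[i], 16) * to_signed(b[i], 16)
                 + to_signed(a[i + 1], 16) * to_signed(b[i + 1], 16))
        result.append(to_signed(total, 32))
    return result
```

For example, the lane value 0xFFFF means 65535 as an unsigned short, but the model reads it as -1, so pmaddwd([0xFFFF, 1], [2, 3]) yields 1 rather than the unsigned dot product 131073.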
43055 Commits