llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	e9af89a78b	[X86] Don't custom widen (v2i32 (setcc v2f32)) when -x86-experimental-vector-widening-legalization is in effect. We aren't doing anything than what the generic legalizer will do so just let it do it. llvm-svn: 341172	2018-08-31 07:05:37 +00:00
Matt Arsenault	988df63525	AMDGPU: Stop forcing internalize at -O0 This doesn't really matter if clang is always emitting the visibility as hidden by default. llvm-svn: 341168	2018-08-31 06:02:36 +00:00
Matt Arsenault	0da6350dc8	AMDGPU: Remove remnants of old address space mapping llvm-svn: 341165	2018-08-31 05:49:54 +00:00
Krzysztof Parzyszek	d51f7b3b43	[Hexagon] Check validity of register class when generating bitsplit llvm-svn: 341137	2018-08-30 22:26:43 +00:00
Eli Friedman	d5d0a4d27f	[ARM] Enable GEP offset splitting for 32-bit ARM. It has essentially the same benefit it has on 64-bit ARM: it substantially reduces the number of constants used by large GEP operations. Seems to be generally helpful across a few different codebases I've tried. Differential Revision: https://reviews.llvm.org/D51462 llvm-svn: 341136	2018-08-30 22:18:27 +00:00
Thomas Lively	abf6bdcb59	[WebAssembly] Update utility functions with SIMD types Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51516 llvm-svn: 341131	2018-08-30 22:10:43 +00:00
Thomas Lively	80725808a3	[WebAssembly] Vector conversions Summary: Lowers away bitconverts between vector types. This CL depends on D51383. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51498 llvm-svn: 341128	2018-08-30 21:43:51 +00:00
Thomas Lively	d183d8c772	[WebAssembly] SIMD loads and stores Summary: Reuse the patterns from WebAssemblyInstrMemory.td. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51383 llvm-svn: 341127	2018-08-30 21:36:48 +00:00
Ana Pazos	6b34051b33	[RISCV] Fixed SmallVector.h Assertion `idx < size()' Summary: RISCVAsmParser needs to handle the case the error message is of specific type, other than the generic Match_InvalidOperand, and the corresponding operand is missing. This bug was uncovered by a LLVM MC Assembler Protocol Buffer Fuzzer for the RISC-V assembly language. Reviewers: asb Reviewed By: asb Subscribers: llvm-commits, jocewei, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX Differential Revision: https://reviews.llvm.org/D50790 llvm-svn: 341104	2018-08-30 19:43:19 +00:00
Craig Topper	1a8c99e670	[X86] Weaken an overly aggressive assert. This assert tried to check that AND constants are only on the RHS. But its possible for both operands to be constants if one is opaque which will prevent the AND from being constant folded. Fixes PR38771 llvm-svn: 341102	2018-08-30 19:35:38 +00:00
Evandro Menezes	64ed9a7fe8	[ARM] Adjust the feature set for Exynos Enable `FeatureUseAA` for all Exynos processors. llvm-svn: 341101	2018-08-30 19:22:00 +00:00
Wouter van Oortmerssen	a733d08db2	[WebAssembly] Made disassembler only use stack instructions. Summary: Now uses the StackBased bit from the tablegen defs to identify stack instructions (and ignore register based or non-wasm instructions). Also changed how we store operands, since we now have up to 16 of them per instruction. To not cause static data bloat, these are compressed into a tiny table. + a few other cleanups. Tested: - MCTest - llvm-lit -v `find test -name WebAssembly` Reviewers: dschuff, jgravelle-google, sunfish, tlively Subscribers: sbc100, aheejin, llvm-commits Differential Revision: https://reviews.llvm.org/D51320 llvm-svn: 341081	2018-08-30 15:40:53 +00:00
Alexander Ivchenko	af96112ec6	Make TargetInstrInfo::isCopyInstr return true for regular COPY-instructions ..Move all target-dependent checks into new isCopyInstrImpl method. This change allows us to treat MoveReg-type instructions and generic COPY instruction in the same way Differential Revision: https://reviews.llvm.org/D49913 llvm-svn: 341072	2018-08-30 14:32:47 +00:00
Nicolai Haehnle	35617ed4cb	[NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis Summary: This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433). The purpose of this patch is to free up the name DivergenceAnalysis for the new generic implementation. The generic implementation class will be shared by specialized divergence analysis classes. Patch by: Simon Moll Reviewed By: nhaehnle Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50434 Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb llvm-svn: 341071	2018-08-30 14:21:36 +00:00
Daniel Cederman	8f0bf6c19a	[Sparc] Use ANDN instead of AND if constant can be encoded more efficiently Summary: In the case of (and reg, constant) or (or reg, constant), it can be beneficial to use a ANDNrr/ORNrr instruction instead of ANDrr/ORrr, if the complement of the constant can be encoded using a single SETHI instruction instead of a SETHI/ORri pair. If the constant has more than one use, it is probably better to keep it in its original form. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D50964 llvm-svn: 341069	2018-08-30 14:05:26 +00:00
Alexander Timofeev	201f892b3b	[AMDGPU] Preliminary patch for divergence driven instruction selection. Operands Folding 1. Reviewers: rampitec Differential revision: https://reviews/llvm/org/D51316 llvm-svn: 341068	2018-08-30 13:55:04 +00:00
Ties Stuij	9c16d809d2	[CodeGen] emit inline asm clobber list warnings for reserved (cont) Summary: This is a continuation of https://reviews.llvm.org/D49727 Below the original text, current changes in the comments: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } Compiled for thumb, this gives: $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @APP mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. If we find a reserved register, we print a warning: repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ Reviewers: efriedma, olista01, javed.absar Reviewed By: efriedma Subscribers: eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D51165 llvm-svn: 341062	2018-08-30 12:52:35 +00:00
David Green	1f203bcd75	[AArch64] Optimise load(adr address) to ldr address Providing that the load is known to be 4 byte aligned, we can optimise a ldr(adr address) to just ldr address. Differential Revision: https://reviews.llvm.org/D51030 llvm-svn: 341058	2018-08-30 11:55:16 +00:00
Florian Hahn	521dc4dda4	Fix "Q" and "R" inline assembly template modifiers for big-endian Arm Consider the endianness of the target when printing register names. This is in line with the documentation at http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers Patch by Jackson Woodruff <jackson.woodruff@arm.com> Reviewers: t.p.northover, echristo, javed.absar, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D49778 llvm-svn: 341052	2018-08-30 10:28:23 +00:00
Andrew V. Tischenko	62f7a3207b	[X86] Improved sched model for X86 CMPXCHG* instructions. Differential Revision: https://reviews.llvm.org/D50070 llvm-svn: 341024	2018-08-30 06:26:00 +00:00
Craig Topper	b7b353be60	[X86] Make Feature64Bit useful We now only add +64bit to the CPU string for "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've remove the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or user passing -mattr=+64bit. I've changed the assert to a report_fatal_error so it's not lost in Release builds. The test updates are to fix things that tripped the new error. Differential Revision: https://reviews.llvm.org/D51231 llvm-svn: 341022	2018-08-30 06:01:05 +00:00
Sam Clegg	88599bf6f4	[WebAssembly] Be a little more conservative in WebAssemblyFixFunctionBitcasts We don't have enough information to know if struct types being bitcast will cause validation failures or not, so be conservative and allow such cases to persist (fot now). Fixes: https://bugs.llvm.org/show_bug.cgi?id=38711 Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51460 llvm-svn: 341010	2018-08-30 01:01:30 +00:00
Marek Olsak	3fc2079cf4	AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes Summary: This fixes GPU hangs with OpenGL bindless handle arithmetic. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51203 llvm-svn: 340959	2018-08-29 20:03:00 +00:00
Martin Storsjo	489993db94	[MinGW] [X86] Add stubs for references to data variables that might end up imported from a dll Variables declared with the dllimport attribute are accessed via a stub variable named __imp_<var>. In MinGW configurations, variables that aren't declared with a dllimport attribute might still end up imported from another DLL with runtime pseudo relocs. For x86_64, this avoids the risk that the target is out of range for a 32 bit PC relative reference, in case the target DLL is loaded further than 4 GB from the reference. It also avoids having to make the text section writable at runtime when doing the runtime fixups, which makes it worthwhile to do for i386 as well. Add stub variables for all dso local data references where a definition of the variable isn't visible within the module, since the DLL data autoimporting might make them imported even though they are marked as dso local within LLVM. Don't do this for variables that actually are defined within the same module, since we then know for sure that it actually is dso local. Don't do this for references to functions, since there's no need for runtime pseudo relocations for autoimporting them; if a function from a different DLL is called without the appropriate dllimport attribute, the call just gets routed via a thunk instead. GCC does something similar since 4.9 (when compiling with -mcmodel=medium or large; from that version, medium is the default code model for x86_64 mingw), but only for x86_64. Differential Revision: https://reviews.llvm.org/D51288 llvm-svn: 340942	2018-08-29 17:28:34 +00:00
Farhana Aleen	9250c92d0e	[AMDGPU] Match udot4 pattern. Summary: D.u32 = S0.u8[0] * S1.u8[0] + S0.u8[1] * S1.u8[1] + S0.u8[2] * S1.u8[2] + S0.u8[3] * S1.u8[3] + S2.u32 Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D50921 llvm-svn: 340936	2018-08-29 16:31:18 +00:00
Simon Atanasyan	dc7f04bcea	[mips] Fix microMIPS unconditional branch offset handling MipsSEInstrInfo class defines for internal purpose unconditional branches as Mips::B nad Mips:J even in case of microMIPS code generation. Under some conditions that leads to the bug - for rather long branch which fits to Mips jump instruction offset size, but does not fit to microMIPS jump offset size, we generate 'short' branch and later show an error 'out of range PC16 fixup' after check in the isBranchOffsetInRange routine. Differential revision: https://reviews.llvm.org/D50615 llvm-svn: 340932	2018-08-29 14:54:01 +00:00
Simon Atanasyan	a7999216f8	[mips] Involves microMIPS's jump in the analyzable branch set Involves microMIPS's jump in the analyzable branch set to reduce some code patterns. Differential revision: https://reviews.llvm.org/D50613 llvm-svn: 340931	2018-08-29 14:53:55 +00:00
Vladimir Stefanovic	0ef60da858	[mips] Prevent shrink-wrap for BuildPairF64, ExtractElementF64 when they use $sp For a certain combination of options, BuildPairF64_{64}, ExtractElementF64{_64} may be expanded into instructions using stack. Add implicit operand $sp for such cases so that ShrinkWrapping doesn't move prologue setup below them. Fixes MultiSource/Benchmarks/MallocBench/cfrac for '--target=mips-img-linux-gnu -mcpu=mips32r6 -mfpxx -mnan=2008' and '--target=mips-img-linux-gnu -mcpu=mips32r6 -mfp64 -mnan=2008 -mno-odd-spreg'. Differential Revision: https://reviews.llvm.org/D50986 llvm-svn: 340927	2018-08-29 14:07:14 +00:00
Aleksandar Beserminji	f8f00e5065	[mips] Add missing instructions Add pll.ps, plu.ps, cvt.s.pu, cvt.s.pl, cvt.ps instructions for FP64. Differential Revision: https://reviews.llvm.org/D50437 llvm-svn: 340920	2018-08-29 11:35:03 +00:00
Simon Pilgrim	6b9bf7ecbc	[X86][AVX] Prefer VPBLENDW+VPBLENDD to VPBLENDVB for v16i16 blend shuffles Noticed while looking at D49562 codegen - we can avoid a large constant mask load and a slow VPBLENDVB select op by using VPBLENDW+VPBLENDD instead. TODO: As discussed on the patch, we should investigate adding VPBLENDVB handling to target shuffle combining as well, that will allow us to extend this to VPBLENDW+VPBLENDW+VPBLENDD. Differential Revision: https://reviews.llvm.org/D50074 llvm-svn: 340913	2018-08-29 10:51:08 +00:00
Nicolai Haehnle	283b995097	AMDGPU: Fix getInstSizeInBytes Summary: Add some optional code to validate getInstSizeInBytes for emitted instructions. This flushed out some issues which are fixed by this patch: - Streamline getInstSizeInBytes - Properly define the VI readlane/writelane instruction as VOP3 - Fix the inline constant determination. Specifically, this change fixes an issue where a 32-bit value of 0xffffffff was recorded as unsigned. This is equal to -1 when restricting to a 32-bit comparison, and an inline constant can be used. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D50629 Change-Id: Id87c3b7975839da0de8156a124b0ce98c5fb47f2 llvm-svn: 340903	2018-08-29 07:46:09 +00:00
Craig Topper	9401fd0ed2	[X86] Add intrinsics for KADD instructions These are intrinsics for supporting kadd builtins in clang. These builtins are already in gcc to implement intrinsics from icc. Though they are missing from the Intel Intrinsics Guide. This instruction adds two mask registers together as if they were scalar rather than a vXi1. We might be able to get away with a bitcast to scalar and a normal add instruction, but that would require DAG combine smarts in the backend to recoqnize add+bitcast. For now I'd prefer to go with the easiest implementation so we can get these builtins in to clang with good codegen. Differential Revision: https://reviews.llvm.org/D51370 llvm-svn: 340869	2018-08-28 19:22:55 +00:00
Fangrui Song	9cca227d3e	[AMDGPU] Fix -Wunused-variable when -DLLVM_ENABLE_ASSERTIONS=off llvm-svn: 340868	2018-08-28 19:19:03 +00:00
Matt Arsenault	755f41f3a2	AMDGPU: Don't delete instructions if S_ENDPGM has implicit uses This can leave behind the uses with the defs removed. Since this should only really happen in tests, it's not worth the effort of trying to handle this. llvm-svn: 340866	2018-08-28 18:55:55 +00:00
Thomas Lively	adb6da10b8	[WebAssembly][NFC] Document stackifier tablegen backend Summary: Add comments to help readers avoid having to read tablegen backends to understand the code. Also remove unecessary breaks from the output. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51371 llvm-svn: 340864	2018-08-28 18:49:47 +00:00
Matt Arsenault	44a8a756e2	AMDGPU: Force shrinking of add/sub even if the carry is used The original motivating example uses a 64-bit add, so the carry is used. Insert a copy from VCC. This may allow shrinking of the used carry instruction. At worst, we are replacing a mov to materialize the constant with a copy of vcc. llvm-svn: 340862	2018-08-28 18:44:16 +00:00
Matt Arsenault	de6c421cc8	AMDGPU: Shrink insts to fold immediates This needs to be done in the SSA fold operands pass to be effective, so there is a bit of overlap with SIShrinkInstructions but I don't think this is practically avoidable. llvm-svn: 340859	2018-08-28 18:34:24 +00:00
Thomas Lively	995ad61f23	[WebAssembly] v128.not Implementation and tests. llvm-svn: 340857	2018-08-28 18:31:15 +00:00
Matt Arsenault	35b1902bce	AMDGPU: Move canShrink into TII llvm-svn: 340855	2018-08-28 18:22:34 +00:00
Heejin Ahn	56e79dd048	[WebAssembly] Use getCalleeOpNo utility function (NFC) Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51366 llvm-svn: 340848	2018-08-28 17:49:39 +00:00
Craig Topper	c73095e264	[X86] Mark the FUCOMI instructions as requiring CMOV to be enabled. NFCI These instructions were added on the PentiumPro along with CMOV. This was already comprehended by the lowering process which should emit an alternate sequence using FCOM and FNSTW. This just makes it an explicit error if that doesn't work for some reason. llvm-svn: 340844	2018-08-28 17:17:13 +00:00
Ryan Taylor	1f334d0062	[AMDGPU] Add support for a16 modifiear for gfx9 Summary: Adding support for a16 for gfx9. A16 bit replaces r128 bit for gfx9. Change-Id: Ie8b881e4e6d2f023fb5e0150420893513e5f4841 Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50575 llvm-svn: 340831	2018-08-28 15:07:30 +00:00
Simon Pilgrim	af98587095	[X86][SSE] Improve variable scalar shift of vXi8 vectors (PR34694) This patch creates the shift mask and actual shift using the vXi16 vector shift ops. Differential Revision: https://reviews.llvm.org/D51263 llvm-svn: 340813	2018-08-28 10:37:29 +00:00
Simon Pilgrim	f119e27d80	[X86][SSE] Avoid vector extraction/insertion for non-constant uniform shifts As discussed on D51263, we're better off using byte shifts to clear the upper bits on pre-SSE41 hardware. llvm-svn: 340810	2018-08-28 10:14:09 +00:00
Craig Topper	c1436db753	[X86] Fix some comments to refer to KORTEST not KTEST. NFC KTEST is a different instruction. All of this code uses KORTEST. llvm-svn: 340799	2018-08-28 06:39:35 +00:00
Kit Barton	7c80f98b69	[PPC] Remove Darwin support from POWER backend. This patch issues an error message if Darwin ABI is attempted with the PPC backend. It also cleans up existing test cases, either converting the test to use an alternative triple or removing the test if the coverage is no longer needed. Updated Tests ------------- The majority of test cases were updated to use a different triple that does not include the Darwin ABI. Many tests were also updated to use FileCheck, in place of grep. Deleted Tests ------------- llvm/test/tools/dsymutil/PowerPC/sibling.test was originally added to test specific functionality of dsymutil using an object file created with an old version of llvm-gcc for a Powerbook G4. After a discussion with @JDevlieghere he suggested removing the test. llvm/test/CodeGen/PowerPC/combine_loads_from_build_pair.ll was converted from a PPC test to a SystemZ test, as the behavior is also reproducible there. All other tests that were deleted were specific to the darwin/ppc ABI and no longer necessary. Phabricator Review: https://reviews.llvm.org/D50988 llvm-svn: 340795	2018-08-28 01:18:29 +00:00
Thomas Lively	211874d2f3	[WebAssembly] TableGen backend for stackifying instructions Summary: The new stackification backend generates the giant switch statement used to translate instructions to their stackified forms. I did this because it was more interesting than adding all the different vector versions of the various SIMD instructions to the switch statment manually. Reviewers: aardappel, aheejin, dschuff Subscribers: mgorny, sbc100, jgravelle-google, sunfish, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D51318 llvm-svn: 340781	2018-08-27 22:02:09 +00:00
Sean Fertile	a2f095f1a3	[PowerPC][MC] Support expressions in getMemRIX16Encoding. Loosens an assert in getMemRIX16Encoding that restricts DQ-form instructions to using an immediate, so that we can assemble instructions like lxv/stxv where the offset is an expression. Differential Revision: https://reviews.llvm.org/D51122 llvm-svn: 340761	2018-08-27 17:37:43 +00:00
Benjamin Kramer	759e7d9819	[NVPTX] Implement isLegalToVectorizeLoadChain This lets LSV nicely split up underaligned chains. Differential Revision: https://reviews.llvm.org/D51306 llvm-svn: 340760	2018-08-27 17:29:43 +00:00
Craig Topper	4be11c0585	[X86] When lowering v32i8 MULHS/MULHU, shuffle after the PACKUS rather than before. We're using a 256-bit PACKUS to do the truncation, but that instruction operates on 128-bit lanes. So previously we shuffled first to rearrange the lanes. But that requires 2 shuffles. Instead we can shuffle after the PACKUS using a single VPERMQ. This matches what our normal LowerTRUNCATE code does when it uses PACKUS. Differential Revision: https://reviews.llvm.org/D51284 llvm-svn: 340757	2018-08-27 17:20:41 +00:00
Craig Topper	fff90377fd	[X86] Add support for matching paddus patterns where one of the vectors is a constant. InstCombine mucks these up a bit. So we need to do some additional pattern matching to fix it. There are a still a few special cases not handled, but this covers the general case. Differential Revision: https://reviews.llvm.org/D50952 llvm-svn: 340756	2018-08-27 17:20:38 +00:00
Wouter van Oortmerssen	8a9cb242fb	[WebAssembly] Added default stack-only instruction mode for MC. Summary: Made it convert from register to stack based instructions, and removed the registers. Fixes to related code that was expecting register based instructions. Added the correct testing flag to all tests, depending on what the format they were expecting so far. Translated one test to stack format as example: reg-stackify-stack.ll tested: llvm-lit -v `find test -name WebAssembly` unittests/MC/* Reviewers: dschuff, sunfish Subscribers: sbc100, jgravelle-google, eraman, aheejin, llvm-commits, jfb Differential Revision: https://reviews.llvm.org/D51241 llvm-svn: 340750	2018-08-27 15:45:51 +00:00
Nico Weber	e75fd1b184	fix comment typo llvm-svn: 340744	2018-08-27 14:25:22 +00:00
Nemanja Ivanovic	5d06f17b8a	[PowerPC] Revert commit r339779 This commit has caused failures in some internal benchmarks. Temporarily reverting this patch until the issue can be diagnosed and fixed. llvm-svn: 340740	2018-08-27 13:20:42 +00:00
Daniel Cederman	db474c12e9	[Sparc] Avoid writing outside array in applyFixup Summary: If an object file ends with a relocation that is smaller than 4 bytes we will write outside the Data array and trigger an "Invalid index" assertion. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D50971 llvm-svn: 340736	2018-08-27 11:43:59 +00:00
Nemanja Ivanovic	f2588a28a8	[PowerPC] Recommit r340016 after fixing the reported issue The internal benchmark failure reported by Google was due to a missing check for the result type for the sign-extend and shift DAG. This commit adds the check and re-commits the patch. llvm-svn: 340734	2018-08-27 11:20:27 +00:00
Daniel Cederman	2739596063	[Sparc] Add support for the cycle counter available in GR740 Summary: The GR740 provides an up cycle counter in the registers ASR22 and ASR23. As these registers can not be read together atomically we only use the value of ASR23 for llvm.readcyclecounter(). The ASR23 register holds the 32 LSBs of the up-counter. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: jfb, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48638 llvm-svn: 340733	2018-08-27 11:11:47 +00:00
Daniel Cederman	92dadc0bca	[Sparc] Custom bitcast between f64 and v2i32 Summary: Currently bitcasting constants from f64 to v2i32 is done by storing the value to the stack and then loading it again. This is not necessary, but seems to happen because v2i32 is a valid type for Sparc V8. If it had not been legal, we would have gotten help from the type legalizer. This patch tries to do the same work as the legalizer would have done by bitcasting the floating point constant and splitting the value up into a vector of two i32 values. Reviewers: venkatra, jyknight Reviewed By: jyknight Subscribers: glaubitz, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D49219 llvm-svn: 340723	2018-08-27 07:14:53 +00:00
Roger Ferrer Ibanez	fe28217048	[RISCV] atomic_store_nn have a different layout to regular store We cannot directy reuse the patterns of StPat because for some reason the store DAG node and the atomic_store_nn DAG nodes put the ptr and the value in different positions. Currently we attempt to store the address to an address formed by the value. Differential Revision: https://reviews.llvm.org/D51217 llvm-svn: 340722	2018-08-27 07:08:18 +00:00
Craig Topper	73ed2a2a6b	[X86] Cleanup the LowerMULH code by hoisting some commonalities between the vXi32 and vXi8 handling. NFCI vXi32 support was recently moved from LowerMUL_LOHI to LowerMULH. This commit shares the getOperand calls, switches both to use common IsSigned flag, and hoists the NumElems/NumElts variable. llvm-svn: 340720	2018-08-27 06:35:02 +00:00
Craig Topper	a72012c206	[X86] Correct the cost of (v4i32 (fptoui (v4f64))) under AVX512F. Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51267 llvm-svn: 340708	2018-08-26 18:47:44 +00:00
Craig Topper	128915f4ae	[X86] Add FeatureCMOV explicitly to all CPUs that support it. Remove FeatureCMOV implication from Feature64Bit and FeatureSSE1 Summary: Previously most CPUs inherited cmov support through Feature64Bit(or FeatureCMPXCHG16HB implying Feature64Bit) or FeatureSSE1. This has the surprising side effect that -mattr=-cmov causes an assert to fire in 64-bit mode because it clears the Feature64Bit. Or in 32-bit mode, -mattr=-cmov disables any sse/avx features which seems surprising. This patch removes the implication and instead updates hasCMOV in X86Subtarget to check SSE1 or is64Bit in addition to the regular cmov flag. This should keep most things working the way they did before. I don't believe there is a way to specific "-cmov" directly from clang so this should only effect our lower level tools. This does stop -mattr=cx16(cmpxchg16b) from implying cmov is enabled via the 64bit flag as you can see from one of the changed tests. But that was a 32-bit test so I don't know why it enabled cx16 anyway. For the other test I had to add -sse to override the new sse check in hasCMOV. Reviewers: RKSimon, DavidKreitzer, spatel Reviewed By: RKSimon Subscribers: llvm-commits, jfb Differential Revision: https://reviews.llvm.org/D51228 llvm-svn: 340707	2018-08-26 18:29:33 +00:00
Craig Topper	b68a78b9ac	[X86] Add FeatureCMOV to athlon and athlon-tbird cpus. Summary: This matches gcc and one cpuid dump I found online. Given that these are considered 7th generation x86 CPU it seems likely they support cmov since cmov was added by Intel in their 6th generation. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51264 llvm-svn: 340706	2018-08-26 18:29:27 +00:00
Sanjay Patel	113cac3b15	[SelectionDAG][x86] turn insertelement into undef with variable index into splat I noticed this along with the patterns in D51125, but when the index is variable, we don't convert insertelement into a build_vector. For x86, that means these get expanded at legalization time into the loading/spilling code that we see in the tests. I think it's always better to avoid going to memory on these, and we get the optimal 'broadcast' if it's available. I suspect other targets may want to look at enabling the hook. AArch64 and AMDGPU have regression tests that would be affected (although I did not check what would happen in those cases). In the most basic cases shown here, AArch64 would probably do much better with a splat. Differential Revision: https://reviews.llvm.org/D51186 llvm-svn: 340705	2018-08-26 18:20:41 +00:00
Petar Jovanovic	1fa5051bad	[MIPS GlobalISel] Legalize i8 and i16 add Legalize G_ADD for types smaller than i32. LegalizationArtifactCombiner replaces extend instructions with appropriate bitwise instructions. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D51213 llvm-svn: 340697	2018-08-26 07:25:33 +00:00
Craig Topper	4240ecb909	[X86] Fix typo in comment, expect->except. NFC llvm-svn: 340695	2018-08-26 03:43:23 +00:00
Craig Topper	ebec2793d1	[X86] Replace support for vXi32 SMUL_LOHI/UMUL_LOHI with MULHS/MULHU support instead. Summary: The only time vector SMUL_LOHI/UMUL_LOHI nodes are created is during division/remainder lowering. If its created before op legalization, generic DAGCombine immediately turns that SMUL_LOHI/UMUL_LOHI into a MULHS/MULHU since only the upper half is used. That node will stick around through vector op legalization and will be turned back into UMUL_LOHI/SMUL_LOHI during op legalization. It will then be custom lowered by the X86 backend. Due to this two step lowering the vector shuffles created by the custom lowering get legalized after their inputs rather than before. This prevents the shuffles from being combined with any build_vector of constants. This patch uses changes vXi32 to use MULHS/MULHU instead. This is what the later DAG combine did anyway. But by skipping the change back to UMUL_LOHI/SMUL_LOHI we lower it before any constant BUILD_VECTORS. This allows the vector_shuffle creation to constant fold with the build_vectors. This accounts for the test changes here. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51254 llvm-svn: 340690	2018-08-25 18:01:24 +00:00
Craig Topper	a11a3b3818	[SelectionDAG][X86] Reorder the operands the MaskedStoreSDNode to put the value first. Summary: Previously the value being stored is the last operand in SDNode. This causes the type legalizer to visit the mask operand before the value operand. The type legalizer was more complicated because of this since we want the type of the value to drive the decisions. This patch moves the value to be the first operand so we visit it first during type legalization. It also simplifies the type legalization code accordingly. X86 is currently the only in tree target that uses this SDNode. Not sure if there are any users out of tree. Reviewers: RKSimon, delena, hfinkel, eli.friedman Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50402 llvm-svn: 340689	2018-08-25 17:48:17 +00:00
Craig Topper	bce8680605	[X86] Make sure type is a vector before calling VT.getVectorNumElements() in combineLoopMAddPattern Fixes PR38700. llvm-svn: 340688	2018-08-25 17:23:43 +00:00
Sanjay Patel	8a84c747d2	[x86] try harder to use broadcast to load a scalar into vector reg This is a preliminary step for a preliminary step for D50992. I noticed that x86 often misses chances to load a scalar directly into a vector register. So this patch is just allowing more of those cases to match a broadcast op in lowerBuildVectorAsBroadcast(). The old code comment said it doesn't make sense to use a broadcast when we're loading a single element and everything else is undef, but I think that's the best case in the improved tests in insert-loaded-scalar.ll. We avoid scalar-to-vector-register move and/or less efficient shuffling. Note that there are some existing types that were already producing a broadcast, but that happens semi-accidentally. Ie, it's not happening as part of lowerBuildVectorAsBroadcast(). The build vector gets expanded into load + shuffle, and then shuffle lowering produces the broadcast. Description of the other test diffs: 1. avx-basic.ll - replacing load+shufle is a win. 2. sse3-avx-addsub-2.ll - vmovddup vs. vbroadcastss is neutral 3. sse41.ll - don't care - we convert that intrinsic to generic IR now, so this test is deprecated 4. vector-shuffle-128-v8.ll / vector-shuffle-256-v16.ll - pshufb alternatives with an extra instruction are not obviously bad Differential Revision: https://reviews.llvm.org/D51125 llvm-svn: 340685	2018-08-25 14:56:05 +00:00
Tim Renouf	904343f879	[AMDGPU] Add support for multi-dword s.buffer.load intrinsic Summary: Patch by Marek Olsak and David Stuttard, both of AMD. This adds a new amdgcn intrinsic supporting s.buffer.load, in particular multiple dword variants. These are convenient to use from some front-end implementations. Also modified the existing llvm.SI.load.const intrinsic to common up the underlying implementation. This modification also requires that we can lower to non-uniform loads correctly by splitting larger dword variants into sizes supported by the non-uniform versions of the load. V2: Addressed minor review comments. V3: i1 glc is now i32 cachepolicy for consistency with buffer and tbuffer intrinsics, plus fixed formatting issue. V4: Added glc test. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51098 Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b llvm-svn: 340684	2018-08-25 14:53:17 +00:00
Ana Pazos	ecc65eddec	[RISCV] Fixed Assertion`Kind == Immediate && "Invalid type access!"' failed. Summary: Missing check for isImm() in some Immediate classes. This bug was uncovered by a LLVM MC Assembler Protocol Buffer Fuzzer for the RISC-V assembly language. Reviewers: hiraditya, asb Reviewed By: hiraditya, asb Subscribers: llvm-commits, hiraditya, kito-cheng, shiva0217, rkruppe, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei Differential Revision: https://reviews.llvm.org/D50797 llvm-svn: 340674	2018-08-24 23:47:49 +00:00
Ana Pazos	61b28ede75	[RISCV] Fix std::advance slowness Summary: It seems std::advance template is treating "-MFI.getCalleeSavedInfo().size()" as a large unsigned value", causing slowness. Thanks to Henrik Gustafsson for reporting the issue. Reviewers: asb Reviewed By: asb Subscribers: llvm-commits, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb Differential Revision: https://reviews.llvm.org/D51148 llvm-svn: 340669	2018-08-24 23:13:59 +00:00
Stefan Pintilie	f384606799	[PowerPC] Emit xscpsgndp instead of xxlor when copying floating point scalar registers for P9 This patch will address using the xscpsgndp instruction to copy floating point scalar registers instead of the xxlor (specifically XXLORf) instruction that is currently used. Additionally, this patch of utilizing xscpsgndp will apply to P9, while pre-P9 will still use xxlor. Patch by amyk Differential Revision: https://reviews.llvm.org/D50004 llvm-svn: 340643	2018-08-24 20:00:24 +00:00
Eli Friedman	071203bbf2	[AArch64] Reject inline asm with FP registers when FP is disabled. Otherwise, we would crash trying to deal with an illegal input. Differential Revision: https://reviews.llvm.org/D51202 llvm-svn: 340637	2018-08-24 19:12:13 +00:00
Craig Topper	4058e29e7d	[X86] Teach combineLoopMAddPattern to handle cases where there is no loop and the add has two multiply inputs Differential Revision: https://reviews.llvm.org/D50868 llvm-svn: 340631	2018-08-24 18:05:04 +00:00
Joel Galenson	c6f6c17c9b	Add missing override keyword (NFC) llvm-svn: 340615	2018-08-24 16:15:44 +00:00
Joel Galenson	d36fb48a27	Find PLT entries for x86, x86_64, and AArch64. This adds a new method to ELFObjectFileBase that returns the symbols and addresses of PLT entries. This design was suggested by pcc and eugenis in https://reviews.llvm.org/D49383. Differential Revision: https://reviews.llvm.org/D50203 llvm-svn: 340610	2018-08-24 15:21:56 +00:00
Petar Jovanovic	65d463bdd7	[MIPS GlobalISel] Lower i8 and i16 arguments Lower integer arguments smaller than i32. Support both register and stack arguments. Define setLocInfo function for setting LocInfo field in ArgLocs vector. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D51031 llvm-svn: 340572	2018-08-23 20:41:09 +00:00
Thomas Lively	da26b84bd0	[WebAssembly] Prioritize splats over v128.consts Summary: Splats are fewer bytes than v128.consts, so use them when either could apply. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51179 llvm-svn: 340569	2018-08-23 19:23:13 +00:00
Sanjay Patel	40aa86751a	[x86] add debug option for and-immediate shrinking The commit that added this functionality: rL322957 may be causing/exposing a miscompile in PR38648: https://bugs.llvm.org/show_bug.cgi?id=38648 so allow enabling/disabling to make debugging easier. llvm-svn: 340540	2018-08-23 15:58:07 +00:00
Chandler Carruth	ae0cafece8	[x86/retpoline] Split the LLVM concept of retpolines into separate subtarget features for indirect calls and indirect branches. This is in preparation for enabling only the call retpolines when using speculative load hardening. I've continued to use subtarget features for now as they continue to seem the best fit given the lack of other retpoline like constructs so far. The LLVM side is pretty simple. I'd like to eventually get rid of the old feature, but not sure what backwards compatibility issues that will cause. This does remove the "implies" from requesting an external thunk. This always seemed somewhat questionable and is now clearly not desirable -- you specify a thunk the same way no matter which set of things are getting retpolines. I really want to keep this nicely isolated from end users and just an LLVM implementation detail, so I've moved the `-mretpoline` flag in Clang to no longer rely on a specific subtarget feature by that name and instead to be directly handled. In some ways this is simpler, but in order to preserve existing behavior I've had to add some fallback code so that users who relied on merely passing -mretpoline-external-thunk continue to get the same behavior. We should eventually remove this I suspect (we have never tested that it works!) but I've not done that in this patch. Differential Revision: https://reviews.llvm.org/D51150 llvm-svn: 340515	2018-08-23 06:06:38 +00:00
Thomas Lively	c17425708b	[WebAssembly] SIMD Bitwise binary arithmetic Summary: AND, OR, and XOR. This CL depends on D51113. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51136 llvm-svn: 340505	2018-08-23 00:48:37 +00:00
Thomas Lively	123c3bb29e	[WebAssembly][NFC] Reorganize SIMD instructions Summary: Reorganize WebAssemblyInstrSIMD.td to put all of the instruction definitions together, making it easier to see which instructions have been implemented already. Depends on D51143. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51113 llvm-svn: 340504	2018-08-23 00:43:47 +00:00
Thomas Lively	914f0f20a4	[WebAssembly][NFC] Move specific instruction formats to specific files Summary: WebAssemblyInstrFormats.td retains only multiclasses that are used in multiple other tablegen files. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D51143 llvm-svn: 340503	2018-08-23 00:36:43 +00:00
Craig Topper	cf9df99d79	[X86] Teach combineLoopSADPattern to handle cases where there is no loop and the add has two absolute difference inputs Previously we asumed a vector reduction add is part of a loop and one of the input is a phi. But the code in SelectionDAGBuilder that sets vector reduction flag handles more cases than that. It just requires that the use chain ends in a horizontal reduction. And there are no other uses. This means it can handle unrolled reduction loops. If the initial value of the reduction was 0, an unrolled loop would begin with a vector reduction add that has two sad inputs. Previously we would only transform one side of the add, but for this case we need to transform both sides. I've created a lambda to reuse some of the code for both sides. And fixed the variables names to remove reference to "phi". Differential Revision: https://reviews.llvm.org/D50817 llvm-svn: 340478	2018-08-22 23:19:01 +00:00
Thomas Lively	2ee686da27	[WebAssembly] Arbitrary BUILD_VECTOR and remove i64x2.mul Summary: This CL adds support for arbitrary BUILD_VECTORS, i.e. not splats and not consts. This is the last feature needed to properly lower v2i64 multiplies without a i64x2.mul instruction (which is not in the spec), so i64x2.mul is removed as well. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51082 Remove unnecessary condition and fix whitespace llvm-svn: 340472	2018-08-22 23:06:27 +00:00
Eli Friedman	96e3cd85bd	[ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available. The inline sequence is very long (about 70 bytes on Thumb1), so it's not really a good idea to inline it, especially when optimizing for size. Differential Revision: https://reviews.llvm.org/D47917 llvm-svn: 340458	2018-08-22 21:47:14 +00:00
Yonghong Song	48883142de	bpf: fix an assertion in BPFAsmBackend applyFixup() Fix bug https://bugs.llvm.org/show_bug.cgi?id=38643 In BPFAsmBackend applyFixup(), there is an assertion for FixedValue to be 0. This may not be true, esp. for optimiation level 0. For example, in the above bug, for the following two static variables: @bpf_map_lookup_elem = internal global i8* (i8, i8)* inttoptr (i64 1 to i8* (i8, i8)), align 8 @bpf_map_update_elem = internal global i32 (i8, i8, i8, i64)* inttoptr (i64 2 to i32 (i8, i8, i8, i64)), align 8 The static variable @bpf_map_update_elem will have a symbol offset of 8 and a FK_SecRel_8 with FixupValue 8 will cause the assertion if llvm is built with -DLLVM_ENABLE_ASSERTIONS=ON. The above relocations will not exist if the program is compiled with optimization level -O1 and above as the compiler optimizes those static variables away. In the below error message, -O2 is suggested as this is the common practice. Note that FixedValue = 0 in applyFixup() does exist and is valid, e.g., for the global variable my_map in the above bug. The bpf loader will process them properly for map_id's before loading the program into the kernel. The static variables, which are not optimized away by compiler, may have FK_SecRel_8 relocation with non-zero FixedValue. The patch removed the offending assertion and will issue a hard error as below if the FixedValue in applyFixup() is not 0. $ llc -march=bpf -filetype=obj fixup.ll LLVM ERROR: Unsupported relocation: try to compile with -O2 or above, or check your static variable usage Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 340455	2018-08-22 21:21:03 +00:00
Heejin Ahn	972fc3588b	[WebAssembly] Don't write SP back when prolog is generated only for EH Summary: When we don't actually have stack-allocated variables but need SP only to support EH, we don't need to write SP back in the epilog, because we don't bump down the stack pointer. Reviewers: dschuff Subscribers: jgravelle-google, sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51114 llvm-svn: 340454	2018-08-22 21:13:49 +00:00
Martin Storsjo	5ab1d107bb	[ARM] Avoid injecting constant islands in movw+movt pairs on Windows On Windows, movw+movt pairs with relocations are handled with a single relocation that covers them both. Therefore we can't inject anything between these instructions, otherwise the relocation (which in LLVM only is treated as the movw instruction's relocation, while the movt instruction's relocation is dropped) will end up bogus. These instructions are bundled up until right before the constant islands pass, making this effectively the only place that can split them apart. Differential Revision: https://reviews.llvm.org/D51032 llvm-svn: 340451	2018-08-22 20:34:12 +00:00
Martin Storsjo	d3b29223a8	[ARM] Move machine operand target flags to ARMBaseInstrInfo This makes sure the flags are available for use for thumb MIR as well. A test that requires this will be added in the next commit. llvm-svn: 340450	2018-08-22 20:34:06 +00:00
Krzysztof Parzyszek	2ff9aa15e4	[Hexagon] Enable interleaving in loop vectorizer llvm-svn: 340447	2018-08-22 20:15:04 +00:00
Eli Friedman	c11e2b9470	[ARM] Handle all-ones mask explicitly in targetShrinkDemandedConstant. This avoids a potential infinite loop setting and unsetting bits in the mask. Reduced from a failure on the polly-aosp bot. Differential Revision: https://reviews.llvm.org/D51066 llvm-svn: 340446	2018-08-22 20:13:45 +00:00
Craig Topper	538f8ab438	[X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware Inspired by what AArch64 does for shifts, this patch attempts to replace shift amounts with neg if we can. This is done directly as part of isel so its as late as possible to avoid breaking some BZHI patterns since those patterns need an unmasked (32-n) to be correct. To avoid manual load folding and custom instruction selection for the negate. I've inserted new nodes in the DAG above the shift node in topological order. Differential Revision: https://reviews.llvm.org/D48789 llvm-svn: 340441	2018-08-22 19:39:09 +00:00
Heejin Ahn	bc6d8970bb	[WebAssembly] Remove MachineFrameInfo arg from checking functions (NFC) Summary: There are several functions in the form of `has*` or `needs*` in `WebAssemblyFrameLowering` and its `MachineFrameInfo` argument can be obtained from `MachineFunction` so it is not necessarily has to be passed from a caller. Also, it is more in line with other overriden fuctions like `hasBP` or `hasReservedCallFrame`, which also take only `MachineFunction` argument. Reviewers: dschuff Subscribers: sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51116 llvm-svn: 340438	2018-08-22 18:53:48 +00:00
Craig Topper	87f78cfe15	[X86] In OptimizeLEAs pass, check that the key is in the LEAs map before accessing When the key is not already in the map, the access operator[] creates an empty value and grows the map. Resizing a map is very slow, so this needs to be avoided. Found with csmith + asserts. May help with https://bugs.llvm.org/show_bug.cgi?id=25843 Patch by Tom Rix. Differential Revision: https://reviews.llvm.org/D50780 llvm-svn: 340434	2018-08-22 18:24:13 +00:00
Heejin Ahn	ff363539c6	[WebAssembly] Add hasSideEffects flag to catch instructions Summary: `catch` instruction certainly has rather huge side effects and the flag was missing. At the moment this does not change any unit tests we currently have. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D50919 llvm-svn: 340433	2018-08-22 18:22:45 +00:00
Samuel Pitoiset	7bd9dcffcd	AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space 32-bit constant address space is declared as 6, so the maximum number of address spaces is 6, not 5. Fixes "LLVM ERROR: Pointer address space out of range". v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS v4: - fix compilation issues - fix out of bounds access v3: use static_assert() v2: add a very simple test for 32-bit addr space Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630 llvm-svn: 340417	2018-08-22 16:08:48 +00:00
Samuel Pitoiset	d81d6f7d58	AMDGPU: fix existing alias rules for constant and global Constant and global may alias, also one rules table wasn't ordered correctly. Pinpointed by Matt. v2: add a test with swapped parameters llvm-svn: 340416	2018-08-22 16:08:43 +00:00
Simon Pilgrim	ffdfe45645	[X86][SSE] LowerMULH vXi8 - use SSE shifts directly. We know these vXi16 extended cases are legal constant splat shifts. llvm-svn: 340414	2018-08-22 15:37:11 +00:00
Sam Parker	4d519fc3b5	[ARM] Rotated operand patterns for *xtb16 Add intrinsic isel patterns for sxtb16, sxtab16, uxtb16 and uxtab16 so that they can perform a ror. Differential Revision: https://reviews.llvm.org/D51034 llvm-svn: 340405	2018-08-22 12:58:36 +00:00
David Green	9dd1d451d9	[AArch64] Add Tiny Code Model for AArch64 This adds the plumbing for the Tiny code model for the AArch64 backend. This, instead of loading addresses through the normal ADRP;ADD pair used in the Small model, uses a single ADR. The 21 bit range of an ADR means that the code and its statically defined symbols need to be within 1MB of each other. This makes it mostly interesting for embedded applications where we want to fit as much as we can in as small a space as possible. Differential Revision: https://reviews.llvm.org/D49673 llvm-svn: 340397	2018-08-22 11:31:39 +00:00
Matt Arsenault	bb8e64e7f5	AMDGPU: Fix not respecting byval alignment in call frame setup This was hackily adding in the 4-bytes reserved for the callee's emergency stack slot. Treat it like a normal stack allocation so we get the correct alignment padding behavior. This fixes an inconsistency between the caller and callee. llvm-svn: 340396	2018-08-22 11:09:45 +00:00
Stefan Maksimovic	6ccbd16433	[mips] Handle missing CondCodes Add patterns for unhandled CondCode enumerables: SETEQ, SETGE, SETGT, SETLE, SETLT, SETNE. Stated at the ISD::CondCode enum declaration: `All of these (except for the 'always folded ops') should be handled for floating point.` Add patterns which use these nodes, same as corresponding 'ordered' CondCode nodes. Referring to 'Ordered means that neither operand is a QNAN' we assume it is safe to match ex. SETLT node to the same instruction as SETOLT. Differential Revision: https://reviews.llvm.org/D50757 llvm-svn: 340392	2018-08-22 09:34:44 +00:00
Heejin Ahn	684325955c	[WebAssembly] Fix typos in mem.grow/memory.grow opcodes This should be not 0x3f but 0x40. llvm-svn: 340373	2018-08-22 00:33:34 +00:00
Heejin Ahn	c4df1d182c	[WebAssembly] Change comments on SP writing back (NFC) Summary: We now write back not to memory but to __stack_pointer global. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51074 llvm-svn: 340372	2018-08-22 00:20:02 +00:00
Alina Sbirlea	ab6f84f763	Update MemorySSA in BasicBlockUtils. Summary: Extend BasicBlocksUtils to update MemorySSA. Subscribers: sanjoy, arsenm, nhaehnle, jlebar, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D45300 llvm-svn: 340365	2018-08-21 23:32:03 +00:00
Scott Linder	72855e36c5	[AMDGPU] Consider loads from flat addrspace to be potentially divergent In general we can't assume flat loads are uniform, and cases where we can prove they are should be handled through infer-address-spaces. Differential Revision: https://reviews.llvm.org/D50991 llvm-svn: 340343	2018-08-21 21:24:31 +00:00
Heejin Ahn	78d1910891	[WebAssembly] Restore __stack_pointer after catch instructions Summary: After the stack is unwound due to a thrown exception, the `__stack_pointer` global can point to an invalid address. This inserts instructions that restore `__stack_pointer` global. Reviewers: jgravelle-google, dschuff Subscribers: mgorny, sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D50980 llvm-svn: 340339	2018-08-21 21:23:07 +00:00
Thomas Lively	22442924a8	[WebAssembly] v128.const Summary: This CL implements v128.const for each vector type. New operand types are added to ensure the vector contents can be serialized without LEB encoding. Tests are added for instruction selection, encoding, assembly and disassembly. Reviewers: aheejin, dschuff, aardappel Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D50873 llvm-svn: 340336	2018-08-21 21:03:18 +00:00
Heejin Ahn	20c9c4438e	[WebAssembly] Change writeSPToMemory to writeSPToGlobal (NFC) Summary: SP is now a __stack_pointer global and not a memory address anymore. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51046 llvm-svn: 340328	2018-08-21 19:52:19 +00:00
Heejin Ahn	ed5e06b0a7	[WebAssembly] Add isEHScopeReturn instruction property Summary: So far, `isReturn` property is used to mean both a return instruction from a functon and the end of an EH scope, a scope that starts with a EH scope entry BB and ends with a catchret or a cleanupret instruction. Because WinEH uses funclets, all EH-scope-ending instructions are also real return instruction from a function. But for wasm, they only serve as the end marker of an EH scope but not a return instruction that exits a function. This mismatch caused incorrect prolog and epilog generation in wasm EH scopes. This patch fixes this. This patch is in the same vein with rL333045, which splits `MachineBasicBlock::isEHFuncletEntry` into `isEHFuncletEntry` and `isEHScopeEntry`. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D50653 llvm-svn: 340325	2018-08-21 19:44:11 +00:00
Benjamin Kramer	d66dde5a98	[NVPTX] Remove ftz variants of cvt with rounding mode These do not exist in ptxas, it refuses to compile them. Differential Revision: https://reviews.llvm.org/D51042 llvm-svn: 340317	2018-08-21 18:44:25 +00:00
Eric Christopher	3dc594c1e6	Temporarily Revert "[PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction" due to it causing a compiler crash on valid. This reverts commit r340016, testcase forthcoming. llvm-svn: 340315	2018-08-21 18:35:08 +00:00
Simon Pilgrim	50eba6b380	[X86][SSE] Lower vXi8 general shifts to SSE shifts directly. NFCI. Most of these shifts are extended to vXi16 so we don't gain anything from forcing another round of generic shift lowering - we know these extended cases are legal constant splat shifts. llvm-svn: 340307	2018-08-21 17:27:03 +00:00
Simon Pilgrim	98eb4ae499	[X86][SSE] Lower v8i16 general shifts to SSE shifts directly. NFCI. We don't gain anything from forcing another round of generic shift lowering - we know these are legal constant splat shifts. llvm-svn: 340302	2018-08-21 17:05:07 +00:00
Simon Pilgrim	dbe4e9e3ff	[X86][SSE] Lower directly to SSE shifts in the BLEND(SHIFT, SHIFT) combine. NFCI. We don't gain anything from forcing another round of generic shift lowering - we know these are legal constant splat shifts. llvm-svn: 340300	2018-08-21 16:46:48 +00:00
Farhana Aleen	3528c80378	[AMDGPU] Support idot2 pattern. Summary: Transform add (mul ((i32)S0.x, (i32)S1.x), add( mul ((i32)S0.y, (i32)S1.y), (i32)S3) => i/udot2((v2i16)S0, (v2i16)S1, (i32)S3) Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D50024 llvm-svn: 340295	2018-08-21 16:21:15 +00:00
Simon Pilgrim	5a83a1fd13	[X86][SSE] Add helper function to convert to/between the SSE vector shift opcodes. NFCI. Also remove some more getOpcode calls from LowerShift when we already have Opc. llvm-svn: 340290	2018-08-21 15:57:33 +00:00
Daniel Sanders	6a943fb16a	[aarch64][mc] Don't lookup symbols when there is no symbol lookup callback Summary: When run under llvm-mc-disassemble-fuzzer, there is no symbol lookup callback so tryAddingSymbolicOperand() must fail gracefully instead of crashing Reviewers: aemerson, javed.absar Reviewed By: aemerson Subscribers: lhames, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D51005 llvm-svn: 340287	2018-08-21 15:47:25 +00:00
Tim Renouf	bb5ee41ab4	[AMDGPU] Allow int types for MUBUF vdata Summary: Previously the new llvm.amdgcn.raw/struct.buffer.load/store intrinsics only allowed float types for the data to be loaded or stored, which sometimes meant the frontend needed to generate a bitcast. In this, the new intrinsics copied the old buffer intrinsics. This commit extends the new intrinsics to allow int types as well. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D50315 Change-Id: I8202af2d036455553681dcbb3d7d32ae273f8f85 llvm-svn: 340270	2018-08-21 11:08:12 +00:00
Tim Renouf	4f703f5e11	[AMDGPU] New buffer intrinsics Summary: This commit adds new intrinsics llvm.amdgcn.raw.buffer.load llvm.amdgcn.raw.buffer.load.format llvm.amdgcn.raw.buffer.load.format.d16 llvm.amdgcn.struct.buffer.load llvm.amdgcn.struct.buffer.load.format llvm.amdgcn.struct.buffer.load.format.d16 llvm.amdgcn.raw.buffer.store llvm.amdgcn.raw.buffer.store.format llvm.amdgcn.raw.buffer.store.format.d16 llvm.amdgcn.struct.buffer.store llvm.amdgcn.struct.buffer.store.format llvm.amdgcn.struct.buffer.store.format.d16 llvm.amdgcn.raw.buffer.atomic.* llvm.amdgcn.struct.buffer.atomic.* with the following changes from the llvm.amdgcn.buffer.* intrinsics: * there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set; * there is a combined cachepolicy arg (glc+slc) * there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field. The AMDISD::BUFFER_* SD nodes always have an index operand, all three offset operands, combined cachepolicy operand, and an extra idxen operand. The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50306 Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205 llvm-svn: 340269	2018-08-21 11:07:10 +00:00
Tim Renouf	35484c9d50	[AMDGPU] New tbuffer intrinsics Summary: This commit adds new intrinsics llvm.amdgcn.raw.tbuffer.load llvm.amdgcn.struct.tbuffer.load llvm.amdgcn.raw.tbuffer.store llvm.amdgcn.struct.tbuffer.store with the following changes from the llvm.amdgcn.tbuffer.* intrinsics: * there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set; * there is a combined format arg (dfmt+nfmt) * there is a combined cachepolicy arg (glc+slc) * there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field. The AMDISD::TBUFFER_* SD nodes always have an index operand, all three offset operands, combined format operand, combined cachepolicy operand, and an extra idxen operand. The tbuffer pseudo- and real instructions now also have a combined format operand. The obsolescent llvm.amdgcn.tbuffer.* and llvm.SI.tbuffer.store intrinsics continue to work. V2: Separate raw and struct intrinsics. V3: Moved extract_glc and extract_slc defs to a more sensible place. V4: Rebased on D49995. V5: Only two separate offset args instead of three. V6: Pseudo- and real instructions have joint format operand. V7: Restored optionality of dfmt and nfmt in assembler. V8: Addressed minor review comments. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49026 Change-Id: If22ad77e349fac3a5d2f72dda53c010377d470d4 llvm-svn: 340268	2018-08-21 11:06:05 +00:00
Petar Jovanovic	3b953c37f8	[MIPS GlobalISel] Select bitwise instructions Select bitwise instructions for i32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D50183 llvm-svn: 340258	2018-08-21 08:15:56 +00:00
Heejin Ahn	487992cc09	[WebAssembly] Revert type of wake count in atomic.wake to i32 Summary: We decided to revert this from i64 to i32 in Nov 28 CG meeting. Fixes PR38632. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D51010 llvm-svn: 340234	2018-08-20 23:49:29 +00:00
Heejin Ahn	c2c33c8e64	[WebAssembly] Remove an unused argument from writeSPToMemory (NFC) Reviewers: dschuff Subscribers: dschuff, sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D50933 llvm-svn: 340230	2018-08-20 23:02:15 +00:00
Craig Topper	210ccfe3db	[X86] Prevent lowerVectorShuffleByMerging128BitLanes from creating cycles Due to some splat handling code in getVectorShuffle, its possible for NewV1/NewV2 to have their mask modified from what is requested. This can lead to cycles being created in the DAG. This patch examines the returned mask and makes sure its different. Long term we may need to look closer at that splat code in getVectorShuffle, or add more splat awareness to getVectorShuffle. Fixes PR38639 Differential Revision: https://reviews.llvm.org/D50981 llvm-svn: 340214	2018-08-20 21:08:35 +00:00
Craig Topper	7dcb2c4b0a	[X86] Teach combineTruncatedArithmetic to handle some cases of ISD::SUB We can safely avoid interfering with the subus combine if both inputs are freely truncatable. Either both extends, or an extend and a constant vector. Differential Revision: https://reviews.llvm.org/D50878 llvm-svn: 340212	2018-08-20 20:57:35 +00:00
Vitaly Buka	30b5ed3eb7	Revert "AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space" As it introduces out of bound access. This reverts commit r340172 and r340171 llvm-svn: 340202	2018-08-20 19:31:03 +00:00
Marcello Maggioni	5ca4128b45	[PSV] Update API to be able to use TargetCustom without UB. getTargetCustom() requires values for "Kind" in the constructor that are not in the PSVKind enum. Passing a value that is not inside an enum as an argument to a constructor of the type of the enum is UB. Changing to the underlying type of the enum would solve the UB Differential Revision: https://reviews.llvm.org/D50909 llvm-svn: 340200	2018-08-20 19:23:45 +00:00
Samuel Pitoiset	216a2da577	AMDGPU: fix compilation errors since r340171 Some buildbot slaves reports compilation errors, but it compiled fine on my side, sorry for the breakage. llvm-svn: 340172	2018-08-20 13:31:41 +00:00
Samuel Pitoiset	c95ef77d37	AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space 32-bit constant address space is declared as 6, so the maximum number of address spaces is 6, not 5. Fixes "LLVM ERROR: Pointer address space out of range". v3: use static_assert() v2: add a very simple test for 32-bit addr space Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 340171	2018-08-20 13:18:59 +00:00
Sander de Smalen	07db432265	[AArch64][SVE] Asm: Add SVE System registers This patch adds system registers for controlling aspects of SVE: - ZCR_EL1 (r/w) visible at EL1 and EL0. - ZCR_EL2 (r/w) visible at EL2 and Non-secure EL1 and EL0. - ZCR_EL3 (r/w) visible at all exception levels. and a system register identifying SVE: - ID_AA64ZFR0_EL1 (r) SVE Feature identifier. Reviewers: SjoerdMeijer, samparker, pbarrio, fhahn, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D50885 llvm-svn: 340158	2018-08-20 09:16:59 +00:00
QingShan Zhang	f8f9af7ba5	[PowerPC] Add a peephole post RA to transform the inst that fed by add If the arch is P8, we will select XFLOAD to load the floating point, and then, expand it to vsx and non-vsx X-form instruction post RA. This patch is trying to convert the X-form to D-form if it meets the requirement that one operand of the x-form inst is the special Zero register, and another operand fed by add inst. i.e. y = add imm, reg LFDX. 0, y --> LFD imm(reg) Reviewers: Nemanjai Differential Revision: https://reviews.llvm.org/D49007 llvm-svn: 340149	2018-08-20 02:52:55 +00:00
Craig Topper	803912ea57	[X86] Fix an issue in the matching for ADDUS. We were basically assuming only one operand of the compare could be an ADD node and using that to swap operands. But we can have a normal add followed by a saturing add. This rewrites the canonicalization to just be based on the condition code. llvm-svn: 340134	2018-08-19 04:26:31 +00:00
Craig Topper	2b03df9b05	[X86] Use SDValue::operator== instead of DAG.isEqualTo in strictly integer matching. isEqualTo is more useful for floating point. operator== is sufficient for integer. llvm-svn: 340130	2018-08-18 19:16:56 +00:00
Craig Topper	3e299d896f	[X86] Simplify the PADDUS legality check in combineSelect to match PSUBUS. NFC While there remove some trailing whitespace. llvm-svn: 340129	2018-08-18 18:51:04 +00:00
Craig Topper	40c9559b74	[X86] Add support for using 512-bit PSUBUS to combineSelect. The code already support 128 and 256 and even knows to split 256 for AVX1. So we really just needed to stop looking for specific VTs and subtarget features and just look for legal VTs with i8/i16 elements. While there, add some curly braces around outer if statement bodies that contain only another if. It makes all the closing curly braces look more regular. llvm-svn: 340128	2018-08-18 18:51:03 +00:00
Simon Pilgrim	9c1761a6fd	[X86] Replace all single match schedule class instregexs with instrs entries Helps reduce cost of instrw collection llvm-svn: 340124	2018-08-18 18:04:29 +00:00
Simon Pilgrim	ebfd6ebba7	[X86] Merge shift/rotate schedule class instregexs Helps reduce cost of instrw collection llvm-svn: 340123	2018-08-18 15:58:19 +00:00
Zachary Turner	bc94ae437f	Add the extended XMM registers mappings for AVX-512. After this we should have the entire AVX-512 register set mapping in place. llvm-svn: 340118	2018-08-18 03:54:16 +00:00
Craig Topper	62dd1b1b4f	[X86] Remove detectAddSubSatPattern. This was added very recently in r339650, but appears to be completely untested and has at least one bug in it. llvm-svn: 340086	2018-08-17 21:19:28 +00:00
Krzysztof Parzyszek	9937e205e8	[Hexagon] Remove unused functions from HexagonInstPrinter, NFC llvm-svn: 340081	2018-08-17 21:12:37 +00:00
Simon Pilgrim	2f48122cc9	[X86][SSE] Lower constant vXi8 ISD::SRL/ISD::SRA using PMULLW Extending the concept introduced in D49562, this patch lowers constant vXi8 ISD::SRL/ISD::SRA by zero/sign extending to vXi16 and using PMULLW and then truncating the high 8 bits of the result. Differential Revision: https://reviews.llvm.org/D50781 llvm-svn: 340062	2018-08-17 18:03:11 +00:00
Craig Topper	730890dbdb	[X86] Use hasOneUse instead of isOnlyUserOf. NFCI isOnlyUserOf is a little heavier because it allows the node to be used multiple times by the other node. In this case we are looking at a truncate which only has one operand so we know it can only use it once. Thus hasOneUse is better. llvm-svn: 340059	2018-08-17 17:57:25 +00:00
Stefan Pintilie	39869ccf51	[PowerPC] Generate lxsd instead of the ld->mtvsrd sequence for vector loads This patch addresses: - Implementation within PPCISelLowering.cpp to check if we should use direct load into vector instructions (such as lxsd/lfd ) when the scalar_to_vector function is used; which will allow us to catch as many cases of the scalar_to_vector uses as possible to translate the ld->mtvsrd sequence into lxsd. - Test cases to exhibit the behaviour of emitting lxsd/lfd. Patch by amyk Differential revision: https://reviews.llvm.org/D49698 llvm-svn: 340037	2018-08-17 15:15:26 +00:00
Francis Visoiu Mistrih	8bff832534	[X86] Fix liveness information when expanding X86::EH_SjLj_LongJmp64 test/CodeGen/X86/shadow-stack.ll has the following machine verifier errors: ``` * Bad machine code: Using a killed virtual register * - function: bar - basic block: %bb.6 entry (0x7fdc81857818) - instruction: %3:gr64 = MOV64rm killed %2:gr64, 1, $noreg, 8, $noreg - operand 1: killed %2:gr64 * Bad machine code: Using a killed virtual register * - function: bar - basic block: %bb.6 entry (0x7fdc81857818) - instruction: $rsp = MOV64rm killed %2:gr64, 1, $noreg, 16, $noreg - operand 1: killed %2:gr64 * Bad machine code: Virtual register killed in block, but needed live out. * - function: bar - basic block: %bb.2 entry (0x7fdc818574f8) Virtual register %2 is used after the block. ``` The fix here is to only copy the machine operand's register without the kill flags for all the instructions except the very last one of the sequence. I had to insert dummy PHIs in the test case to force the NoPHI function property to be set to false. More on this here: https://llvm.org/PR38439 Differential Revision: https://reviews.llvm.org/D50260 llvm-svn: 340033	2018-08-17 14:46:56 +00:00
Krzysztof Parzyszek	39a979c838	[Hexagon] Expand vgather pseudos during packetization This will allow packetizing the vgather expansion with other instructions. llvm-svn: 340028	2018-08-17 14:24:24 +00:00
Roger Ferrer Ibanez	734a04ea33	[RISCV] Remove unused function This function is not virtual, it is private and it is not called anywhere. No regression is introduced by removing it. I think we can safely remove it. Differential Revision: https://reviews.llvm.org/D50836 llvm-svn: 340024	2018-08-17 13:40:03 +00:00
Luke Cheeseman	64dcdec60c	[AArch64] - Generate pointer authentication instructions - Generate pointer authentication instructions - The functions instrumented depend on function attribtues: all (all functions instrumentent) non-leaf (only those that spill LR) none - Function epilogues sign the LR before spilling to the stack and authenticate the LR once restored - If the target is v8.3a or greater than can use the combined authenticate and return instruction Differential revision: https://reviews.llvm.org/D49793 llvm-svn: 340018	2018-08-17 12:53:22 +00:00
Nemanja Ivanovic	39751276b0	[PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction Add a DAG combine for the PowerPC code generator to generate the Power9 extswsli extend sign and shift immediate instruction. Patch by RolandF. Differential revision: https://reviews.llvm.org/D49879 llvm-svn: 340016	2018-08-17 12:35:44 +00:00
Bernard Ogden	b828bb2a15	[ARM/AArch64] Support FP16 +fp16fml instructions Add +fp16fml feature for new FP16 instructions, which are a mandatory part of FP16 from v8.4-A and an optional part of FP16 from v8.2-A. It doesn't seem to be possible to model this in LLVM, but the relationship between the options is handled by the related clang patch. In keeping with what I think is the usual practice, the fp16fml extension is accepted regardless of base architecture version. Builds on/replaces Sjoerd Meijer's patch to add these instructions at https://reviews.llvm.org/D49839. Differential Revision: https://reviews.llvm.org/D50228 llvm-svn: 340013	2018-08-17 11:29:49 +00:00
Daniel Cederman	0c597ca223	[Sparc] Get sret arg size from CallLoweringInfo.getArgs() Summary: Looking at the callee argument list, as is done now, might not work if the function has been typecasted into one that is expected to return a struct. This change also simplifies the code. The isFP128ABICall() function can be removed as it is no longer needed. The test in fp128.ll has been updated to verify this. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48117 llvm-svn: 340008	2018-08-17 10:40:00 +00:00
Daniel Cederman	7d3e08ff8d	[Sparc] Flush register windows for @llvm.returnaddress(1) Summary: When @llvm.returnaddress is called with a value higher than 0 it needs to read from the call stack to get the return address. This means that the register windows needs to be flushed to the stack to guarantee that the data read is valid. For values higher than 1 this is done indirectly by the call to getFRAMEADDR(), but not for the value 1. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48636 llvm-svn: 340003	2018-08-17 09:18:31 +00:00
Sjoerd Meijer	31239a4c6a	[ARM][NFC] ARMCodeGenPrepare: some refactoring and algorithm description Differential Revision: https://reviews.llvm.org/D50846 llvm-svn: 339997	2018-08-17 07:34:01 +00:00
Heejin Ahn	a93e726170	[WebAssembly] Modify LateEHPrepare one-line description (NFC) llvm-svn: 339972	2018-08-17 00:12:04 +00:00
Heejin Ahn	e76fa9ecca	[WebAssembly] CFG stackify support for exception handling Summary: This adds support for exception handling to CFGStackify pass. This only adds TRY / END_TRY markers and DOES NOT yet fix unwind mismatches that can be created by the linearization of the CFG into the structural wasm format. The mismatch fix will be added by following patches. In detail, this patch - Added support for TRY / END_TRY markers to support EH - Changed many static functions into class member functions as they take too many arguments now - Added several more bookeeping data structures - Refactored routines that decide where to insert markers, because without refactoring this got too complicated as we added support for new kinds of markers (TRY/END_TRY). - Rewrote rethrow instructions' BB arguments to relative depths in EH pad stack. Reviewers: dschuff, sunfish Subscribers: sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D48273 llvm-svn: 339967	2018-08-16 23:50:59 +00:00
Craig Topper	bde2b43cb3	[X86] In EFLAGS copy pass, don't emit EXTRACT_SUBREG instructions since we're after peephole Normally the peephole pass converts EXTRACT_SUBREG to COPY instructions. But we're after peephole so we can't rely on it to clean these up. To fix this, the eflags pass now emits a COPY with a subreg input. I also noticed that in 32-bit mode we need to constrain the input to the copy to ensure the subreg is valid. Otherwise we'll fail verify-machineinstrs Differential Revision: https://reviews.llvm.org/D50656 llvm-svn: 339945	2018-08-16 21:54:02 +00:00
Chandler Carruth	c73c0307fe	[MI] Change the array of `MachineMemOperand` pointers to be a generically extensible collection of extra info attached to a `MachineInstr`. The primary change here is cleaning up the APIs used for setting and manipulating the `MachineMemOperand` pointer arrays so chat we can change how they are allocated. Then we introduce an extra info object that using the trailing object pattern to attach some number of MMOs but also other extra info. The design of this is specifically so that this extra info has a fixed necessary cost (the header tracking what extra info is included) and everything else can be tail allocated. This pattern works especially well with a `BumpPtrAllocator` which we use here. I've also added the basic scaffolding for putting interesting pointers into this, namely pre- and post-instruction symbols. These aren't used anywhere yet, they're just there to ensure I've actually gotten the data structure types correct. I'll flesh out support for these in a subsequent patch (MIR dumping, parsing, the works). Finally, I've included an optimization where we store any single pointer inline in the `MachineInstr` to avoid the allocation overhead. This is expected to be the overwhelmingly most common case and so should avoid any memory usage growth due to slightly less clever / dense allocation when dealing with >1 MMO. This did require several ergonomic improvements to the `PointerSumType` to reasonably support the various usage models. This also has a side effect of freeing up 8 bits within the `MachineInstr` which could be repurposed for something else. The suggested direction here came largely from Hal Finkel. I hope it was worth it. ;] It does hopefully clear a path for subsequent extensions w/o nearly as much leg work. Lots of thanks to Reid and Justin for careful reviews and ideas about how to do all of this. Differential Revision: https://reviews.llvm.org/D50701 llvm-svn: 339940	2018-08-16 21:30:05 +00:00
Jacob Gravelle	3d668d3928	[WebAssembly] Remove temporary workaround for function bitcasts Summary: EM_ASM no longer is lowered as varargs in C, so this workaround is obsolete. Reviewers: dschuff, sunfish Subscribers: sbc100, aheejin, llvm-commits Differential Revision: https://reviews.llvm.org/D50859 llvm-svn: 339925	2018-08-16 19:24:31 +00:00
Reid Kleckner	bd5d71229d	[codeview] Use push_macro to avoid conflicts instead of a prefix Summary: This prefix was added in r333421, and it changed our dumper output to say things like "CVRegEAX" instead of just "EAX". That's a functional change that I'd rather avoid. I tested GCC, Clang, and MSVC, and all of them support #pragma push_macro. They don't issue warnings whem the macro is not defined either. I don't have a Mac so I can't test the real termios.h header, but I looked at the termios.h sources online and looked for other conflicts. I saw only the CR* macros, so those are the ones we work around. Reviewers: zturner, JDevlieghere Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D50851 llvm-svn: 339907	2018-08-16 17:34:31 +00:00
Matt Arsenault	7121bed210	AMDGPU: Custom lower fexp This will allow the library to just use __builtin_expf directly without expanding this itself. Note f64 still won't work because there is no exp instruction for it. llvm-svn: 339902	2018-08-16 17:07:52 +00:00
Nirav Dave	7fd992a755	[MC][X86] Enhance X86 Register expression handling to more closely match GCC. Allow the comparison of x86 registers in the evaluation of assembler directives. This generalizes and simplifies the extension from r334022 to catch another case found in the Linux kernel. Reviewers: rnk, void Reviewed By: rnk Subscribers: hiraditya, nickdesaulniers, llvm-commits Differential Revision: https://reviews.llvm.org/D50795 llvm-svn: 339895	2018-08-16 16:31:14 +00:00
Zachary Turner	2838b59121	Add support for AVX-512 CodeView registers. When compiling with /arch:AVX512 and optimizations turned on, we could crash while emitting debug info because we did not have CodeView register constants for the AVX 512 register set defined. This patch defines them. Differential Revision: https://reviews.llvm.org/D50819 llvm-svn: 339893	2018-08-16 16:17:55 +00:00
Sam Parker	0d51197051	[ARM] Ignore GEPs in ARMCodeGenPrepare While searching through the use-def tree, ignore GetElementPtrInst instructions because they don't need promoting and neither do their indices. Otherwise, the wide indices prevent the transformation from happening. Differential Revision: https://reviews.llvm.org/D50762 llvm-svn: 339871	2018-08-16 12:24:40 +00:00
Sam Parker	0e2f0bd48e	[ARM] Allow zext in ARMCodeGenPrepare Treat zext instructions as roots, like we do for truncs. Differential Revision: https://reviews.llvm.org/D50759 llvm-svn: 339868	2018-08-16 11:54:09 +00:00
Sam Parker	13567dbbd8	[ARM] Allow signed icmps in ARMCodeGenPrepare Originally committed in r339755 which was reverted in r339806 due to an asan issue. The issue was caused by my assumption that operands to a CallInst mapped to the FunctionType Params. CallInsts are now handled by iterating over their ArgOperands instead of Operands. Original Message: Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp. Differential Revision: https://reviews.llvm.org/D50067 llvm-svn: 339858	2018-08-16 10:05:39 +00:00
Simon Atanasyan	a8ac4308aa	[mips] Remove dead code from MipsPassConfig Found by GCC's -Wunused-function. Patch by Kim Gräsman. Differential revision: https://reviews.llvm.org/D50612 llvm-svn: 339847	2018-08-16 08:43:17 +00:00
Craig Topper	9c1d9fdeaa	[X86] Remove masking from the 512-bit padds and psubs intrinsics. Use select in IR instead. llvm-svn: 339842	2018-08-16 06:20:24 +00:00
Craig Topper	9d6983c9fd	[X86] Remove the unused masked 128 and 256-bit masked padds/psubs intrinsics. Still need to remove masking from the 512-bit versions. llvm-svn: 339841	2018-08-16 06:20:22 +00:00
Chandler Carruth	00c35c7794	[x86] Actually initialize the SLH pass with the x86 backend and use a shorter name ('x86-slh') for the internal flags and pass name. Without this, you can't use the -stop-after or -stop-before infrastructure. I seem to have just missed this when originally adding the pass. The shorter name solves two problems. First, the flag names were ... really long and hard to type/manage. Second, the pass name can't be the exact same as the flag name used to enable this, and there are already some users of that flag name so I'm avoiding changing it unnecessarily. llvm-svn: 339836	2018-08-16 01:22:19 +00:00
Matt Arsenault	f533e6b0ed	AMDGPU: Fold fneg into fmed3 llvm-svn: 339821	2018-08-15 21:46:27 +00:00
Matt Arsenault	a816073764	AMDGPU: Improve extract_vector_elt reduction combine Handle fmul, fsub and preserve flags. Also really test minnum/maxnum reductions. The existing tests were only checking from minnum/maxnum matched from a fast math compare and select which is not the same. llvm-svn: 339820	2018-08-15 21:34:06 +00:00
Matt Arsenault	b3a80e5397	AMDGPU: Implement llvm.amdgcn.icmp/fcmp for i16/f16 Also support these on targets without support for these, since it will allow us to freely create these in instcombine. llvm-svn: 339819	2018-08-15 21:25:20 +00:00
Craig Topper	08e082619a	[X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation To lower this we now create a new V1 containing the low half of both sources and a new V2 containing the upper half of both sources. Then we created a repeated lane shuffle of those new sources to create the final result. This fixes PR35833 Differential Revison: https://reviews.llvm.org/D41794 llvm-svn: 339818	2018-08-15 21:21:52 +00:00
Matt Arsenault	6c7ba82900	AMDGPU: Address todo for handling 1/(2 pi) llvm-svn: 339814	2018-08-15 21:03:55 +00:00
Vitaly Buka	ed4239f482	Revert "[ARM] Allow signed icmps in ARMCodeGenPrepare" use-after-poison in check-llvm under asan This reverts commit r339755. llvm-svn: 339806	2018-08-15 20:09:35 +00:00
Thomas Lively	5222cb601b	[WebAssembly][NFC] Standardize SIMD multiclass format Summary: This CL changes the ExtractLane ISEL multiclass to more closely mirror the structure of the splat and replace_lane multiclasses. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D50794 llvm-svn: 339801	2018-08-15 18:15:18 +00:00
Thomas Lively	39fe480832	[WebAssembly] Test commit Changes a comment and some whitespace to test commit access. llvm-svn: 339798	2018-08-15 17:50:22 +00:00
Derek Schuff	82812fb986	[WebAssembly] SIMD replace_lane Implement and test replace_lane instructions. Patch by Thomas Lively Differential Revision: https://reviews.llvm.org/D50750 llvm-svn: 339786	2018-08-15 16:18:51 +00:00
Nemanja Ivanovic	5b9a4f8ee5	[PowerPC] Enhance the selection(ISD::VSELECT) of vector type To make ISD::VSELECT available(legal) so long as there are altivec instruction, otherwise it's default behavior is expanding. Use xxsel to match vselect if vsx is open, or use vsel. In order to do not write many patterns in td file, promote (for vector it's bitcast) all other type into v4i32 and only pattern match vselect of v4i32 into vsel or xxsel. Patch by wuzish Differential revision: https://reviews.llvm.org/D49531 llvm-svn: 339779	2018-08-15 15:30:36 +00:00
Krzysztof Parzyszek	2a119b9a98	[SystemZ] Replace subreg_r with subreg_h Change subreg_r32 -> subreg_h32 subreg_r64 -> subreg_h64 subreg_hr32 -> subreg_hh32 The subregisters subreg_r32 and subreg_r64 were added to emphasize the fact that modifying these subregisters may clobber the entire register. This is not necessarily the case for subreg_h32, et al. However, the ability to compose subreg_h64 with subreg_r32, and with subreg_h32 and subreg_l32 at the same time makes the compositions be treated as non-overlapping (leading to problems when tracking subreg liveness). See D50468 for more details. Differential Revision: https://reviews.llvm.org/D50725 llvm-svn: 339778	2018-08-15 15:21:23 +00:00
Jonas Paulsson	d5a9c2d551	[SystemZ] New CL option to enable subreg liveness This option is needed to enable subreg liveness tracking during register allocation. Review: Ulrich Weigand https://reviews.llvm.org/D50779 llvm-svn: 339776	2018-08-15 15:04:49 +00:00
Sam Parker	fabf7fe5f8	[ARM] TypeSize lower bound for ARMCodeGenPrepare We only try to promote types with are smaller than 16-bits, but we also need to check that the type is not less than 8-bits. Differential Revision: https://reviews.llvm.org/D50769 llvm-svn: 339770	2018-08-15 13:29:50 +00:00
Nemanja Ivanovic	8b4bd09e22	[PowerPC] Don't run BV DAG Combine before legalization if it assumes legal types When trying to combine a DAG that builds a vector out of sign-extensions of vector extracts, the code assumes legal input types. Due to that, we have to disable this combine prior to legalization. In some cases, the DAG will look slightly different after legalization so account for that in the matching code. This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087 Differential Revision: https://reviews.llvm.org/D49080 llvm-svn: 339769	2018-08-15 12:58:13 +00:00
Simon Pilgrim	f3b5943ffc	Remove lambda default argument to fix gcc pedantic warning. llvm-svn: 339767	2018-08-15 12:32:09 +00:00
Sam Parker	6548cd3905	[ARM] Allow signed icmps in ARMCodeGenPrepare Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp. Differential Revision: https://reviews.llvm.org/D50067 llvm-svn: 339755	2018-08-15 08:23:03 +00:00
Sam Parker	7def86bbdb	[ARM] Allow pointer values in ARMCodeGenPrepare Add pointers to the list of allowed types, but don't try to promote them. Also fixed a bug with the promotion of undef values, so a new value is now created instead of mutating in place. We also now only promote if there's an instruction in the use-def chains other than the icmp, sinks and sources. Differential Revision: https://reviews.llvm.org/D50054 llvm-svn: 339754	2018-08-15 07:52:35 +00:00
Craig Topper	633fe98e27	[X86] Change legacy SSE scalar fp to integer intrinsics to use specific ISD opcodes instead of keeping as intrinsics. Unify SSE and AVX512 isel patterns. AVX512 added new versions of these intrinsics that take a rounding mode. If the rounding mode is 4 the new intrinsics are equivalent to the old intrinsics. The AVX512 intrinsics were being lowered to ISD opcodes, but the legacy SSE intrinsics were left as intrinsics. This resulted in the AVX512 instructions needing separate patterns for the ISD opcodes and the legacy SSE intrinsics. Now we convert SSE intrinsics and AVX512 intrinsics with rounding mode 4 to the same ISD opcode so we can share the isel patterns. llvm-svn: 339749	2018-08-15 01:23:00 +00:00
Chandler Carruth	139b35192a	[SDAG] Update the AVR backend for the SelectionDAG API changes in r339740, fixing the build for this target. llvm-svn: 339748	2018-08-15 01:22:50 +00:00
Derek Schuff	4ec8bca13e	[WebAssembly] SIMD Splats Implement and test SIMD splat ops. Patch by Thomas Lively Differential Revision: https://reviews.llvm.org/D50741 llvm-svn: 339744	2018-08-15 00:30:27 +00:00
Chandler Carruth	66654b72c9	[SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be surprised at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740	2018-08-14 23:30:32 +00:00
Eli Friedman	0d12e90bf5	[ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly. Intentionally excluding nodes from the DAGCombine worklist is likely to lead to weird optimizations and infinite loops, so it's generally a bad idea. To avoid the infinite loops, fix DAGCombine to use the isDesirableToCommuteWithShift target hook before performing the transforms in question, and implement the target hook in the ARM backend disable the transforms in question. Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a reduced testcase for that bug. But we should have sufficient test coverage for PerformSHLSimplify given that we're not playing weird tricks with the worklist. I can try to bugpoint it if necessary, though.) Differential Revision: https://reviews.llvm.org/D50667 llvm-svn: 339734	2018-08-14 22:10:25 +00:00
Heejin Ahn	c9c711a0ac	[WebAssembly] Fix encoding of non-SIMD vector-typed instructions Previously SIMD_I was the same as a normal instruction except for the addition of a HasSIM128 predicate. However, rL339186 changed the encoding of SIMD_I instructions to automatically contain the SIMD prefix byte. This broke the encoding of non-SIMD vector-typed instructions, which had instantiated SIMD_I. This CL corrects this error. Reviewers: aheejin Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D50682 Patch by Thomas Lively (tlively) llvm-svn: 339710	2018-08-14 19:03:36 +00:00
Heejin Ahn	a0fd9c3e9a	[WebAssembly] SIMD extract_lane Implement instruction selection for all versions of the extract_lane instruction. Use explicit sext/zext to differentiate between extract_lane_s and extract_lane_u for applicable types, otherwise default to extract_lane_u. Reviewers: aheejin Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D50597 Patch by Thomas Lively (tlively) llvm-svn: 339707	2018-08-14 18:53:27 +00:00
Andrea Di Biagio	9eaf5aa006	[Tablegen][MCInstPredicate] Removed redundant template argument from class TIIPredicate, and implemented verification rules for TIIPredicates. This patch removes redundant template argument `TargetName` from TIIPredicate. Tablegen can always infer the target name from the context. So we don't need to force users of TIIPredicate to always specify it. This allows us to better modularize the tablegen class hierarchy for the so-called "function predicates". class FunctionPredicateBase has been added; it is currently used as a building block for TIIPredicates. However, I plan to reuse that class to model other function predicate classes too (i.e. not just TIIPredicates). For example, this can be a first step towards implementing proper support for dependency breaking instructions in tablegen. This patch also adds a verification step on TIIPredicates in tablegen. We cannot have multiple TIIPredicates with the same name. Otherwise, this will cause build errors later on, when tablegen'd .inc files are included by cpp files and then compiled. Differential Revision: https://reviews.llvm.org/D50708 llvm-svn: 339706	2018-08-14 18:36:54 +00:00
Simon Pilgrim	2ce3d6e135	[X86][SSE] Avoid duplicate shuffle input sources in combineX86ShufflesRecursively rL339686 added the case where a faux shuffle might have repeated shuffle inputs coming from either side of the OR(). This patch improves the insertion of the inputs into the source ops lists to account for this, as well as making it trivial to add support for shuffles with more than 2 inputs in the future. llvm-svn: 339696	2018-08-14 17:22:37 +00:00
Simon Pilgrim	ed55138247	[X86][SSE] Add shuffle combine support for OR(PSHUFB,PSHUFB) style patterns. If each element is zero from one (or both) inputs then we can combine these into a single shuffle mask. llvm-svn: 339686	2018-08-14 16:00:05 +00:00
Simon Pilgrim	df9880f257	[X86][SSE] Generalize lowerVectorShuffleAsBlendOfPSHUFBs to work with any vXi8 type. We still only use this for v16i8, but this cleans up the code to support v32i8/v64i8 sometime in the future. llvm-svn: 339679	2018-08-14 14:00:14 +00:00
Roger Ferrer Ibanez	c8f4dbbc63	[RISCV] Fix incorrect use of MCInstBuilder This is a fix for r339314. MCInstBuilder uses the named parameter idiom and an 'operator MCInst&' to ease the creation of MCInsts. As the object of MCInstBuilder owns the MCInst is manipulating, the lifetime of the MCInst is bound to that of MCInstBuilder. In r339314 I bound a reference to the MCInst in an initializer. The temporary of MCInstBuilder (and also its MCInst) is destroyed at the end of the declaration leading to a dangling reference. Fix this by using MCInstBuilder inside an argument of a function call. Temporaries in function calls are destroyed in the enclosing full expression, so the the reference to MCInst is still valid when emitToStreamer executes. llvm-svn: 339654	2018-08-14 08:30:42 +00:00
Chih-Mao Chen	5d94b25ffe	Test commit: fix punctuation llvm-svn: 339652	2018-08-14 08:08:39 +00:00
Tomasz Krupa	86a63889f3	[X86] Lowering addus/subus intrinsics to native IR Summary: This revision improves previous version (rL330322) which has been reverted due to crashes. This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations. The patch also includes folding of previously missing saturation patterns so that IR emits the same machine instructions as the intrinsics. Reviewers: craig.topper, spatel, RKSimon Reviewed By: craig.topper Subscribers: mike.dvoretsky, DavidKreitzer, sroland, llvm-commits Differential Revision: https://reviews.llvm.org/D46179 llvm-svn: 339650	2018-08-14 08:00:56 +00:00
Sjoerd Meijer	3c859b3ec3	[ARM] ParallelDSP: add option to enable/disable the pass Differential Revision: https://reviews.llvm.org/D50511 llvm-svn: 339645	2018-08-14 07:43:49 +00:00
Wouter van Oortmerssen	a7be375586	Revert "[WebAssembly] Added default stack-only instruction mode for MC." This reverts commit 917a99b71ce21c975be7bfbf66f4040f965d9f3c. llvm-svn: 339630	2018-08-13 23:12:49 +00:00
Craig Topper	cade635c77	[X86] Don't ignore 0x66 prefix on relative jumps in 64-bit mode. Fix opcode selection of relative jumps in 16-bit mode. Treat jno/jo like other jcc instructions. The behavior in 64-bit mode is different between Intel and AMD CPUs. Intel ignores the 0x66 prefix. AMD does not. objump doesn't ignore the 0x66 prefix. Since LLVM aims to match objdump behavior, we should do the same. While I was trying to fix this I had change brtarget16/32 to use ENCODING_IW/ID instead of ENCODING_Iv to get the 0x66+REX.W case to act sort of sanely. It's still wrong, but that's a problem for another day. The change in encoding exposed the fact that 16-bit mode disassembly of relative jumps was creating JMP_4 with a 2 byte immediate. It should have been JMP_2. From just printing you can't tell the difference, but if you dumped the encoding it wouldn't have matched what we started with. While fixing that, it exposed that jo/jno opcodes were missing from the switch that this patch deleted and there were no test cases for them. Fixes PR38537. llvm-svn: 339622	2018-08-13 22:06:28 +00:00
Andrea Di Biagio	7b77b14198	[X86][BtVer2] Use NoSchedPredicate to model default transitions in variant scheduling classes. NFC. llvm-svn: 339589	2018-08-13 17:52:39 +00:00
Krzysztof Parzyszek	cce15c76d3	[Hexagon] Silence -Wuninitialized warning from GCC 5.4, NFC Patch by Kim Gräsman. Differential Revision: https://reviews.llvm.org/D50623 llvm-svn: 339576	2018-08-13 15:08:25 +00:00
Daniel Cederman	dc3e4c6d95	Revert "[Sparc] Add support for the cycle counter available in GR740" It breaks when using EXPENSIVE_CHECKS with the error message "Bad machine code: Using an undefined physical register". llvm-svn: 339570	2018-08-13 14:18:09 +00:00
Sid Manning	8d4a6615e1	Check for tied operands Differential Revision: https://reviews.llvm.org/D50592 llvm-svn: 339567	2018-08-13 14:01:25 +00:00
Jonas Paulsson	5ffb27b166	[SystemZ] Increase the amount of inlining. Implement getInliningThresholdMultiplier() and have it return 3. Review: Ulrich Weigand llvm-svn: 339563	2018-08-13 13:31:30 +00:00
Daniel Cederman	1bfbc62022	[Sparc] Add support for the cycle counter available in GR740 Summary: The GR740 provides an up cycle counter in the registers ASR22 and ASR23. As these registers can not be read together atomically we only use the value of ASR23 for llvm.readcyclecounter(). The ASR23 register holds the 32 LSBs of the up-counter. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48638 llvm-svn: 339551	2018-08-13 10:49:48 +00:00
Luke Geeson	4ce41d2bb7	[ARM] Added FP16 VREV Vector Instrinsic CodeGen support llvm-svn: 339546	2018-08-13 08:37:41 +00:00
Matt Arsenault	13b0db9285	AMDGPU: Check NSZ MI flag when folding omod I'm not sure the exact nsz flag combination that is OK. I think as long as it's on either, this is OK. For now just check it on the omod multiply. llvm-svn: 339513	2018-08-12 08:44:25 +00:00
Matt Arsenault	b5acec1f79	AMDGPU: Use splat vectors for undefs when folding canonicalize If one of the elements is undef, use the canonicalized constant from the other element instead of 0. Splat vectors are more useful for other optimizations, such as matching vector clamps. This was breaking on clamps of half3 from the undef 4th component. llvm-svn: 339512	2018-08-12 08:42:54 +00:00
Matt Arsenault	3ead7d7389	AMDGPU: Fix packing undef parts of build_vector llvm-svn: 339511	2018-08-12 08:42:46 +00:00
Craig Topper	ed8a114c86	[X86] Remove unnecessary AddedComplexity line. NFC The use of the or_is_add predicate already gives enough of a complexity boost to get the patterns ordered properly. llvm-svn: 339507	2018-08-12 03:22:18 +00:00
Craig Topper	b3e3477649	[X86] Remove the AL/AX/EAX/RAX short immediate forms from the macro fusion shouldScheduleAdjacent. NFC These instructions are only created by the backend during MCInst lowering. llvm-svn: 339499	2018-08-11 06:42:51 +00:00
Craig Topper	c6cf169940	[X86] Add the mem-reg form of CMP to the macro fusion shouldScheduleAdjacent. Unlike the other arithmetic instructions the mem-reg form of compare is just a load and not a RMW operation. According to the Intel optimization manual, this form is also supported by macro fusion. llvm-svn: 339498	2018-08-11 06:42:50 +00:00
Craig Topper	616eeb827d	[X86] Remove ADD8mi and ADDmr from the macro fusion shouldScheduleAdjacent. The are RMW of memory operations. They aren't eligible for macro fusion. llvm-svn: 339497	2018-08-11 06:42:49 +00:00
Craig Topper	570d47a010	[X86] Change the MOV32ri64 pseudo instruction to def a GR64 directly instead of wrapping it in a SUBREG_TO_REG. Now we switch to the subregister in expandPostRAPseudos where we already switched the opcode. This simplifies a few isel patterns that used the pseudo directly. And magically seems to have improved our ability to CSE it in the undef-label.ll test. llvm-svn: 339496	2018-08-11 05:33:00 +00:00
Richard Trieu	01f99f3cd6	Fix WebAssembly instruction printer after r339474 Treat the stack variants of control instructions the same as regular instructions. Otherwise, the vector ControlFlowStack will be the wrong size and have out-of-bounds access. This was detected by MemorySanitizer. llvm-svn: 339495	2018-08-11 04:18:05 +00:00
Tom Stellard	8adc86a7dc	AMDGPU/GlobalISel: Define instruction mapping for G_INSERT Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49625 llvm-svn: 339491	2018-08-11 00:51:54 +00:00
Eli Friedman	6b84a48953	Fix unused lambda capture warning from r339472. llvm-svn: 339479	2018-08-10 22:03:25 +00:00
Wouter van Oortmerssen	ab26bd0647	[WebAssembly] Added default stack-only instruction mode for MC. Summary: Moved Explicit Locals pass to last. Made that pass obligatory. Made it convert from register to stack based instructions, and removed the registers. Fixes to related code that was expecting register based instructions. Added the correct testing flag to all tests, depending on what the format they were expecting so far. Translated one test to stack format as example: reg-stackify-stack.ll tested: llvm-lit -v `find test -name WebAssembly` unittests/MC/* Reviewers: dschuff, sunfish Subscribers: jfb, llvm-commits, aheejin, eraman, jgravelle-google, sbc100 Differential Revision: https://reviews.llvm.org/D50568 llvm-svn: 339474	2018-08-10 21:32:47 +00:00
Eli Friedman	e1687a89e8	[ARM] Adjust AND immediates to make them cheaper to select. LLVM normally prefers to minimize the number of bits set in an AND immediate, but that doesn't always match the available ARM instructions. In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer a two-instruction sequence movs+ands or movs+bics. Some potential improvements outlined in ARMTargetLowering::targetShrinkDemandedConstant, but seems to work pretty well already. The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX instruction due to a larger-than-expected mask. (It's orthogonal, in some sense, but as far as I can tell it's either impossible or nearly impossible to reproduce the bug without this change.) According to my testing, this seems to consistently improve codesize by a small amount by forming bic more often for ISD::AND with an immediate. Differential Revision: https://reviews.llvm.org/D50030 llvm-svn: 339472	2018-08-10 21:21:53 +00:00
Matt Arsenault	940e6075e4	AMDGPU: More canonicalized operations llvm-svn: 339464	2018-08-10 19:20:17 +00:00
Matt Arsenault	3dcf4ce435	AMDGPU: Combine and of seto/setuo and fp_class Clear the nan (or non-nan) test bits from the mask. llvm-svn: 339462	2018-08-10 18:58:56 +00:00
Matt Arsenault	8ad00d30fa	AMDGPU: Match isfinite pattern to class instructions llvm-svn: 339460	2018-08-10 18:58:41 +00:00
Matt Arsenault	5bb9d798b4	AMDGPU: Add LLVM_FALLTHROUGH llvm-svn: 339458	2018-08-10 17:57:12 +00:00
Sam Parker	8c4b964c5a	[ARM] Disallow zexts in ARMCodeGenPrepare Enabling ARMCodeGenPrepare by default caused a whole load of failures. This is due to zexts and truncs not being handled properly. ZExts are messy so it's just easier to disable for now and truncs are allowed only as 'sinks'. I still need to figure out why allowing them as 'sources' causes so many failures. The other main changes are that we are explicit in the types that we converting to, it's now always 'TypeSize'. Type support is also now performed while checking for valid opcodes as it unnecessarily complicated having the checks are different stages. I've moved the tests around too, so we have the zext and truncs in their own file as well as the overflowing opcode tests. Differential Revision: https://reviews.llvm.org/D50518 llvm-svn: 339432	2018-08-10 13:57:13 +00:00
Simon Pilgrim	130b00bc43	[X86][SSE] Pull out repeated shift getOpcode() calls. NFCI. llvm-svn: 339425	2018-08-10 11:42:42 +00:00
Heejin Ahn	5831e9cc79	[WebAssembly] Gate i64x2 and f64x2 on -wasm-enable-unimplemented Summary: i64x2 and f64x2 operations are not implemented in V8, so we normally do not want to emit them. However, they are in the SIMD spec proposal, so we still want to be able to test them in the toolchain. This patch adds a flag to enable their emission. Reviewers: aheejin, dschuff Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D50423 Patch by Thomas Lively (tlively) llvm-svn: 339407	2018-08-09 23:58:51 +00:00
Craig Topper	9a8136f7b4	[X86] Qualify one of the heuristics in combineMul to only apply to positive multiply amounts. This seems to slightly help the performance of one of our internal benchmarks. We probably need better heuristics here. llvm-svn: 339406	2018-08-09 23:27:42 +00:00
Heejin Ahn	41b25c6cf4	[WebAssembly] Fix wasm backend compilation on gcc 5.4: variable name cannot match class Summary: gcc does not like const Region *Region; It wants a different name for the variable. Is there a better convention for what name to use in such a case? Reviewers: sbc100, aheejin Subscribers: aheejin, jgravelle-google, dschuff, llvm-commits Differential Revision: https://reviews.llvm.org/D50472 Patch by Alon Zakai (kripken) llvm-svn: 339398	2018-08-09 22:35:23 +00:00
Reid Kleckner	fce7f73bec	[MC] Move EH DWARF encodings from MC to CodeGen, NFC Summary: The TType encoding, LSDA encoding, and personality encoding are all passed explicitly by CodeGen to the assembler through .cfi_* directives, so only the AsmPrinter needs to know about them. The FDE CFI encoding however, controls the encoding of the label implicitly created by the .cfi_startproc directive. That directive seems to be special in that it doesn't take an encoding, so the assembler just has to know how to encode one DSO-local label reference from .eh_frame to .text. As a result, it looks like MC will continue to have to know when the large code model is in use. Perhaps we could invent a '.cfi_startproc [large]' flag so that this knowledge doesn't need to pollute the assembler. Reviewers: davide, lliu0, JDevlieghere Subscribers: hiraditya, fedor.sergeev, llvm-commits Differential Revision: https://reviews.llvm.org/D50533 llvm-svn: 339397	2018-08-09 22:24:04 +00:00
Ana Pazos	10de234905	[RISC-V] Fixed alias for addi x2, x2, 0 A missing check for non-zero immediate in MCOperandPredicate caused c.addi16sp sp, 0 to be selected which is not a valid instruction. llvm-svn: 339381	2018-08-09 20:51:53 +00:00
Krzysztof Parzyszek	75c2ca3638	[Hexagon] Map ISD::TRAP to J2_trap0(#0 ) llvm-svn: 339365	2018-08-09 18:03:45 +00:00
Evandro Menezes	8c4366273c	[ARM] Adjust the feature set for Exynos Enable `FeatureZCZeroing`, `FeatureHasSlowFPVMLx`, `FeatureExpandMLx`, `FeatureProfUnpredicate`, `FeatureSlowVDUP32`, `FeatureSlowVGETLNi32`, `FeatureSplatVFPToNeon`, `FeatureHasRetAddrStack`, `FeatureSlowFPBrcc` for all Exynos processors. llvm-svn: 339356	2018-08-09 16:34:38 +00:00
Evandro Menezes	9a92fe0c9e	[ARM] Replace processor check with feature Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a processor check. Otherwise, NFC. llvm-svn: 339354	2018-08-09 16:13:24 +00:00
Andrea Di Biagio	f3bde0485c	[MC][PredicateExpander] Extend the grammar to support simple switch and return statements. This patch introduces tablegen class MCStatement. Currently, an MCStatement can be either a return statement, or a switch statement. ``` MCStatement: MCReturnStatement MCOpcodeSwitchStatement ``` A MCReturnStatement expands to a return statement, and the boolean expression associated with the return statement is described by a MCInstPredicate. An MCOpcodeSwitchStatement is a switch statement where the condition is a check on the machine opcode. It allows the definition of multiple checks, as well as a default case. More details on the grammar implemented by these two new constructs can be found in the diff for TargetInstrPredicates.td. This patch makes it easier to read the body of auto-generated TargetInstrInfo predicates. In future, I plan to reuse/extend the MCStatement grammar to describe more complex target hooks. For now, this is just a first step (mostly a minor cosmetic change to polish the new predicates framework). Differential Revision: https://reviews.llvm.org/D50457 llvm-svn: 339352	2018-08-09 15:32:48 +00:00
Sjoerd Meijer	806f70d229	[ARM] FP16: codegen support for VTRN Differential Revision: https://reviews.llvm.org/D50454 llvm-svn: 339340	2018-08-09 12:45:09 +00:00
Simon Pilgrim	511c3fc529	[X86][SSE] Remove PMULDQ/PMULUDQ by zero Exposed by D50328 Differential Revision: https://reviews.llvm.org/D50328 llvm-svn: 339337	2018-08-09 12:37:36 +00:00
Simon Pilgrim	01ae462fef	[X86][SSE] Combine (some) target shuffles with multiple uses As discussed on D41794, we have many cases where we fail to combine shuffles as the input operands have other uses. This patch permits these shuffles to be combined as long as they don't introduce additional variable shuffle masks, which should reduce instruction dependencies and allow the total number of shuffles to still drop without increasing the constant pool. However, this may mean that some memory folds may no longer occur, and on pre-AVX require the occasional extra register move. This also exposes some poor PMULDQ/PMULUDQ codegen which was doing unnecessary upper/lower calculations which will in fact fold to zero/undef - the fix will be added in a followup commit. Differential Revision: https://reviews.llvm.org/D50328 llvm-svn: 339335	2018-08-09 12:30:02 +00:00
Andrew V. Tischenko	24f63bcb34	[X86] Improved sched models for X86 XCHGrr and XADDrr instructions. Differential Revision: https://reviews.llvm.org/D49861 llvm-svn: 339321	2018-08-09 09:23:26 +00:00
Jonas Hahnfeld	20526bf483	[NVPTX] Select atomic loads and stores According to PTX ISA .volatile has the same memory synchronization semantics as .relaxed.sys, so it can be used to implement monotonic atomic loads and stores. This is important for OpenMP's atomic construct where - 'read's and 'write's are lowered to atomic loads and stores, and - an update of float or double types are lowered into a cmpxchg loop. (Note that PTX could do better because it has atom.add.f{32,64} but LLVM's atomicrmw instruction only allows integer types.) Higher levels of atomicity (like acquire and release) need additional synchronization properties which were added with PTX ISA 6.0 / sm_70. So using these instructions still results in an error. Differential Revision: https://reviews.llvm.org/D50391 llvm-svn: 339316	2018-08-09 07:45:49 +00:00
Roger Ferrer Ibanez	577a97e2b9	[RISCV] Add "lla" pseudo-instruction to assembler This pseudo-instruction is similar to la but uses PC-relative addressing unconditionally. This is, la is only different to lla when using -fPIC. This pseudo-instruction seems often forgotten in several specs but it is definitely mentioned in binutils opcodes/riscv-opc.c. The semantics are defined both in page 37 of the "RISC-V Reader" book but also in function macro found in gas/config/tc-riscv.c. This is a very first step towards adding PIC support for Linux in the RISC-V backend. The lla pseudo-instruction expands to a sequence of auipc + addi with a couple of pc-rel relocations where the second points to the first one. This is described in https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#pc-relative-symbol-addresses For now, this patch only introduces support of that pseudo instruction at the assembler parser. Differential Revision: https://reviews.llvm.org/D49661 llvm-svn: 339314	2018-08-09 07:08:20 +00:00
Eli Friedman	5b45a39056	[ARM] Avoid spilling lr with Thumb1 tail calls. Normally, if any registers are spilled, we prefer to spill lr on Thumb1 so we can fold the "bx lr" into the "pop". However, if there are tail calls involved, restoring lr is expensive, so skip the optimization in that case. The spill of r7 in the new test also isn't necessary, but that's mostly orthogonal to this patch. (It's the same code in ARMFrameLowering, but it's not related to tail calls.) Differential Revision: https://reviews.llvm.org/D49459 llvm-svn: 339283	2018-08-08 20:03:10 +00:00
Krzysztof Parzyszek	1df7059150	[Hexagon] Diagnose misaligned absolute loads and stores Differential Revision: https://reviews.llvm.org/D50405 llvm-svn: 339272	2018-08-08 17:00:09 +00:00
Matt Arsenault	935f3b70fe	AMDGPU: Error more gracefully on libcalls I think this is the only situation where the callsite will have a null instruction. llvm-svn: 339271	2018-08-08 16:58:39 +00:00
Matt Arsenault	e719139b10	AMDGPU: Fix shifts for i128 llvm-svn: 339270	2018-08-08 16:58:33 +00:00
Zaara Syeda	b2595b988b	[PowerPC] Improve codegen for vector loads using scalar_to_vector This patch aims to improve the codegen for vector loads involving the scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X) to utilize: LXSD and LXSDX for i64 and f64 LXSIWAX for i32 (sign extension to i64) LXSIWZX for i32 and f64 Committing on behalf of Amy Kwan. Differential Revision: https://reviews.llvm.org/D48950 llvm-svn: 339260	2018-08-08 15:20:43 +00:00
Alex Bradbury	07224dfb47	[RISCV] Add mnemonic alias: move, sbreak and scall. Further improve compatibility with the GNU assembler. Differential Revision: https://reviews.llvm.org/D50217 Patch by Kito Cheng. llvm-svn: 339255	2018-08-08 14:53:45 +00:00
Alex Bradbury	7d8d87c143	[RISCV] Add InstAlias definitions for add[w], and, xor, or, sll[w], srl[w], sra[w], slt and sltu with immediate Match the GNU assembler in supporting immediate operands for these instructions even when the reg-reg mnemonic is used. Differential Revision: https://reviews.llvm.org/D50046 Patch by Kito Cheng. llvm-svn: 339252	2018-08-08 14:45:44 +00:00
Sjoerd Meijer	f8c394f0f5	[ARM] FP16: codegen support for VEXT Differential Revision: https://reviews.llvm.org/D50427 llvm-svn: 339241	2018-08-08 13:26:38 +00:00
Sjoerd Meijer	db5908deb9	[ARM] FP16: vector vmov and vdup support This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50329 llvm-svn: 339238	2018-08-08 13:11:31 +00:00
Sjoerd Meijer	920a453485	[ARM] FP16: vector VMUL variants This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50326 llvm-svn: 339232	2018-08-08 10:27:34 +00:00
Benjamin Kramer	83996e4dee	[Wasm] Don't iterate over MachineBasicBlock::successors while erasing from it This will read out of bounds. Found by asan. llvm-svn: 339230	2018-08-08 10:13:19 +00:00
Sjoerd Meijer	b33a4c02cc	[ARM] FP16: support vector INT_TO_FP and FP_TO_INT This adds codegen support for the different vcvt_f16 variants. Differential Revision: https://reviews.llvm.org/D50393 llvm-svn: 339227	2018-08-08 09:45:34 +00:00
Sjoerd Meijer	b264944ed5	[ARM] FP16: support the vector vmin and vmax variants Differential Revision: https://reviews.llvm.org/D50238 llvm-svn: 339221	2018-08-08 07:20:15 +00:00
Jan Vesely	7b2c98ab59	AMDGPU: Remove broken i16 ternary patterns Fixup test to check for GCN prefix These patterns always zero extend the result even though it might need sign extension. This has been broken since the addition of i16 support. It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence: v_mad_i16 v2, v10, v9, v11 v_med3_i32 v2, v2, v8, v7 Fixes mad_sat(char) piglit on VI. Differential Revision: https://reviews.llvm.org/D49836 llvm-svn: 339190	2018-08-07 21:54:37 +00:00
Derek Schuff	51ed131ed2	[WebAssembly] Update SIMD binary arithmetic Add missing SIMD types (v2f64) and binary ops. Also adds tablegen support for automatically prepending prefix byte to SIMD opcodes. Differential Revision: https://reviews.llvm.org/D50292 Patch by Thomas Lively llvm-svn: 339186	2018-08-07 21:24:01 +00:00
Krzysztof Parzyszek	e7ce247dd7	[Hexagon] Allow use of gather intrinsics even with no-packets Vgather requires must be in a packet with a store, which contradicts the no-packets feature. As a consequence, gather/scatter could not be used with no-packets. Relax this, and allow gather packets as exceptions to the no-packets requirements. llvm-svn: 339177	2018-08-07 20:33:47 +00:00
Heejin Ahn	7fb68d2679	[WebAssembly] CFG sort support for exception handling Summary: This patch extends CFGSort pass to support exception handling. Once it places a loop header, it does not place blocks that are not dominated by the loop header until all the loop blocks are sorted. This patch extends the same algorithm to exception 'catch' part, using the information calculated by WebAssemblyExceptionInfo class. Reviewers: dschuff, sunfish Subscribers: sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D46500 llvm-svn: 339172	2018-08-07 20:19:23 +00:00
Craig Topper	deb2899b2d	[SelectionDAG][X86][SystemZ] Add a generic nonvolatile_store/nonvolatile_load pattern fragment in TargetSelectionDAG.td Differential Revision: https://reviews.llvm.org/D50358 llvm-svn: 339156	2018-08-07 17:34:59 +00:00
Sjoerd Meijer	b39cd886b9	[ARM] FP16: codegen support for VACGT Differential Revision: https://reviews.llvm.org/D50236 llvm-svn: 339148	2018-08-07 15:11:47 +00:00
Jonas Paulsson	5438f1debc	[SystemZ] Comment update. Update the comment in nextGroup since the ProcResourceCounters are not anymore always decremented with '1'. llvm-svn: 339140	2018-08-07 13:48:09 +00:00
Jonas Paulsson	25cbfdd423	[SystemZ] NFC: Remove redundant check in SystemZHazardRecognizer. Remove the redundant check against zero when updating ProcResourceCounters in nextGroup(), as pointed out in https://reviews.llvm.org/D50187. Review: Ulrich Weigand. llvm-svn: 339139	2018-08-07 13:44:11 +00:00
Aleksandar Beserminji	949a17c016	[mips] Handle branch expansion corner cases When potential jump instruction and target are in the same segment, use jump instruction with immediate field. In cases where offset does not fit immediate value of a bc/j instructions, offset is stored into register, and then jump register instruction is used. Differential Revision: https://reviews.llvm.org/D48019 llvm-svn: 339126	2018-08-07 10:45:45 +00:00
Matt Arsenault	96b678427a	AMDGPU: Add feature vi-insts This is necessary to add a VI specific builtin, __builtin_amdgcn_s_dcache_wb. We already have an overly specific feature for one of these builtins, for s_memrealtime. I'm not sure whether it's better to add more of those, or to get rid of that and merge it with vi-insts. Alternatively, maybe this logically goes with scalar-stores? llvm-svn: 339104	2018-08-07 07:28:46 +00:00
Craig Topper	9de1797c50	[SelectionDAG][X86] Rename MaskedLoadSDNode::getSrc0 to getPassThru. Src0 doesn't really convey any meaning to what the operand is. Passthru matches what's used in the documentation for the intrinsic this comes from. llvm-svn: 339101	2018-08-07 06:52:49 +00:00
Craig Topper	17989208a9	[SelectionDAG][X86] Rename getValue to getPassThru for gather SDNodes. getValue is more meaningful name for scatter than it is for gather. Split them and use getPassThru for gather. llvm-svn: 339096	2018-08-07 06:13:40 +00:00
Heejin Ahn	e8653bb89a	[WebAssembly] Enable atomic expansion for unsupported atomicrmws Summary: Wasm does not have direct counterparts to some of LLVM IR's atomicrmw instructions (min, max, umin, umax, and nand). This enables atomic expansion using cmpxchg instruction within a loop for those atomicrmw instructions. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49440 llvm-svn: 339084	2018-08-07 00:22:22 +00:00
Derek Schuff	2c78385960	[WebAssembly] Replace SIMD expression types with V128 Summary: The spec only defines a SIMD expression type of V128 and leaves interpretation of different vector types to the instructions. Differential Revision: https://reviews.llvm.org/D50367 Patch by Thomas Lively llvm-svn: 339082	2018-08-06 23:16:50 +00:00
Matt Arsenault	08f3fe4fae	AMDGPU: cvt_pk_rtz_f16 canonicalizes llvm-svn: 339078	2018-08-06 23:01:31 +00:00
Matt Arsenault	e94ee833f9	AMDGPU: Handle some vector operations in isCanonicalized llvm-svn: 339077	2018-08-06 22:45:51 +00:00
Matt Arsenault	a29e76244a	AMDGPU: Push fcanonicalize through partially constant build_vector This usually avoids some re-packing code, and may help find canonical sources. llvm-svn: 339072	2018-08-06 22:30:44 +00:00
Matt Arsenault	f2a167fb1d	AMDGPU: Refactor fcanonicalize combine This will make more complex combines easier. llvm-svn: 339070	2018-08-06 22:10:26 +00:00
Matt Arsenault	d49ab0b214	AMDGPU: Treat more custom operations as canonicalizing Everything should quiet, and I think everything should flush. I assume the min3/med3/max3 follow the same rules as regular min/max for flushing, which should at least be conservatively correct. There are still more operations that need to be handled. llvm-svn: 339065	2018-08-06 21:58:11 +00:00
Matt Arsenault	ce6d61fba8	AMDGPU: Conversions always produce canonical results Not sure why this was checking for denormals for f16. My interpretation of the IEEE standard is conversions should produce a canonical result, and the ISA manual says denormals are created when appropriate. llvm-svn: 339064	2018-08-06 21:51:52 +00:00
Matt Arsenault	f8768bfc84	AMDGPU: Fix implementation of isCanonicalized If denormals are enabled, denormals are canonical. Also fix a few other issues. minnum/maxnum are supposed to canonicalize. Temporarily improve workaround for the instruction behavior change in gfx9. Handle selects and fcopysign. The tests were also largely broken, since they were checking for a flush used on some targets after the store of the result. llvm-svn: 339061	2018-08-06 21:38:27 +00:00
Easwaran Raman	10fd92dd94	[X86] Recognize a splat of negate in isFNEG Summary: Expand isFNEG so that we generate the appropriate F(N)M(ADD\|SUB) instructions in more cases. For example, the following sequence a = _mm256_broadcast_ss(f) d = _mm256_fnmadd_ps(a, b, c) generates an fsub and fma without this patch and an fnma with this change. Reviewers: craig.topper Subscribers: llvm-commits, davidxl, wmi Differential Revision: https://reviews.llvm.org/D48467 llvm-svn: 339043	2018-08-06 19:23:38 +00:00
Craig Topper	0076477a4c	[X86] When using "and $0" and "orl $-1" to store 0 and -1 for minsize, make sure the store isn't volatile If the store is volatile this might be a memory mapped IO access. In that case we shouldn't generate a load that didn't exist in the source Differential Revision: https://reviews.llvm.org/D50270 llvm-svn: 339041	2018-08-06 18:44:26 +00:00
Matt Arsenault	0d1b3934e2	AMDGPU: Fold v_lshl_or_b32 with 0 src0 Appears from expansion of some packed cases. llvm-svn: 339025	2018-08-06 15:40:20 +00:00
Bryan Chan	e023706471	[AArch64] Fix assertion failure on widened f16 BUILD_VECTOR Summary: Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the same type. This fixes an assertion failure in VerifySDNode. Reviewers: SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50202 llvm-svn: 339013	2018-08-06 14:14:41 +00:00
Tim Northover	9956e4a24b	ARM-MachO: don't add Thumb bit for addend to non-external relocation. ld64 supplies its own Thumb bit for Thumb functions, and intentionally zeroes out that part of any addend in an object file. But it only does that for symbols marked N_EXT -- i.e. external symbols. So LLVM should avoid setting that extra bit in other cases. llvm-svn: 339007	2018-08-06 11:32:44 +00:00
David Bolvansky	c0aa4b75a4	Enrich inline messages Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338969	2018-08-05 14:53:08 +00:00
Craig Topper	3c869cb5e5	[X86] Add isel patterns for atomic_load+sub+atomic_sub. Despite the comment removed in this patch, this is beneficial when the RHS of the sub is a register. llvm-svn: 338930	2018-08-03 22:08:30 +00:00
Craig Topper	d7391eefdf	[X86] Remove RELEASE_ and ACQUIRE_ pseudo instructions. Use isel patterns and the normal instructions instead At one point in time acquire implied mayLoad and mayStore as did release. Thus we needed separate pseudos that also carried that property. This appears to no longer be the case. I believe it was changed in 2012 with a comment saying that atomic memory accesses are marked volatile which preserves the ordering. So from what I can tell we shouldn't need additional pseudos since they aren't carry any flags that are different from the normal instructions. The only thing I can think of is that we may consider them for load folding candidates in the peephole pass now where we didn't before. If that's important hopefully there's something in the memory operand we can check to prevent the folding without relying on pseudo instructions. Differential Revision: https://reviews.llvm.org/D50212 llvm-svn: 338925	2018-08-03 21:40:44 +00:00
Matt Arsenault	c3dc8e65e2	DAG: Enhance isKnownNeverNaN Add a parameter for testing specifically for sNaNs - at least one instruction pattern on AMDGPU needs to check specifically for this. Also handle more cases, and add a target hook for custom nodes, similar to the hooks for known bits. llvm-svn: 338910	2018-08-03 18:27:52 +00:00
Artem Belevich	0a11b6366a	[NVPTX] Handle __nvvm_reflect("__CUDA_ARCH"). Summary: libdevice in recent CUDA versions relies on __nvvm_reflect() to select GPU-specific bitcode. This patch addresses the requirement. Reviewers: jlebar Subscribers: jholewinski, sanjoy, hiraditya, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D50207 llvm-svn: 338908	2018-08-03 18:05:24 +00:00
Craig Topper	feb2a58860	[X86] Add a DAG combine for the __builtin_parity idiom used by clang to enable better codegen Clang uses "ctpop & 1" to implement __builtin_parity. If the popcnt instruction isn't supported this generates a large amount of code to calculate the population count. Instead we can bisect the data down to a single byte using xor and then check the parity flag. Even when popcnt is supported, its still a good idea to split 64-bit data on 32-bit targets using an xor in front of a single popcnt. Otherwise we get two popcnts and an add before the and. I've specifically targeted this at the sizes supported by clang builtins, but we could generalize this if we think that's useful. Differential Revision: https://reviews.llvm.org/D50165 llvm-svn: 338907	2018-08-03 18:00:29 +00:00
Nicholas Wilson	e408a89a3a	[WebAssembly] Cleanup of the way globals and global flags are handled Differential Revision: https://reviews.llvm.org/D44030 llvm-svn: 338894	2018-08-03 14:33:37 +00:00
Jonas Paulsson	f107b7275c	[SystemZ] Improve handling of instructions which expand to several groups Some instructions expand to more than one decoder group. This has been hitherto ignored, but is handled with this patch. Review: Ulrich Weigand https://reviews.llvm.org/D50187 llvm-svn: 338849	2018-08-03 10:43:05 +00:00
Sjoerd Meijer	d62c5ec2fe	[ARM] FP16: support vector zip and unzip This is addressing PR38404. Differential Revision: https://reviews.llvm.org/D50186 llvm-svn: 338835	2018-08-03 09:24:29 +00:00
Sjoerd Meijer	9b30213828	[ARM] FP16: support VFMA This is addressing PR38404. llvm-svn: 338830	2018-08-03 09:12:56 +00:00
Craig Topper	a7a12399a1	[X86] Remove all the vector NOP bitcast patterns. Use a few lines of code in the Select method in X86ISelDAGToDAG.cpp instead. There are a lot of permutations of types here generating a lot of patterns in the isel table. It's more efficient to just ReplaceUses and RemoveDeadNode from the Select function. The test changes are because we have a some shuffle patterns that have a bitcast as their root node. But the behavior is identical to another instruction whose pattern doesn't start with a bitcast. So this isn't a functional change. llvm-svn: 338824	2018-08-03 07:01:10 +00:00
Craig Topper	e902b7d0b0	[X86] Support fp128 and/or/xor/load/store with VEX and EVEX encoded instructions. Move all the patterns to X86InstrVecCompiler.td so we can keep SSE/AVX/AVX512 all in one place. To save some patterns we'll use an existing DAG combine to convert f128 fand/for/fxor to integer when sse2 is enabled. This allows use to reuse all the existing patterns for v2i64. I believe this now makes SHA instructions the only case where VEX/EVEX and legacy encoded instructions could be generated simultaneously. llvm-svn: 338821	2018-08-03 06:12:56 +00:00
Craig Topper	a80352c04e	[X86] When post-processing the DAG to remove zero extending moves for YMM/ZMM, make sure the producing instruction is VEX/XOP/EVEX encoded. If the producing instruction is legacy encoded it doesn't implicitly zero the upper bits. This is important for the SHA instructions which don't have a VEX encoded version. We might also be able to hit this with the incomplete f128 support that hasn't been ported to VEX. llvm-svn: 338812	2018-08-03 04:49:42 +00:00
Craig Topper	b2cc9a1d44	[X86] Add R13D to the isInefficientLEAReg in FixupLEAs. I'm assuming the R13 restriction extends to R13D. Guessing this restriction is related to the funny encoding of this register as base always requiring a displacement to be encoded. llvm-svn: 338806	2018-08-03 03:45:19 +00:00
Craig Topper	2c095444a4	[X86] Prevent promotion of i16 add/sub/and/or/xor to i32 if we can fold an atomic load and atomic store. This makes them consistent with i8/i32/i64. Which still seems to be more aggressive on folding than icc, gcc, or MSVC. llvm-svn: 338795	2018-08-03 00:37:34 +00:00
Tim Renouf	366a49d986	[AMDGPU] Minor change to d16 buffer load implementation Summary: By not reconstructing the operand list of the SDNode, this change makes it easier to add the forthcoming new tbuffer and buffer intrinsics. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49995 Change-Id: I0cb79ef0801532645d7dd954a6d7355139db7b38 llvm-svn: 338784	2018-08-02 23:33:01 +00:00
Tim Renouf	abd85fb1f5	[AMDGPU] Reworked SIFixWWMLiveness Summary: I encountered some problems with SIFixWWMLiveness when WWM is in a loop: 1. It sometimes gave invalid MIR where there is some control flow path to the new implicit use of a register on EXIT_WWM that does not pass through any def. 2. There were lots of false positives of registers that needed to have an implicit use added to EXIT_WWM. 3. Adding an implicit use to EXIT_WWM (and adding an implicit def just before the WWM code, which I tried in order to fix (1)) caused lots of the values to be spilled and reloaded unnecessarily. This commit is a rework of SIFixWWMLiveness, with the following changes: 1. Instead of considering any register with a def that can reach the WWM code and a def that can be reached from the WWM code, it now considers three specific cases that need to be handled. 2. A register that needs liveness over WWM to be synthesized now has it done by adding itself as an implicit use to defs other than the dominant one. Also added the following fixmes: FIXME: We should detect whether a register in one of the above categories is already live at the WWM code before deciding to add the implicit uses to synthesize its liveness. FIXME: I believe this whole scheme may be flawed due to the possibility of the register allocator doing live interval splitting. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46756 Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94 llvm-svn: 338783	2018-08-02 23:31:32 +00:00
Craig Topper	63873db5c4	[X86] Allow 'atomic_store (neg/not atomic_load)' to isel to a RMW instruction. There was a FIXMe in the td file about a type inference issue that was easy to fix. llvm-svn: 338782	2018-08-02 23:30:38 +00:00
Tim Renouf	f1c7b92a6a	[AMDGPU] Avoid using divergent value in mubuf addr64 descriptor Summary: This fixes a problem where a load from global+idx generated incorrect code on <=gfx7 when the index is divergent. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47383 Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed llvm-svn: 338779	2018-08-02 22:53:57 +00:00
Krzysztof Parzyszek	d91a9e27a9	[Hexagon] Simplify CFG after atomic expansion This will remove suboptimal branching from the generated ll/sc loops. The extra simplification pass affects a lot of testcases, which have been modified to accommodate this change: either by modifying the test to become immune to the CFG simplification, or (less preferablt) by adding option -hexagon-initial-cfg-clenaup=0. llvm-svn: 338774	2018-08-02 22:17:53 +00:00
Heejin Ahn	4128cb0b6b	[WebAssembly] Support for atomic.wait / atomic.wake instructions Summary: This adds support for atomic.wait / atomic.wake instructions in the wasm thread proposal. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49395 llvm-svn: 338770	2018-08-02 21:44:24 +00:00
Sam Clegg	41d7047de5	[WebAssembly] Ensure bitcasts that would result in invalid wasm are removed by FixFunctionBitcasts Rather than allowing invalid bitcasts to be lowered to wasm call instructions that won't validate, generate wrappers that contain unreachable thereby delaying the error until runtime. Differential Revision: https://reviews.llvm.org/D49517 llvm-svn: 338744	2018-08-02 17:38:06 +00:00
Craig Topper	0423881820	[X86] Allow fake unary unpckhpd and movhlps to be commuted for execution domain fixing purposes These instructions perform the same operation, but the semantic of which operand is destroyed is reversed. If the same register is used as both operands we can change the execution domain without worrying about this difference. Unfortunately, this really only works in cases where the input register is killed by the instruction. If its not killed, the two address isntruction pass inserts a copy that will become a move instruction. This makes the instruction use different physical registers that contain the same data at the time the unpck/movhlps executes. I've considered using a unary pseudo instruction with tied operand to trick the two address instruction pass. We could then expand the pseudo post regalloc to get the same physical register on both inputs. Differential Revision: https://reviews.llvm.org/D50157 llvm-svn: 338735	2018-08-02 16:48:01 +00:00
Matt Arsenault	36cdcfadcf	AMDGPU: Fix scalarizing v4f16 fcanonicalize llvm-svn: 338714	2018-08-02 13:43:42 +00:00
Simon Pilgrim	8b16e15d47	[X86][SSE] Pull out duplicate VSELECT to shuffle mask code. NFCI. llvm-svn: 338693	2018-08-02 09:20:27 +00:00
Alexander Ivchenko	48eba54f18	[GlobalISel] Fix typo with missed override specifier llvm-svn: 338689	2018-08-02 08:55:05 +00:00
Alexander Ivchenko	49168f6778	[GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value This is logical continuation of https://reviews.llvm.org/D46018 (r332449) Differential Revision: https://reviews.llvm.org/D49660 llvm-svn: 338685	2018-08-02 08:33:31 +00:00
David Green	ea60446c6d	[AArch64] Add support for got relocated LDR's As a part of adding the tiny codemodel, we need to support ldr's with :got: relocations on them. This seems to be mostly already done, just needs the relocation type support. Differential Revision: https://reviews.llvm.org/D50137 llvm-svn: 338673	2018-08-02 06:24:40 +00:00
Kito Cheng	dffce953bf	Test commit. llvm-svn: 338672	2018-08-02 05:38:18 +00:00
Nemanja Ivanovic	e1a525ed06	[PowerPC] Do not round values prior to converting to integer Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector loses precision. This patch removes the code that adds these nodes to true f64 operands. It also adds patterns required to ensure the code is still vectorized rather than converting individual elements and inserting into a vector. Fixes https://bugs.llvm.org/show_bug.cgi?id=38342 Differential Revision: https://reviews.llvm.org/D50121 llvm-svn: 338658	2018-08-02 00:03:22 +00:00
Lei Liu	8e422b8403	[AArch64] DWARF: do not generate AT_location for thread local AArch64 ELF ABI does not define a static relocation type for TLS offset within a module, which makes it impossible for compiler to generate a valid DW_AT_location content for thread local variables. Currently LLVM generates an invalid R_AARCH64_ABS64 relocation at the DW_AT_location field for a TLS variable. That causes trouble for linker because thread local variable does not have an absolute address at link time. AArch64 GCC solves the problem by not generating DW_AT_location for thread local variables. We should do the same in LLVM. Differential Revision: https://reviews.llvm.org/D43860 llvm-svn: 338655	2018-08-01 23:46:49 +00:00
Reid Kleckner	a30a6d2c29	Load from the GOT for external symbols in the large, PIC code model Do the same handling for external symbols that we do for jump table symbols and global values. Fixes one of the cases in PR38385 llvm-svn: 338651	2018-08-01 22:56:05 +00:00
Matt Arsenault	a7dfd48310	AMDGPU: Use SPseudoInst helper llvm-svn: 338631	2018-08-01 20:49:00 +00:00
Matt Arsenault	709374d186	AMDGPU: Improve hack for packing conversion ops Mutate the node type during selection when it doesn't matter. This avoids an intermediate bitcast node on targets with legal i16/f16. Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16, which I assume are OK. llvm-svn: 338619	2018-08-01 20:13:58 +00:00
Matt Arsenault	55ab9213d3	AMDGPU: Partially fix handling of packed amdgpu_ps arguments Fixes annoying limitations when writing tests. Also remove more leftover code for manually scalarizing arguments and return values. llvm-svn: 338618	2018-08-01 19:57:34 +00:00
Heejin Ahn	b3724b7169	[WebAssembly] Support for a ternary atomic RMW instruction Summary: This adds support for a ternary atomic RMW instruction: cmpxchg. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49195 llvm-svn: 338617	2018-08-01 19:40:28 +00:00
Craig Topper	c985d42903	[X86] Canonicalize the pattern for __builtin_ffs in a similar way to '__builtin_ffs + 5' We now emit a move of -1 before the cmov and do the addition after the cmov just like the case with an extra addition. This may be slightly worse for code size, but is more consistent with other compilers. And we might be able to hoist the mov -1 outside of loops. llvm-svn: 338613	2018-08-01 18:38:46 +00:00
Jan Vesely	93b252799b	AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8) llvm-svn: 338610	2018-08-01 18:36:07 +00:00
Vlad Tsyrklevich	ab016e00ec	[X86] FastISel fall back on !absolute_symbol GVs Summary: D25878, which added support for !absolute_symbol for normal X86 ISel, did not add support for materializing references to absolute symbols for X86 FastISel. This causes build failures because FastISel generates PC-relative relocations for absolute symbols. Fall back to normal ISel for references to !absolute_symbol GVs. Fix for PR38200. Reviewers: pcc, craig.topper Reviewed By: pcc Subscribers: hiraditya, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D50116 llvm-svn: 338599	2018-08-01 17:44:37 +00:00
Simon Pilgrim	931ebe3be1	[X86] Assign from a brace initializer to match style guide. NFCI. llvm-svn: 338598	2018-08-01 17:43:38 +00:00
Simon Pilgrim	a3548c960e	[SelectionDAG] Make binop reduction matcher available to all targets There is nothing x86-specific about this code, so it'd be nice to make this available for other targets to use in the future (and get it out of X86ISelLowering!). Differential Revision: https://reviews.llvm.org/D50083 llvm-svn: 338586	2018-08-01 16:52:28 +00:00
Jan Vesely	5ba1b4bdab	AMDGPU: Allow fp32-denormals feature for r600 targets This was accidentally removed in r335942. Differential Revision: https://reviews.llvm.org/D49934 llvm-svn: 338569	2018-08-01 15:04:36 +00:00
Bryan Chan	67106b5e08	[AArch64] Fix FCCMP with FP16 operands Summary: This patch adds support for FCCMP instruction with FP16 operands, avoiding an assertion during instruction selection. Reviewers: olista01, SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50115 llvm-svn: 338554	2018-08-01 13:50:29 +00:00
Simon Pilgrim	564762cf32	[X86] Use isNullConstant helper. NFCI. llvm-svn: 338530	2018-08-01 13:06:14 +00:00
Ryan Taylor	894c8fd0e2	[AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero Summary: Add _L to _LZ image intrinsic table mapping to table gen. In ISelLowering check if image intrinsic has lod and if it's equal to zero, if so remove lod and change opcode to equivalent mapped _LZ. Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71 Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49483 llvm-svn: 338523	2018-08-01 12:12:01 +00:00
Ulrich Weigand	58a9786e81	[SystemZ, TableGen] Fix shift count handling The DAG combiner logic to simplify AND masks in shift counts is invalid. While it is true that the SystemZ shift instructions ignore all but the low 6 bits of the shift count, it is still invalid to simplify the AND masks while the DAG still uses the standard shift operators (which are not defined to match the SystemZ instruction behavior). Instead, this patch performs equivalent operations during instruction selection. For completely removing the AND, this now happens via additional DAG match patterns implemented by a multi-alternative PatFrags. For simplifying a 32-bit AND to a 16-bit AND, the existing DAG patterns were already mostly OK, they just needed an output XForm to actually truncate the immediate value. Unfortunately, the latter change also exposed a bug in TableGen: it seems XForms are currently only handled correctly for direct operands of the outermost operation node. This patch also fixes that bug by simply recurring through the whole pattern. This should be NFC for all other targets. Differential Revision: https://reviews.llvm.org/D50096 llvm-svn: 338521	2018-08-01 11:57:58 +00:00
Simon Pilgrim	e447a273bd	[X86] Use isNullConstant helper. NFCI. llvm-svn: 338516	2018-08-01 11:24:11 +00:00
Andrew V. Tischenko	dad919d357	[X86] Improved sched models for X86 BT*rr instructions. Differential Revision: https://reviews.llvm.org/D49243 llvm-svn: 338507	2018-08-01 10:24:27 +00:00
Petar Jovanovic	64c10ba8e2	[MIPS GlobalISel] Select global address Select G_GLOBAL_VALUE for position dependent code. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D49803 llvm-svn: 338499	2018-08-01 09:03:23 +00:00
David Bolvansky	fbbb83c782	Revert "Enrich inline messages", tests fail llvm-svn: 338496	2018-08-01 08:02:40 +00:00
David Bolvansky	7f36cd9d96	Enrich inline messages Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338494	2018-08-01 07:37:16 +00:00
Martin Storsjo	d4590c38ab	[AArch64] Disallow the MachO specific .loh directive for windows Also add a test for it being unsupported for linux. Differential Revision: https://reviews.llvm.org/D49929 llvm-svn: 338493	2018-08-01 06:50:18 +00:00
Craig Topper	65a1388881	[X86] When looking for (CMOV C-1, (ADD (CTTZ X), C), (X != 0)) -> (ADD (CMOV (CTTZ X), -1, (X != 0)), C), make sure we really have a compare with 0. It's not strictly required by the transform of the cmov and the add, but it makes sure we restrict it to the cases we know we want to match. While there canonicalize the operand order of the cmov to simplify the matching and emitting code. llvm-svn: 338492	2018-08-01 06:36:20 +00:00
Chandler Carruth	2ce191e220	[x86] Fix a really subtle miscompile due to a somewhat glaring bug in EFLAGS copy lowering. If you have a branch of LLVM, you may want to cherrypick this. It is extremely unlikely to hit this case empirically, but it will likely manifest as an "impossible" branch being taken somewhere, and will be ... very hard to debug. Hitting this requires complex conditions living across complex control flow combined with some interesting memory (non-stack) initialized with the results of a comparison. Also, because you have to arrange for an EFLAGS copy to be in just the right place, almost anything you do to the code will hide the bug. I was unable to reduce anything remotely resembling a "good" test case from the place where I hit it, and so instead I have constructed synthetic MIR testing that directly exercises the bug in question (as well as the good behavior for completeness). The issue is that we would mistakenly assume any SETcc with a valid condition and an initial operand that was a register and a virtual register at that to be a register defining SETcc... It isn't though.... This would in turn cause us to test some other bizarre register, typically the base pointer of some memory. Now, testing this register and using that to branch on doesn't make any sense. It even fails the machine verifier (if you are running it) due to the wrong register class. But it will make it through LLVM, assemble, and it looks fine... But wow do you get a very unsual and surprising branch taken in your actual code. The fix is to actually check what kind of SETcc instruction we're dealing with. Because there are a bunch of them, I just test the may-store bit in the instruction. I've also added an assert for sanity that ensure we are, in fact, defining the register operand. =D llvm-svn: 338481	2018-08-01 03:01:58 +00:00
Konstantin Zhuravlyov	bb30ef7af4	AMDGPU: Add clamp bit to dot intrinsics Differential Revision: https://reviews.llvm.org/D49874 llvm-svn: 338470	2018-08-01 01:31:30 +00:00
Reid Kleckner	b32ff46ff7	Revert r338354 "[ARM] Revert r337821" Disable ARMCodeGenPrepare by default again. It is causing verifier failues in V8 that look like: Duplicate integer as switch case switch i32 %trunc, label %if.end13 [ i32 0, label %cleanup36 i32 0, label %if.then8 ], !dbg !4981 i32 0 fatal error: error in backend: Broken function found, compilation aborted! I will continue reducing the test case and send it along. llvm-svn: 338452	2018-07-31 23:09:42 +00:00
Jonas Paulsson	590b1fc881	[SystemZ] Fix bad assert composition. Use '&&' before the string instead of '\|\|' llvm-svn: 338429	2018-07-31 19:58:42 +00:00
Matt Arsenault	feedabfde7	AMDGPU: Break 64-bit arguments into 32-bit pieces llvm-svn: 338421	2018-07-31 19:29:04 +00:00
Matt Arsenault	0395da7842	AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls This improves code for the same reasons as scalarizing 32-bit element vectors. llvm-svn: 338418	2018-07-31 19:17:47 +00:00
Matt Arsenault	9ced1e0d80	AMDGPU: Scalarize vector argument types to calls When lowering calling conventions, prefer to decompose vectors into the constitute register types. This avoids artifical constraints to satisfy a wide super-register. This improves code quality because now optimizations don't need to deal with the super-register constraint. For example the immediate folding code doesn't deal with 4 component reg_sequences, so by breaking the register down earlier the existing immediate folding code is able to work. This also avoids the need for the shader input processing code to manually split vector types. llvm-svn: 338416	2018-07-31 19:05:14 +00:00
Simon Pilgrim	67caf04d3a	[X86] WriteBSWAP sched classes are reg-reg only. Don't declare them as X86SchedWritePair when the folded class will never be used. Note: MOVBE (load/store endian conversion) instructions tend to have a very different behaviour to BSWAP. llvm-svn: 338412	2018-07-31 18:24:24 +00:00
Simon Pilgrim	5d9b00d15b	[X86][SSE] Use ISD::MULHU for constant/non-zero ISD::SRL lowering (PR38151) As was done for vector rotations, we can efficiently use ISD::MULHU for vXi8/vXi16 ISD::SRL lowering. Shift-by-zero cases are still problematic (mainly on v32i8 due to extra AND/ANDN/OR or VPBLENDVB blend masks but v8i16/v16i16 aren't great either if PBLENDW fails) so I've limited this first patch to known non-zero cases if we can't easily use PBLENDW. Differential Revision: https://reviews.llvm.org/D49562 llvm-svn: 338407	2018-07-31 18:05:56 +00:00
Craig Topper	bef126fb71	[X86] Add pattern matching for PMADDUBSW Summary: Similar to D49636, but for PMADDUBSW. This instruction has the additional complexity that the addition of the two products saturates to 16-bits rather than wrapping around. And one operand is treated as signed and the other as unsigned. A C example that triggers this pattern ``` static const int N = 128; int8_t A[2N]; uint8_t B[2N]; int16_t C[N]; void foo() { for (int i = 0; i != N; ++i) C[i] = MIN(MAX((int16_t)A[2i](int16_t)B[2i] + (int16_t)A[2i+1](int16_t)B[2i+1], -32768), 32767); } ``` Reviewers: RKSimon, spatel, zvi Reviewed By: RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49829 llvm-svn: 338402	2018-07-31 17:12:08 +00:00
Francis Visoiu Mistrih	ae8002c1cf	[X86] Preserve more liveness information in emitStackProbeInline This commit fixes two issues with the liveness information after the call: 1) The code always spills RCX and RDX if InProlog == true, which results in an use of undefined phys reg. 2) FinalReg, JoinReg, RoundedReg, SizeReg are not added as live-ins to the basic blocks that use them, therefore they are seen undefined. https://llvm.org/PR38376 Differential Revision: https://reviews.llvm.org/D50020 llvm-svn: 338400	2018-07-31 16:41:12 +00:00
David Bolvansky	ab79414f7b	Revert Enrich inline messages llvm-svn: 338389	2018-07-31 14:47:22 +00:00
David Bolvansky	b562dbabda	Enrich inline messages Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338387	2018-07-31 14:25:24 +00:00
Matt Arsenault	638a202760	AMDGPU: Don't handle FP16_TO_FP in isCanonicalized This needs more special handling to do correctly. Fixes test in subsequent commit. llvm-svn: 338381	2018-07-31 14:15:16 +00:00
Matt Arsenault	4aec86d37a	AMDGPU: Fold undef fcanonicalize to qNaN We could choose a free 0 for this, but this matches the behavior for fmul undef, 1.0. Also, the NaN use is more useful for folding use operations although if it's not eliminated it is more expensive in terms of code size. llvm-svn: 338376	2018-07-31 13:34:31 +00:00
Andrea Di Biagio	a1852b6194	[llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms. This patch teaches llvm-mca how to identify dependency breaking instructions on btver2. An example of dependency breaking instructions is the zero-idiom XOR (example: `XOR %eax, %eax`), which always generates zero regardless of the actual value of the input register operands. Dependency breaking instructions don't have to wait on their input register operands before executing. This is because the computation is not dependent on the inputs. Not all dependency breaking idioms are also zero-latency instructions. For example, `CMPEQ %xmm1, %xmm1` is independent on the value of XMM1, and it generates a vector of all-ones. That instruction is not eliminated at register renaming stage, and its opcode is issued to a pipeline for execution. So, the latency is not zero. This patch adds a new method named isDependencyBreaking() to the MCInstrAnalysis interface. That method takes as input an instruction (i.e. MCInst) and a MCSubtargetInfo. The default implementation of isDependencyBreaking() conservatively returns false for all instructions. Targets may override the default behavior for specific CPUs, and return a value which better matches the subtarget behavior. In future, we should teach to Tablegen how to automatically generate the body of isDependencyBreaking from scheduling predicate definitions. This would allow us to expose the knowledge about dependency breaking instructions to the machine schedulers (and, potentially, other codegen passes). Differential Revision: https://reviews.llvm.org/D49310 llvm-svn: 338372	2018-07-31 13:21:43 +00:00
Simon Pilgrim	0aa2867545	Revert r338365: [X86] Improved sched models for X86 BT*rr instructions. https://reviews.llvm.org/D49243 Contains WIP code that should not have been included. llvm-svn: 338369	2018-07-31 13:00:51 +00:00
Jonas Paulsson	2f12e45d5a	[SystemZ] Improve decoding in case of instructions with four register operands. Since z13, the max group size will be 2 if any μop has more than 3 register sources. This has been ignored sofar in the SystemZHazardRecognizer, but is now handled by recognizing those instructions and adjusting the tracking of decoding and the cost heuristic for grouping. Review: Ulrich Weigand https://reviews.llvm.org/D49847 llvm-svn: 338368	2018-07-31 13:00:42 +00:00
Andrew V. Tischenko	e6f5ace81a	[X86] Improved sched models for X86 BT*rr instructions. https://reviews.llvm.org/D49243 llvm-svn: 338365	2018-07-31 12:33:48 +00:00
Andrew V. Tischenko	e564055671	[X86] Improved sched models for X86 SHLD/SHRD* instructions. Differential Revision: https://reviews.llvm.org/D9611 llvm-svn: 338359	2018-07-31 10:14:43 +00:00
Simon Pilgrim	99d475f97d	[X86][SSE] isFNEG - Use getTargetConstantBitsFromNode to handle all constant cases isFNEG was duplicating much of what was done by getTargetConstantBitsFromNode in its own calls to getTargetConstantFromNode. Noticed while reviewing D48467. llvm-svn: 338358	2018-07-31 10:13:17 +00:00
Martin Storsjo	293079f2de	[ARM] Allow automatically deducing the thumb instruction size for .inst This matches GAS, that allows unsuffixed .inst for thumb. Differential Revision: https://reviews.llvm.org/D49937 llvm-svn: 338357	2018-07-31 09:27:07 +00:00
Martin Storsjo	af18947f0a	[ARM] Support the .inst directive for MachO and COFF targets Contrary to ELF, we don't add any markers that distinguish data generated with .short/.long from normal instructions, so the .inst directive only adds compatibility with assembly that uses it. Differential Revision: https://reviews.llvm.org/D49936 llvm-svn: 338356	2018-07-31 09:27:01 +00:00
Martin Storsjo	3e3d39d07e	[AArch64] Support the .inst directive for MachO and COFF targets Contrary to ELF, we don't add any markers that distinguish data generated with .long from normal instructions, so the .inst directive only adds compatibility with assembly that uses it. Differential Revision: https://reviews.llvm.org/D49935 llvm-svn: 338355	2018-07-31 09:26:52 +00:00
Sam Parker	2a6c842fda	[ARM] Revert r337821 Re-enabling ARMCodeGenPrepare by default after failing to reproduce the bootstrap issues that I was concerned it was causing. llvm-svn: 338354	2018-07-31 09:04:14 +00:00
Craig Topper	9164b9b16e	[X86] Stop accidentally running the Bonnell LEA fixup path on Goldmont. In one place we checked X86Subtarget.slowLEA() to decide if the pass should run. But to decide what the pass should we only check isSLM. This resulted in Goldmont going down the Bonnell path. llvm-svn: 338342	2018-07-31 00:43:54 +00:00
Amara Emerson	1e8c164c63	[AArch64][GlobalISel] Add isel support for G_BLOCK_ADDR. Also refactors some existing code to materialize addresses for the large code model so it can be shared between G_GLOBAL_VALUE and G_BLOCK_ADDR. This implements PR36390. Differential Revision: https://reviews.llvm.org/D49903 llvm-svn: 338337	2018-07-31 00:09:02 +00:00
Amara Emerson	0e86c07077	[AArch64][GlobalISel] Make G_BLOCK_ADDR legal. Differential Revision: https://reviews.llvm.org/D49902 llvm-svn: 338336	2018-07-31 00:08:56 +00:00
Craig Topper	2f60ef2c78	[DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to BuildSDIV/BuildUDIV/etc. The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation. llvm-svn: 338329	2018-07-30 23:22:00 +00:00
Craig Topper	a568a27dfa	[DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to BuildSDIVPow2. llvm-svn: 338303	2018-07-30 21:04:34 +00:00
Fangrui Song	f78650a8de	Remove trailing space sed -Ei 's/[[:space:]]+$//' include/*/.{def,h,td} lib/*/.{cpp,h} llvm-svn: 338293	2018-07-30 19:41:25 +00:00
Jessica Paquette	fa3bee4756	[MachineOutliner][AArch64] Add support for saving LR to a register This teaches the outliner to save LR to a register rather than the stack when possible. This allows us to avoid bumping the stack in outlined functions in some cases. By doing this, in a later patch, we can teach the outliner to do something like this: f1: ... bl OUTLINED_FUNCTION ... f2: ... move LR's contents to a register bl OUTLINED_FUNCTION move the register's contents back instead of falling back to saving LR in both cases. llvm-svn: 338278	2018-07-30 17:45:28 +00:00
Craig Topper	f014ec9b3b	[X86] Fix typo in comment. NFC llvm-svn: 338274	2018-07-30 17:34:31 +00:00
Craig Topper	dd0ef801f8	Recommit r338204 "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'." This checks in a more direct way without triggering a UBSAN error. llvm-svn: 338273	2018-07-30 17:29:57 +00:00
Thomas Preud'homme	6c1b075299	Fix uninitialized read in ARM's PrintAsmOperand Summary: Fix read of uninitialized RC variable in ARM's PrintAsmOperand when hasRegClassConstraint returns false. This was causing inline-asm-operand-implicit-cast test to fail in r338206. Reviewers: t.p.northover, weimingz, javed.absar, chill Reviewed By: chill Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49984 llvm-svn: 338268	2018-07-30 16:45:40 +00:00
Sander de Smalen	e64206a02c	[AArch64][SVE] Asm: Enable instructions to be prefixed. This patch enables instructions that are destructive on their destination- and first source operand, to be prefixed with a MOVPRFX instruction. This patch also adds a variety of tests: - positive tests for all instructions and forms that accept a movprfx for either or both predicated and unpredicated forms. - negative tests for all instructions and forms that do not accept an unpredicated or predicated movprfx. - negative tests for the diagnostics that get emitted when a MOVPRFX instruction is used incorrectly. This is patch [2/2] in a series to add MOVPRFX instructions: - Patch [1/2]: https://reviews.llvm.org/D49592 - Patch [2/2]: https://reviews.llvm.org/D49593 Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D49593 llvm-svn: 338261	2018-07-30 16:05:45 +00:00
Sander de Smalen	9b33309c87	[AArch64][SVE] Asm: Add MOVPRFX instructions. This patch adds predicated and unpredicated MOVPRFX instructions, which can be prepended to SVE instructions that are destructive on their first source operand, to make them a constructive operation, e.g. add z1.s, p0/m, z1.s, z2.s <=> z1 = z1 + z2 can be made constructive: movprfx z0, z1 add z0.s, p0/m, z0.s, z2.s <=> z0 = z1 + z2 The predicated MOVPRFX instruction can additionally be used to zero inactive elements, e.g. movprfx z0.s, p0/z, z1.s add z0.s, p0/m, z0.s, z2.s Not all instructions can be prefixed with the MOVPRFX instruction which is why this patch also adds a mechanism to validate prefixed instructions. The exact rules when a MOVPRFX applies is detailed in the SVE supplement of the Architectural Reference Manual. This is patch [1/2] in a series to add MOVPRFX instructions: - Patch [1/2]: https://reviews.llvm.org/D49592 - Patch [2/2]: https://reviews.llvm.org/D49593 Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D49592 llvm-svn: 338258	2018-07-30 15:42:46 +00:00
Krzysztof Parzyszek	24fae50905	[Hexagon] Simplify A4_rcmp[n]eqi R, 0 Consider cases when register R is known to be zero/non-zero, or when it is defined by a C2_muxii instruction. llvm-svn: 338251	2018-07-30 14:28:02 +00:00
Matt Arsenault	de496c32a4	AMDGPU: Reduce code size with fcanonicalize (fneg x) When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifers. llvm-svn: 338244	2018-07-30 12:16:58 +00:00
Matt Arsenault	f3c9a34def	AMDGPU: Make fneg combine handle fcanonicalize llvm-svn: 338243	2018-07-30 12:16:47 +00:00
Francis Visoiu Mistrih	7d003657de	[MachineOutliner][X86] Use TAILJMPd64 instead of JMP_1 for TailCall construction The machine verifier asserts with: Assertion failed: (isMBB() && "Wrong MachineOperand accessor"), function getMBB, file ../include/llvm/CodeGen/MachineOperand.h, line 542. It calls analyzeBranch which tries to call getMBB if the opcode is JMP_1, but in this case we do: JMP_1 @OUTLINED_FUNCTION I believe we have to use TAILJMPd64 instead of JMP_1 since JMP_1 is used with brtarget8. Differential Revision: https://reviews.llvm.org/D49299 llvm-svn: 338237	2018-07-30 09:59:33 +00:00
Dean Michael Berris	927b3da6c9	Revert "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'." This reverts commit r338204. llvm-svn: 338236	2018-07-30 09:45:09 +00:00
Nicolai Haehnle	7f0d05d532	AMDGPU: Force skip over s_sendmsg and exp instructions Summary: These instructions interact with hardware blocks outside the shader core, and they can have "scalar" side effects even when EXEC = 0. We don't want these scalar side effects to occur when all lanes want to skip these instructions, so always add the execz skip branch instruction for basic blocks that contain them. Also ensure that we skip scalar stores / atomics, though we don't code-gen those yet. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48431 Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44 llvm-svn: 338235	2018-07-30 09:23:59 +00:00
Petr Pavlu	8b6eff4e77	[ARM] Fix over-alignment in arguments that are HA of 128-bit vectors Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up fully on stack, the function tries to pack all resulting items of the aggregate as tightly as possible according to AAPCS. Once the first item was laid out, the alignment used for consecutive items was the size of one item. This logic went wrong for 128-bit vectors because their alignment is normally only 64 bits, and so could result in inserting unexpected padding between the first and second element. The patch fixes the problem by updating the alignment with the item size only if this results in reducing it. Differential Revision: https://reviews.llvm.org/D49720 llvm-svn: 338233	2018-07-30 08:49:30 +00:00
Dylan McKay	6bc5d5c6db	[AVR] Re-enable expansion of ADDE/ADDC/SUBE/SUBC in ISel This was disabled in r333748, which broke four tests. In the future, these need to be updated to UADDO/ADDCARRY or USUBO/SUBCARRY. llvm-svn: 338212	2018-07-29 11:38:36 +00:00
Sander de Smalen	ad88a99956	[AArch64][SVE] Asm: Support for WHILE(LE\|LO\|LS\|LT) instructions. The WHILE instructions generate a predicate that is true while the comparison of the first scalar operand (incremented for each predicate element) with the second scalar operand is true and false thereafter. WHILELE While incrementing signed scalar less than or equal to scalar WHILELO While incrementing unsigned scalar lower than scalar WHILELS While incrementing unsigned scalar lower than or same as scalar WHILELT While incrementing signed scalar less than scalar e.g. whilele p0.s, x0, x1 generates predicate p0 (for 32bit elements) by incrementing (signed) x0 and comparing that vector to splat(x1). llvm-svn: 338211	2018-07-29 08:51:08 +00:00
Sander de Smalen	e70ed3187c	[AArch64][SVE] Asm: Instructions to perform serialized operations. The instructions added in this patch permit active elements within a vector to be processed sequentially without unpacking the vector. PFIRST Set the first active element to true. PNEXT Find next active element in predicate. CTERMEQ Compare and terminate loop when equal. CTERMNE Compare and terminate loop when not equal. llvm-svn: 338210	2018-07-29 08:00:16 +00:00
Craig Topper	5daa032546	[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'. X86 normally requires immediates to be a signed 32-bit value which would exclude i64 0x80000000. But for add/sub we can negate the constant and use the opposite instruction. llvm-svn: 338204	2018-07-28 18:21:46 +00:00
Craig Topper	ba208b07b6	[X86] Use alignTo and divideCeil to make some code more readable. NFC llvm-svn: 338203	2018-07-28 18:21:45 +00:00
Sander de Smalen	5b3a289424	[AArch64][SVE] Asm: Support for PFALSE and PTEST instructions. This patch adds PFALSE (unconditionally sets all elements of the predicate to false) and PTEST (set the status flags for the predicate). llvm-svn: 338198	2018-07-28 14:18:11 +00:00
Matt Arsenault	8f9dde94b7	AMDGPU: Stop wasting argument registers with v3i32/v3f32 SelectionDAGBuilder widens v3i32/v3f32 arguments to to v4i32/v4f32 which consume an additional register. In addition to wasting argument space, this produces extra instructions since now it appears the 4th vector component has a meaningful value to most combines. llvm-svn: 338197	2018-07-28 14:11:34 +00:00
Sander de Smalen	3878bf83dd	[AArch64][SVE] Asm: Data-dependent loop predicate partitioning instructions. This patch adds support for instructions that partition a predicate based on data-dependent termination conditions in a loop. BRKA Break after the first true condition BRKAS Break after the first true condition, setting condition flags BRKB Break before the first true condition BRKBS Break before the first true condition, setting condition flags BRKPA Break after the first true condition, propagating from the previous partition BRKPAS Break after the first true condition, propagating from the previous partition, setting condition flags BRKPB Break before the first true condition, propagating from the previous partition BRKPBS Break before the first true condition, propagating from the previous partition, setting condition flags BRKN Propagate break to next partition BKRNS Propagate break to next partition, setting condition flags llvm-svn: 338196	2018-07-28 14:04:52 +00:00
Matt Arsenault	81920b0a25	DAG: Add calling convention argument to calling convention funcs This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194	2018-07-28 13:25:19 +00:00
Matt Arsenault	72b0e38b26	AMDGPU: Stop trying to extend arguments for clover This was trying to replace i8/i16 arguments with i32, which was broken and no longer necessary. llvm-svn: 338193	2018-07-28 12:34:25 +00:00
Wouter van Oortmerssen	a90d24da1c	Revert "[WebAssembly] Added default stack-only instruction mode for MC." This reverts commit d3c9af4179eae7793d1487d652e2d4e23844555f. (SVN revision 338164) llvm-svn: 338176	2018-07-27 23:19:51 +00:00
Craig Topper	c3e11bf3f7	[X86] Add support expanding multiplies by constant where the constant is -3/-5/-9 multplied by a power of 2. These can be replaced with an LEA, a shift, and a negate. This seems to match what gcc and icc would do. llvm-svn: 338174	2018-07-27 23:04:59 +00:00
Wouter van Oortmerssen	a67c4137c3	[WebAssembly] Added default stack-only instruction mode for MC. Summary: Moved Explicit Locals pass to last. Made that pass obligatory. Made it convert from register to stack based instructions, and removed the registers. Fixes to related code that was expecting register based instructions. Added the correct testing flag to all tests, depending on what the format they were expecting so far. Translated one test to stack format as example: reg-stackify-stack.ll tested: llvm-lit -v `find test -name WebAssembly` unittests/MC/* Reviewers: dschuff, sunfish Subscribers: sbc100, jgravelle-google, eraman, aheejin, llvm-commits Differential Revision: https://reviews.llvm.org/D49160 llvm-svn: 338164	2018-07-27 20:56:43 +00:00
Jessica Paquette	f90edbe3d6	Recommit "Enable MachineOutliner by default under -Oz for AArch64" Fixed the ASAN failure from before in r338148, so recommiting. This patch enables the MachineOutliner by default in AArch64 under -Oz. The MachineOutliner offers around a 4.5% improvement on the current -Oz code size improvements. We have done work into improving the debuggability of outlined code, so that users of -Oz won't be surprised by the optimization. We have also been executing the LLVM test suite and common external tests such as the SPEC suites continuously with no issue. The outliner has a low compile-time overhead of roughly 1%. At this point, the outliner would be a really good addition to the -Oz pass pipeline! llvm-svn: 338160	2018-07-27 20:18:27 +00:00
Jessica Paquette	9d93c6026a	[MachineOutliner] Exit getOutliningCandidateInfo when we erase all candidates There was a missing check for if a candidate list was entirely deleted. This adds that check. This fixes an asan failure caused by running test/CodeGen/AArch64/addsub_ext.ll with the MachineOutliner enabled. llvm-svn: 338148	2018-07-27 18:21:57 +00:00
Evandro Menezes	fcca45f0dd	[ARM] Add new target feature to fuse literal generation This feature enables the fusion of such operations on Cortex A57 and Cortex A72, as recommended in their Software Optimisation Guides, sections 4.14 and 4.11, respectively. Differential revision: https://reviews.llvm.org/D49563 llvm-svn: 338147	2018-07-27 18:16:47 +00:00
Jessica Paquette	faea2d3130	Revert "Enable MachineOutliner by default under -Oz for AArch64" It failed an Asan test on a bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/21543/steps/check-llvm%20asan/logs/stdio Fixing that before recommitting. llvm-svn: 338136	2018-07-27 17:25:38 +00:00
Yonghong Song	04ccfda075	bpf: add missing RegState to notify MachineInstr verifier necessary register usage Errors like the following are reported by: https://urldefense.proofpoint.com/v2/url?u=http-3A__lab.llvm.org-3A8011_builders_llvm-2Dclang-2Dx86-5F64-2Dexpensive-2Dchecks-2Dwin_builds_11261&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=929oWPCf7Bf2qQnir4GBtowB8ZAlIRWsAdTfRkDaK-g&s=9k-wbEUVpUm474hhzsmAO29VXVvbxJPWD9RTgCD71fQ&e= * Bad machine code: Explicit definition marked as use * - function: cal_align1 - basic block: %bb.0 entry (0x47edd98) - instruction: LDB $r3, $r2, 0 - operand 0: $r3 This is because RegState info was missing for ScratchReg inside expandMEMCPY. This caused incomplete register usage information to MachineInstr verifier which then would complain as there could be potential code-gen issue if the complained MachineInstr is used in place where register usage information matters even though the memcpy expanding is not in such case as it happens at the last stage of IR optimization pipeline. We should always specify those register usage information which compiler couldn't deduct automatically whenever we add a hardware register manually. Reported-by: Builder llvm-clang-x86_64-expensive-checks-win Build #11261 Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 338134	2018-07-27 16:58:52 +00:00
Jessica Paquette	d4229b985c	Enable MachineOutliner by default under -Oz for AArch64 This patch enables the MachineOutliner by default in AArch64 under -Oz. The MachineOutliner offers around a 4.5% improvement on the current -Oz code size improvements. We have done work into improving the debuggability of outlined code, so that users of -Oz won't be surprised by the optimization. We have also been executing the LLVM test suite and common external tests such as the SPEC suites continuously with no issue. The outliner has a low compile-time overhead of roughly 1%. At this point, the outliner would be a really good addition to the -Oz pass pipeline! llvm-svn: 338133	2018-07-27 16:44:42 +00:00
Jan Vesely	6ff58ed5ca	AMDGPU/R600: Add MOV instructions to BFE patterns R600 can't handle immediates for BFE, these will be eliminated later. Fixes powr/pow regressions n r600 since r334817 Differential Revision: https://reviews.llvm.org/D49641 llvm-svn: 338127	2018-07-27 15:00:13 +00:00
Sander de Smalen	a703b8dc71	[AArch64][SVE] Asm: Predicated integer reductions. This patch adds support for various integer reduction operations: SADDV signed add reduction to scalar UADDV unsigned add reduction to scalar SMAXV signed maximum reduction to scalar SMINV signed minimum reduction to scalar UMAXV unsigned maximum reduction to scalar UMINV unsigned minimum reduction to scalar ANDV logical AND reduction to scalar ORV logical OR reduction to scalar EORV logical EOR reduction to scalar The reduction is predicated, e.g. smaxv s0, p0, z1.s performs a signed maximum reduction on active elements in z1, and stores the (signed max value) result in s0. llvm-svn: 338126	2018-07-27 14:24:55 +00:00
Sander de Smalen	fcb636d222	[AArch64][SVE] Asm: Predicated floating point reductions. This patch adds support for various floating-point reduction operations: FADDA strictly-ordered add reduction, accumulating in scalar FADDV recursive add reduction to scalar FMAXV recursive max reduction to scalar FMINV recursive min reduction to scalar FMAXNMV recursive max number reduction to scalar FMINNMV recursive min number reduction to scalar The reduction is predicated, e.g. fadda d0, p0, d0, z1.d performs the add-reduction in strict order on active elements in z1, accumulating into d0. faddv d0, p0, z1.d performs the add-reduction (not in strict order) on active elements in z1, storing the result in d0. llvm-svn: 338123	2018-07-27 13:58:48 +00:00
Sander de Smalen	88e154ff90	[AArch64][SVE] Asm: Support for FEXPA and FTSSEL. This patch adds support for transcendental acceleration instructions 'FEXPA' (exponential accelerator) and 'FTSSEL' (trigonometric select coefficient). llvm-svn: 338121	2018-07-27 12:40:09 +00:00
Sander de Smalen	71929e7cad	[AArch64][SVE] Asm: Support for FRECPE and FRSQRTE. Support for floating-point instructions for reciprocal estimate (FRECPE) and reciprocal square root estimate (FRSQRTE). llvm-svn: 338120	2018-07-27 12:26:24 +00:00
Matt Arsenault	0183c56c11	AMDGPU: Fix code size for return_to_epilog pseudo llvm-svn: 338113	2018-07-27 09:15:03 +00:00
Tom Stellard	e9bdc5f1d8	AMDGPU/GlobalISel: Fix crash in regbankselect on non-power-of-2 types Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D49624 llvm-svn: 338102	2018-07-27 06:04:40 +00:00
Craig Topper	561e298e29	[X86] Remove an unnecessary 'if' that prevented treating INT64_MAX and -INT64_MAX as power of 2 minus 1 in the multiply expansion code. Not sure why they were being explicitly excluded, but I believe all the math inside the if works. I changed the absolute value to be uint64_t instead of int64_t so INT64_MIN+1 wouldn't be signed wrap. llvm-svn: 338101	2018-07-27 05:56:27 +00:00
Craig Topper	e364baa88b	[X86] Add matching for another pattern of PMADDWD. Summary: This is the pattern you get from the loop vectorizer for something like this int16_t A[1024]; int16_t B[1024]; int32_t C[512]; void pmaddwd() { for (int i = 0; i != 512; ++i) C[i] = (A[2i]B[2i]) + (A[2i+1]B[2i+1]); } In this case we will have (add (mul (build_vector), (build_vector)), (mul (build_vector), (build_vector))). This is different than the pattern we currently match which has the build_vectors between an add and a single multiply. I'm not sure what C code would get you that pattern. Reviewers: RKSimon, spatel, zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49636 llvm-svn: 338097	2018-07-27 04:29:10 +00:00
Craig Topper	f7bc550223	[X86] When removing sign extends from gather/scatter indices, make sure we handle UpdateNodeOperands finding an existing node to CSE with. If this happens the operands aren't updated and the existing node is returned. Make sure we pass this existing node up to the DAG combiner so that a proper replacement happens. Otherwise we get stuck in an infinite loop with an unoptimized node. llvm-svn: 338090	2018-07-27 00:00:30 +00:00
Scott Linder	eb1f75d561	[AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits Scale the offset of VGPR spills by the wave size when it cannot fit in the 12-bit offset immediate field and so is added to the soffset SGPR. This accounts for hardware swizzling of scratch memory. Differential Revision: https://reviews.llvm.org/D49448 llvm-svn: 338060	2018-07-26 19:47:51 +00:00
Ana Pazos	2e4106b73d	[RISCV] Add support for _interrupt attribute - Save/restore only registers that are used. This includes Callee saved registers and Caller saved registers (arguments and temporaries) for integer and FP registers. - If there is a call in the interrupt handler, save/restore all Caller saved registers (arguments and temporaries) and all FP registers. - Emit special return instructions depending on "interrupt" attribute type. Based on initial patch by Zhaoshi Zheng. Reviewers: asb Reviewed By: asb Subscribers: rkruppe, the_o, MartinMosbeck, brucehoult, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, llvm-commits Differential Revision: https://reviews.llvm.org/D48411 llvm-svn: 338047	2018-07-26 17:49:43 +00:00
Alexey Bataev	4dd7558fab	[DEBUGINFO, NVPTX] Emit correct debug information for local variables. Summary: NVPTX target dos not use register-based frame information. Instead it relies on the artificial local_depot that is used instead of the frame and the data for variables must be emitted relatively to this local_depot. Reviewers: tra, jlebar, echristo Subscribers: jholewinski, aprantl, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D45963 llvm-svn: 338039	2018-07-26 16:29:52 +00:00
Luke Cheeseman	66b5e7da4c	Enable some pointer authentication instructions for aarch64 v8a targets - Some of the v8.3 pointer authentication instruction inhabit the Hint space - These instructions can be assembled to hint instructions which act as NOP instructions prior to v8.3 - This patch permits using the hint instructions for all v8a targets - Also, correct the RETA{A,B} instructions to match the instruction attributes of RET (set isTerminator and isBarrier) Differential Revision: https://reviews.llvm.org/D49786 llvm-svn: 338029	2018-07-26 14:00:50 +00:00
Stefan Maksimovic	4a612d4bf2	[mips] Sign extend i32 return values on MIPS64 Override getTypeForExtReturn so that functions returning an i32 typed value have it sign extended on MIPS64. Also provide patterns to get rid of unneeded sign extensions for arithmetic instructions which implicitly sign extend their results. Differential Revision: https://reviews.llvm.org/D48374 llvm-svn: 338019	2018-07-26 10:59:35 +00:00
Chandler Carruth	1387159b93	[x86/SLH] Extract the logic to trace predicate state through calls to a helper function with a nice overview comment. NFC. This is a preperatory refactoring to implementing another component of mitigation here that was descibed in the design document but hadn't been implemented yet. llvm-svn: 338016	2018-07-26 09:42:57 +00:00
Sjoerd Meijer	dc198344ce	[AArch64] Armv8.2-A: add the crypto extensions This adds MC support for the crypto instructions that were made optional extensions in Armv8.2-A (AArch64 only). Differential Revision: https://reviews.llvm.org/D49370 llvm-svn: 338010	2018-07-26 07:13:59 +00:00
Craig Topper	4e687d5bb2	[X86] Don't use CombineTo to skip adding new nodes to the DAGCombiner worklist in combineMul. I'm not sure if this was trying to avoid optimizing the new nodes further or what. Or maybe to prevent a cycle if something tried to reform the multiply? But I don't think its a reliable way to do that. If the user of the expanded multiply is visited by the DAGCombiner after this conversion happens, the DAGCombiner will check its operands, see that they haven't been visited by the DAGCombiner before and it will then add the first node to the worklist. This process will repeat until all the new nodes are visited. So this seems like an unreliable prevention at best. So this patch just returns the new nodes like any other combine. If this starts causing problems we can try to add target specific nodes or something to more directly prevent optimizations. Now that we handle the combine normally, we can combine any negates the mul expansion creates into their users since those will be visited now. llvm-svn: 338007	2018-07-26 05:40:10 +00:00
Craig Topper	370bdd3a0f	[X86] Remove some unnecessary explicit calls to DCI.AddToWorkList. These calls were making sure some newly created nodes were added to worklist, but the DAGCombiner has internal support for ensuring it has visited all nodes. Any time it visits a node it ensures the operands have been queued to be visited as well. This means if we only need to return the last new node. The DAGCombiner will take care of adding its inputs thus walking backwards through all the new nodes. llvm-svn: 337996	2018-07-26 03:20:27 +00:00
Matthias Braun	57dd5b3dea	CodeGen: Cleanup regmask construction; NFC - Avoid duplication of regmask size calculation. - Simplify allocateRegisterMask() call. - Rename allocateRegisterMask() to allocateRegMask() to be consistent with naming in MachineOperand. llvm-svn: 337986	2018-07-26 00:27:47 +00:00
Yonghong Song	71d81e5c8f	bpf: new option -bpf-expand-memcpy-in-order to expand memcpy in order Some BPF JIT backends would want to optimize memcpy in their own architecture specific way. However, at the moment, there is no way for JIT backends to see memcpy semantics in a reliable way. This is due to LLVM BPF backend is expanding memcpy into load/store sequences and could possibly schedule them apart from each other further. So, BPF JIT backends inside kernel can't reliably recognize memcpy semantics by peephole BPF sequence. This patch introduce new intrinsic expand infrastructure to memcpy. To get stable in-order load/store sequence from memcpy, we first lower memcpy into BPF::MEMCPY node which then expanded into in-order load/store sequences in expandPostRAPseudo pass which will happen after instruction scheduling. By this way, kernel JIT backends could reliably recognize memcpy through scanning BPF sequence. This new memcpy expand infrastructure is gated by a new option: -bpf-expand-memcpy-in-order Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 337977	2018-07-25 22:40:02 +00:00
Martin Storsjo	d78b394543	Add missing 'override', fixing compilation with some compilers since SVN r337950 llvm-svn: 337952	2018-07-25 19:01:36 +00:00
Martin Storsjo	d2662c32fb	[COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter In SVN r334523, the first half of comdat constant pool handling was hoisted from X86WindowsTargetObjectFile (which despite the name only was used for msvc targets) into the arch independent TargetLoweringObjectFileCOFF, but the other half of the handling was left behind in X86AsmPrinter::GetCPISymbol. With only half of the handling in place, inconsistent comdat sections/symbols are created, causing issues with both GNU binutils (avoided for X86 in SVN r335918) and with the MS linker, which would complain like this: fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4 Differential Revision: https://reviews.llvm.org/D49644 llvm-svn: 337950	2018-07-25 18:35:31 +00:00
Eli Friedman	733f4ed1bb	[ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1. Saves materializing the immediate for the "ands". Corresponding patterns exist for lsrs+lsls, but that seems less common in practice. Now implemented as a DAGCombine. Differential Revision: https://reviews.llvm.org/D49585 llvm-svn: 337945	2018-07-25 18:22:22 +00:00
Stanislav Mekhanoshin	7e7268ac1c	[AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion Differential Revision: https://reviews.llvm.org/D49761 llvm-svn: 337938	2018-07-25 17:02:11 +00:00
Krzysztof Parzyszek	4e07509d18	[Hexagon] Properly scale bit index when extracting elements from vNi1 For example v = <2 x i1> is represented as bbbbaaaa in a predicate register, where b = v[1], a = v[0]. Extracting v[1] is equivalent to extracting bit 4 from the predicate register. llvm-svn: 337934	2018-07-25 16:20:59 +00:00
Petar Jovanovic	58c0210023	[MIPS GlobalISel] Lower pointer arguments Add support for lowering pointer arguments. Changing type from pointer to integer is already done in MipsTargetLowering::getRegisterTypeForCallingConv. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D49419 llvm-svn: 337912	2018-07-25 12:35:01 +00:00
Jonas Paulsson	374af8070e	[SystemZ] Use tablegen loops in SchedModels NFC changes to make scheduler TableGen files more readable, by using loops instead of a lot of similar defs with just e.g. a latency value that changes. https://reviews.llvm.org/D49598 Review: Ulrich Weigand, Javed Abshar llvm-svn: 337909	2018-07-25 11:42:55 +00:00
Chandler Carruth	4f6481dc81	[x86/SLH] Sink the return hardening into the main block-walk + hardening code. This consolidates all our hardening calls, and simplifies the code a bit. It seems much more clear to handle all of these together. No functionality changed here. llvm-svn: 337895	2018-07-25 09:18:48 +00:00
Chandler Carruth	196e719acd	[x86/SLH] Improve name and comments for the main hardening function. This function actually does two things: it traces the predicate state through each of the basic blocks in the function (as that isn't directly handled by the SSA updater) and it hardens everything necessary in the block as it goes. These need to be done together so that we have the currently active predicate state to use at each point of the hardening. However, this also made obvious that the flag to disable actual hardening of loads was flawed -- it also disabled tracing the predicate state across function calls within the body of each block. So this patch sinks this debugging flag test to correctly guard just the hardening of loads. Unless load hardening was disabled, no functionality should change with tis patch. llvm-svn: 337894	2018-07-25 09:00:26 +00:00
Simon Atanasyan	b524459288	[mips] Replace custom parsing logic for data directives by the `addAliasForDirective` The target independent AsmParser doesn't recognise .hword, .word, .dword which are required for Mips. Currently MipsAsmParser recognises these through dispatch to MipsAsmParser::parseDataDirective. This contains equivalent logic to AsmParser::parseDirectiveValue. This patch allows reuse of AsmParser::parseDirectiveValue by making use of addAliasForDirective to support .hword, .word and .dword. Original patch provided by Alex Bradbury at D47001 was modified to fix handling of microMIPS symbols. The `AsmParser::parseDirectiveValue` calls either `EmitIntValue` or `EmitValue`. In this patch we override `EmitIntValue` in the `MipsELFStreamer` to clear a pending set of microMIPS symbols. Differential revision: https://reviews.llvm.org/D49539 llvm-svn: 337893	2018-07-25 07:07:43 +00:00
Craig Topper	dc0e8a601d	[X86] Use X86ISD::MUL_IMM instead of ISD::MUL for multiply we intend to be selected to LEA. This prevents other combines from possibly disturbing it. llvm-svn: 337890	2018-07-25 05:33:36 +00:00
Chandler Carruth	7024921c0a	[x86/SLH] Teach the x86 speculative load hardening pass to harden against v1.2 BCBS attacks directly. Attacks using spectre v1.2 (a subset of BCBS) are described in the paper here: https://people.csail.mit.edu/vlk/spectre11.pdf The core idea is to speculatively store over the address in a vtable, jumptable, or other target of indirect control flow that will be subsequently loaded. Speculative execution after such a store can forward the stored value to subsequent loads, and if called or jumped to, the speculative execution will be steered to this potentially attacker controlled address. Up until now, this could be mitigated by enableing retpolines. However, that is a relatively expensive technique to mitigate this particular flavor. Especially because in most cases SLH will have already mitigated this. To fully mitigate this with SLH, we need to do two core things: 1) Unfold loads from calls and jumps, allowing the loads to be post-load hardened. 2) Force hardening of incoming registers even if we didn't end up needing to harden the load itself. The reason we need to do these two things is because hardening calls and jumps from this particular variant is importantly different from hardening against leak of secret data. Because the "bad" data here isn't a secret, but in fact speculatively stored by the attacker, it may be loaded from any address, regardless of whether it is read-only memory, mapped memory, or a "hardened" address. The only 100% effective way to harden these instructions is to harden the their operand itself. But to the extent possible, we'd like to take advantage of all the other hardening going on, we just need a fallback in case none of that happened to cover the particular input to the control transfer instruction. For users of SLH, currently they are paing 2% to 6% performance overhead for retpolines, but this mechanism is expected to be substantially cheaper. However, it is worth reminding folks that this does not mitigate all of the things retpolines do -- most notably, variant #2 is not in any way mitigated by this technique. So users of SLH may still want to enable retpolines, and the implementation is carefuly designed to gracefully leverage retpolines to avoid the need for further hardening here when they are enabled. Differential Revision: https://reviews.llvm.org/D49663 llvm-svn: 337878	2018-07-25 01:51:29 +00:00
Craig Topper	fc501a9223	[X86] Use a shift plus an lea for multiplying by a constant that is a power of 2 plus 2/4/8. The LEA allows us to combine an add and the multiply by 2/4/8 together so we just need a shift for the larger power of 2. llvm-svn: 337875	2018-07-25 01:15:38 +00:00
Craig Topper	5be253d988	[X86] Expand mul by pow2 + 2 using a shift and two adds similar to what we do for pow2 - 2. llvm-svn: 337874	2018-07-25 01:15:35 +00:00
Craig Topper	56c104f104	[X86] Use a two lea sequence for multiply by 37, 41, and 73. These fit a pattern used by 11, 21, and 19. llvm-svn: 337871	2018-07-24 23:44:17 +00:00
Craig Topper	f8fcee70a3	[X86] Change multiply by 26 to use two multiplies by 5 and an add instead of multiply by 3 and 9 and a subtract. Same number of operations, but ending in an add is friendlier due to it being commutable. llvm-svn: 337869	2018-07-24 23:44:12 +00:00
Craig Topper	5ddc0a2b14	[X86] When expanding a multiply by a negative of one less than a power of 2, like 31, don't generate a negate of a subtract that we'll never optimize. We generated a subtract for the power of 2 minus one then negated the result. The negate can be optimized away by swapping the subtract operands, but DAG combine doesn't know how to do that and we don't add any of the new nodes to the worklist anyway. This patch makes use explicitly emit the swapped subtract. llvm-svn: 337858	2018-07-24 21:31:21 +00:00
Craig Topper	6d29891bef	[X86] Generalize the multiply by 30 lowering to generic multipy by power 2 minus 2. Use a left shift and 2 subtracts like we do for 30. Move this out from behind the slow lea check since it doesn't even use an LEA. Use this for multiply by 14 as well. llvm-svn: 337856	2018-07-24 21:15:41 +00:00
Craig Topper	86d6320b94	[X86] Change multiply by 19 to use (9 * X) * 2 + X instead of (5 * X) * 4 - 1. The new lowering can be done in 2 LEAs. The old code took 1 LEA, 1 shift, and 1 sub. llvm-svn: 337851	2018-07-24 20:31:48 +00:00
Jessica Paquette	69f517df27	[MachineOutliner][NFC] Move target frame info into OutlinedFunction Just some gardening here. Similar to how we moved call information into Candidates, this moves outlined frame information into OutlinedFunction. This allows us to remove TargetCostInfo entirely. Anywhere where we returned a TargetCostInfo struct, we now return an OutlinedFunction. This establishes OutlinedFunctions as more of a general repeated sequence, and Candidates as occurrences of those repeated sequences. llvm-svn: 337848	2018-07-24 20:13:10 +00:00
Peter Collingbourne	e06bac4796	Put "built-in" function definitions in global Used list, for LTO. (fix bug 34169) When building with LTO, builtin functions that are defined but whose calls have not been inserted yet, get internalized. The Global Dead Code Elimination phase in the new LTO implementation then removes these function definitions. Later optimizations add calls to those functions, and the linker then dies complaining that there are no definitions. This CL fixes the new LTO implementation to check if a function is builtin, and if so, to not internalize (and later DCE) the function. As part of this fix I needed to move the RuntimeLibcalls.{def,h} files from the CodeGen subidrectory to the IR subdirectory. I have updated all the files that accessed those two files to access their new location. Fixes PR34169 Patch by Caroline Tice! Differential Revision: https://reviews.llvm.org/D49434 llvm-svn: 337847	2018-07-24 19:34:37 +00:00
Chandler Carruth	c9313a9ecb	[x86] Teach the x86 backend that it can fold between TCRETURNm* and TCRETURNr* and fix latent bugs with register class updates. Summary: Enabling this fully exposes a latent bug in the instruction folding: we never update the register constraints for the register operands when fusing a load into another operation. The fused form could, in theory, have different register constraints on its operands. And in fact, TCRETURNm* needs its memory operands to use tailcall compatible registers. I've updated the folding code to re-constrain all the registers after they are mapped onto their new instruction. However, we still can't enable folding in the general case from TCRETURNr* to TCRETURNm* because doing so may require more registers to be available during the tail call. If the call itself uses all but one register, and the folded load would require both a base and index register, there will not be enough registers to allocate the tail call. It would be better, IMO, to teach the register allocator to unfold TCRETURNm* when it runs out of registers (or specifically check the number of registers available during the TCRETURNr) but I'm not going to try and solve that for now. Instead, I've just blocked the forward folding from r -> m, leaving LLVM free to unfold from m -> r as that doesn't introduce new register pressure constraints. The down side is that I don't have anything that will directly exercise this. Instead, I will be immediately using this it my SLH patch. =/ Still worse, without allowing the TCRETURNr -> TCRETURNm* fold, I don't have any tests that demonstrate the failure to update the memory operand register constraints. This patch still seems correct, but I'm nervous about the degree of testing due to this. Suggestions? Reviewers: craig.topper Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D49717 llvm-svn: 337845	2018-07-24 19:04:37 +00:00
Jessica Paquette	fca55129b1	[MachineOutliner][NFC] Make Candidates own their call information Before this, TCI contained all the call information for each Candidate. This moves that information onto the Candidates. As a result, each Candidate can now supply how it ought to be called. Thus, Candidates will be able to, say, call the same function in cheaper ways when possible. This also removes that information from TCI, since it's no longer used there. A follow-up patch for the AArch64 outliner will demonstrate this. llvm-svn: 337840	2018-07-24 17:42:11 +00:00
Simon Atanasyan	28ded4ee19	[mips] Fix local dynamic TLS with Sym64 For the final DTPREL addition, rather than a lui/daddiu/daddu triple, LLVM was erronously emitting a daddiu/daddiu pair, treating the %dtprel_hi as if it were a %dtprel_lo, since Mips::Hi expands unshifted for Sym64. Instead, use a new TlsHi node and, although unnecessary due to the exact structure of the nodes emitted, use TlsHi for local exec too to prevent future bugs. Also garbage-collect the unused TprelLo and TlsGd nodes, and TprelHi since its functionality is provided by the new common TlsHi node. Patch by James Clarke. Differential revision: https://reviews.llvm.org/D49259 llvm-svn: 337827	2018-07-24 13:47:52 +00:00
Chandler Carruth	54529146c6	[x86/SLH] Extract the core register hardening logic to a low-level helper and restructure the post-load hardening to use this. This isn't as trivial as I would have liked because the post-load hardening used a trick that only works for it where it swapped in a temporary register to the load rather than replacing anything. However, there is a simple way to do this without that trick that allows this to easily reuse a friendly API for hardening a value in a register. That API will in turn be usable in subsequent patcehs. This also techincally changes the position at which we insert the subreg extraction for the predicate state, but that never resulted in an actual instruction and so tests don't change at all. llvm-svn: 337825	2018-07-24 12:44:00 +00:00
Chandler Carruth	376113da89	[x86/SLH] Tidy up a comment, using doxygen structure and wording it to be more accurate and understandable. llvm-svn: 337822	2018-07-24 12:19:01 +00:00
Sam Parker	8b93e82c3d	[ARM] Disable ARMCodeGenPrepare by default ARM Stage 2 builders have been suspiciously broken since the pass was committed. Disabling to hopefully fix the bots and give me time to debug. llvm-svn: 337821	2018-07-24 12:04:23 +00:00
Tom Stellard	b7f19e6d1e	AMDGPU/GlobalISel: Legalize G_INSERT Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49601 llvm-svn: 337798	2018-07-24 02:19:20 +00:00
Tom Stellard	2d37929c10	AMDGPU/GlobalISel: Remove unnecessary legality constraint for G_EXTRACT Summary: We were marking G_EXTRACT operations unsupported if the output type was larger than the input type. I don't see how this could ever actually happen, so I dropped the constraint. Doing this makes it possible to reuse the same legality code for G_INSERT. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49600 llvm-svn: 337794	2018-07-24 01:43:49 +00:00
Chandler Carruth	66fbbbca60	[x86/SLH] Simplify the code for hardening a loaded value. NFC. This is in preparation for extracting this into a re-usable utility in this code. llvm-svn: 337785	2018-07-24 00:35:36 +00:00
Chandler Carruth	b46c22de00	[x86/SLH] Remove complex SHRX-based post-load hardening. This code was really nasty, had several bugs in it originally, and wasn't carrying its weight. While on Zen we have all 4 ports available for SHRX, on all of the Intel parts with Agner's tables, SHRX can only execute on 2 ports, giving it 1/2 the throughput of OR. Worse, all too often this pattern required two SHRX instructions in a chain, hurting the critical path by a lot. Even if we end up needing to safe/restore EFLAGS, that is no longer so bad. We pay for a uop to save the flag, but we very likely get fusion when it is used by forming a test/jCC pair or something similar. In practice, I don't expect the SHRX to be a significant savings here, so I'd like to avoid the complex code required. We can always resurrect this if/when someone has a specific performance issue addressed by it. llvm-svn: 337781	2018-07-24 00:21:59 +00:00
Martin Storsjo	c2b701408e	[AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes This matches the structure used on X86 and ARM. This requires a little bit of duplication of the parts that are equal in both AArch64 COFF variants though. Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF didn't, but now they do. This makes AArch64 match X86 in how comdat is used for float constants for MinGW. Differential Revision: https://reviews.llvm.org/D49637 llvm-svn: 337755	2018-07-23 22:15:14 +00:00
Reid Kleckner	980c4df037	Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" Don't try to generate large PIC code for non-ELF targets. Neither COFF nor MachO have relocations for large position independent code, and users have been using "large PIC" code models to JIT 64-bit code for a while now. With this change, if they are generating ELF code, their JITed code will truly be PIC, but if they target MachO or COFF, it will contain 64-bit immediates that directly reference external symbols. For a JIT, that's perfectly fine. llvm-svn: 337740	2018-07-23 21:14:35 +00:00
Krzysztof Parzyszek	9500a24fce	[Hexagon] Handle unnamed globals in HexagonConstExpr Instead of comparing names, compare positions in the parent module. llvm-svn: 337723	2018-07-23 18:30:17 +00:00
Fangrui Song	58407ca045	[ARM] Use unique_ptr to fix memory leak introduced in r337701 llvm-svn: 337714	2018-07-23 17:43:21 +00:00
Jordan Rupprecht	e5daf61229	OpChain has subclasses, so add a virtual destructor. Summary: OpChain has subclasses, so add a virtual destructor. This fixes an issue when deleting subclasses of OpChain (see MatchSMLAD() specifically) in r337701. Reviewers: javed.absar Subscribers: llvm-commits, SjoerdMeijer, samparker Differential Revision: https://reviews.llvm.org/D49681 llvm-svn: 337713	2018-07-23 17:38:05 +00:00
Matt Morehouse	d773cb99f1	[ARM] Follow-up to r337709. Fix double-free. llvm-svn: 337711	2018-07-23 17:22:53 +00:00
Matt Morehouse	a70685f63a	[ARM] Add doFinalization() to ARMCodeGenPrepare pass. Attempt to fix the leak introduced in r337687 and make sanitizer buildbots green again. llvm-svn: 337709	2018-07-23 17:00:45 +00:00
Sam Parker	89a3799a69	[ARM][NFC] ParallelDSP reorganisation In preparing to allow ARMParallelDSP pass to parallelise more than smlads, I've restructed some elements: - The ParallelMAC struct has been renamed to BinOpChain. - The BinOpChain struct holds two value lists: LHS and RHS, as well as inheriting from the OpChain base class. - The OpChain struct holds all the values of the represented chain and has had the memory locations functionality inserted into it. - ParallelMACList becomes OpChainList and it now holds pointers instead of objects. Differential Revision: https://reviews.llvm.org/D49020 llvm-svn: 337701	2018-07-23 15:25:59 +00:00
Jonas Paulsson	59c94bec0d	[SystemZ] Fix dumpSU() method in SystemZHazardRecognizer. Two minor issues: The new MCD SchedWrite name does not contain "Unit" like all the others, so a check is needed. Also, print "LSU" instead of "LS". Review: Ulrich Weigand llvm-svn: 337700	2018-07-23 15:08:35 +00:00
Sam Parker	3828c6ff94	[ARM] ARMCodeGenPrepare backend pass Arm specific codegen prepare is implemented to perform type promotion on icmp operands, which can enable the removal of uxtb and uxth (unsigned extend) instructions. This is possible because performing type promotion before ISel alleviates this duty from the DAG builder which has to perform legalisation, but has a limited view on data ranges. The pass visits any instruction operand of an icmp and creates a worklist to traverse the use-def tree to determine whether the values can simply be promoted. Our concern is values in the registers overflowing the narrow (i8, i16) data range, so instructions marked with nuw can be promoted easily. For add and sub instructions, we are able to use the parallel dsp instructions to operate on scalar data types and avoid overflowing bits. Underflowing adds and subs are also permitted when the result is only used by an unsigned icmp. Differential Revision: https://reviews.llvm.org/D48832 llvm-svn: 337687	2018-07-23 12:27:47 +00:00
Roman Lebedev	52b85377eb	[NFC][MCA] ZnVer1: Update RegisterFile to identify false dependencies on partially written registers. Summary: Pretty mechanical follow-up for D49196. As microarchitecture.pdf notes, "20 AMD Ryzen pipeline", "20.8 Register renaming and out-of-order schedulers": The integer register file has 168 physical registers of 64 bits each. The floating point register file has 160 registers of 128 bits each. "20.14 Partial register access": The processor always keeps the different parts of an integer register together. ... An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it. Reviewers: andreadb, courbet, RKSimon, craig.topper, GGanesh Reviewed By: GGanesh Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D49393 llvm-svn: 337676	2018-07-23 10:10:13 +00:00
Chandler Carruth	1d926fb9f4	[x86/SLH] Fix a bug where we would harden tail calls twice -- once as a call, and then again as a return. Also added a comment to try and explain better why we would be doing what we're doing when hardening the (non-call) returns. llvm-svn: 337673	2018-07-23 07:56:15 +00:00
Chandler Carruth	0477b40137	[x86/SLH] Rename and comment the main hardening function. NFC. This provides an overview of the algorithm used to harden specific loads. It also brings this our terminology further in line with hardening rather than checking. Differential Revision: https://reviews.llvm.org/D49583 llvm-svn: 337667	2018-07-23 04:01:34 +00:00
Craig Topper	b2a626b52e	[X86] Remove the max vector width restriction from combineLoopMAddPattern and rely splitOpsAndApply to handle splitting. This seems to be a net improvement. There's still an issue under avx512f where we have a 512-bit vpaddd, but not vpmaddwd so we end up doing two 256-bit vpmaddwds and inserting the results before a 512-bit vpaddd. It might be better to do two 512-bits paddds with zeros in the upper half. Same number of instructions, but breaks a dependency. llvm-svn: 337656	2018-07-22 19:44:35 +00:00
Simon Atanasyan	2c5b18f70f	[mips] Factor out register class selection for global base register. NFC Factor out register class selection for global base register into a separate function to escape long chain of ternary operators. llvm-svn: 337647	2018-07-21 16:16:08 +00:00
Simon Atanasyan	ecd1e0afdd	[mips] Move out the WrapperPat declaration from the NotInMicroMips predicate This is a follow-up to the rL335185. Those commit adds some WrapperPat patterns for microMIPS target. But declaration of the WrapperPat class is under the NotInMicroMips predicate and microMIPS patterns cannot be selected because predicate (Subtarget->inMicroMipsMode()) && (!Subtarget->inMicroMipsMode()) is always false. This change move out the WrapperPat class declaration from the NotInMicroMips predicate and enables microMIPS WrapperPat patterns. Differential revision: https://reviews.llvm.org/D49533 llvm-svn: 337646	2018-07-21 16:16:03 +00:00
Matt Arsenault	5ae765e68c	AMDGPU: Use existing function to check for VGPRs llvm-svn: 337621	2018-07-20 21:20:36 +00:00
Benjamin Kramer	64c7fa3201	Revert "[X86][AVX] Convert X86ISD::VBROADCAST demanded elts combine to use SimplifyDemandedVectorElts" This reverts commit r337547. It triggers an infinite loop. llvm-svn: 337617	2018-07-20 20:59:46 +00:00
Craig Topper	28ac623f6f	[X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types. Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction. Unfortunately, it seems our shuffle combining is currently relying on this a little remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining. Differential Revision: https://reviews.llvm.org/D49280 llvm-svn: 337590	2018-07-20 17:57:53 +00:00
Craig Topper	6194ccf8c7	[X86] Remove what appear to be unnecessary uses of DCI.CombineTo CombineTo is most useful when you need to replace multiple results, avoid the worklist management, or you need to something else after the combine, etc. Otherwise you should be able to just return the new node and let DAGCombiner go through its usual worklist code. All of the places changed in this patch look to be standard cases where we should be able to use the more stand behavior of just returning the new node. Differential Revision: https://reviews.llvm.org/D49569 llvm-svn: 337589	2018-07-20 17:57:42 +00:00
Simon Pilgrim	70fcd0f481	[X86][XOP] Fix SUB constant folding for VPSHA/VPSHL shift lowering We can safely use getConstant here as we're still lowering, which allows constant folding to kick in and simplify the vector shift codegen. Noticed while working on D49562. llvm-svn: 337578	2018-07-20 16:55:18 +00:00
Evandro Menezes	fffa9b5897	[ARM] Add new feature to enable optimizing the VFP registers Enable the optimization of operations on DPR and SPR via a feature instead of checking the target. Differential revision: https://reviews.llvm.org/D49463 llvm-svn: 337575	2018-07-20 16:49:28 +00:00
Simon Pilgrim	c7132031a2	[X86][SSE] Use SplitOpsAndApply to improve HADD/HSUB lowering Improve AVX1 256-bit vector HADD/HSUB matching by using SplitOpsAndApply to split into 128-bit instructions. llvm-svn: 337568	2018-07-20 16:20:45 +00:00
Simon Pilgrim	a85b86a982	[X86][AVX] Add support for i16 256-bit vector horizontal op redundant shuffle removal llvm-svn: 337566	2018-07-20 15:51:01 +00:00
Simon Pilgrim	7c56bce996	[X86][AVX] Add support for 32/64 bits 256-bit vector horizontal op redundant shuffle removal llvm-svn: 337561	2018-07-20 15:24:12 +00:00
Simon Pilgrim	6fb8b68b2d	[X86][AVX] Convert X86ISD::VBROADCAST demanded elts combine to use SimplifyDemandedVectorElts This is an early step towards using SimplifyDemandedVectorElts for target shuffle combining - this merely moves the existing X86ISD::VBROADCAST simplification code to use the SimplifyDemandedVectorElts mechanism. Adds X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode to handle X86ISD::VBROADCAST - in time we can support all target shuffles (and other ops) here. llvm-svn: 337547	2018-07-20 13:26:51 +00:00
Jonas Paulsson	c88d3f6a99	[SystemZ] Reimplent SchedModel IssueWidth and WriteRes/ReadAdvance mappings. As a consequence of recent discussions (http://lists.llvm.org/pipermail/llvm-dev/2018-May/123164.html), this patch changes the SystemZ SchedModels so that the IssueWidth is 6, which is the decoder capacity, and NumMicroOps become the number of decoder slots needed per instruction. In addition, the SchedWrite latencies now match the MachineInstructions def-operand indexes, and ReadAdvances have been added on instructions with one register operand and one memory operand. Review: Ulrich Weigand https://reviews.llvm.org/D47008 llvm-svn: 337538	2018-07-20 09:40:43 +00:00
Andrew V. Tischenko	ee2e3144ba	Improved sched model for X86 BSWAP* instrs. Differential Revision: https://reviews.llvm.org/D49477 llvm-svn: 337537	2018-07-20 09:39:14 +00:00
Matt Arsenault	4bec7d4261	Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering" Reverts r337079 with fix for msan error. llvm-svn: 337535	2018-07-20 09:05:08 +00:00
Sander de Smalen	33f588acb9	[AArch64][SVE] Asm: Support for bit/byte reverse operations. This patch adds the following instructions: RBIT reverse bits within each active elemnt (predicated), e.g. rbit z0.d, p0/m, z1.d for 8, 16, 32 and 64 bit elements. REV reverse order of elements in data/predicate vector (unpredicated), e.g. rev z0.d, z1.d rev p0.d, p1.d for 8, 16, 32 and 64 bit elements. REVB reverse order of bytes within each active element, e.g. revb z0.d, p0/m, z1.d for 16, 32 and 64 bit elements. REVH reverse order of 16-bit half-words within each active element, e.g. revh z0.d, p0/m, z1.d for 32 and 64 bit elements. REVW reverse order of 32-bit words within each active element, e.g. revw z0.d, p0/m, z1.d for 64 bit elements. llvm-svn: 337534	2018-07-20 09:00:44 +00:00
Sander de Smalen	3ed7f81ce1	[AArch64][SVE] Asm: Support for FTMAD instruction. Floating-point trigonometric multiply-add coefficient, e.g. ftmad z0.h, z0.h, z1.h, #7 with variants for 16, 32 and 64-bit elements. llvm-svn: 337533	2018-07-20 08:47:26 +00:00
Heejin Ahn	022a37af77	[WebAssembly] Disable a test that violates DR1696 Summary: lifetime2.C violates DR1696, which prevents reference members from being initialized to temporaries, whose lifetime would end at the end of ctor. Reviewers: sbc100 Subscribers: dschuff, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49577 llvm-svn: 337512	2018-07-20 00:13:42 +00:00
Chandler Carruth	a3a03ac247	[x86/SLH] Clean up helper naming for return instruction handling and remove dead declaration of a call instruction handling helper. This moves to the 'harden' terminology that I've been trying to settle on for returns. It also adds a really detailed comment explaining what all we're trying to accomplish with return instructions and why. Hopefully this makes it much more clear what exactly is being "hardened". Differential Revision: https://reviews.llvm.org/D49571 llvm-svn: 337510	2018-07-19 23:46:24 +00:00
Simon Pilgrim	1d181bc992	[X86][AVX] Use extract_subvector to reduce vector op widths (PR36761) We have a number of cases where we fail to reduce vector op widths, performing the op in a larger vector and then extracting a subvector. This is often because by default it would create illegal types. This peephole patch attempts to handle a few common cases detailed in PR36761, which typically involved extension+conversion to vX2f64 types. Differential Revision: https://reviews.llvm.org/D49556 llvm-svn: 337500	2018-07-19 21:52:06 +00:00
Craig Topper	9888670c6b	[X86] Fix some 'return SDValue()' after DCI.CombineTo instead return the output of CombineTo Returning SDValue() means nothing was changed. Returning the result of CombineTo returns the first argument of CombineTo. This is specially detected by DAGCombiner as meaning that something changed, but worklist management was already taken care of. I think the only real effect of this change is that we now properly update the Statistic the counts the number of combines performed. That's the only thing between the check for null and the check for N in the DAGCombiner. llvm-svn: 337491	2018-07-19 20:10:44 +00:00
Stefan Pintilie	0c122d5a41	[Power9] Code Cleanup - Remove needsAggressiveScheduling() As we already return true from needsAggressiveScheduling() for the most recent hardware it would be cleaner to just return true for all PowerPC hardware. Differential Revision: https://reviews.llvm.org/D48663 llvm-svn: 337488	2018-07-19 19:34:18 +00:00
Andrea Di Biagio	b6022aa8d9	[X86][BtVer2] correctly model the latency/throughput of LEA instructions. This patch fixes the latency/throughput of LEA instructions in the BtVer2 scheduling model. On Jaguar, A 3-operands LEA has a latency of 2cy, and a reciprocal throughput of 1. That is because it uses one cycle of SAGU followed by 1cy of ALU1. An LEA with a "Scale" operand is also slow, and it has the same latency profile as the 3-operands LEA. An LEA16r has a latency of 3cy, and a throughput of 0.5 (i.e. RThrouhgput of 2.0). This patch adds a new TIIPredicate named IsThreeOperandsLEAFn to X86Schedule.td. The tablegen backend (for instruction-info) expands that definition into this (file X86GenInstrInfo.inc): ``` static bool isThreeOperandsLEA(const MachineInstr &MI) { return ( ( MI.getOpcode() == X86::LEA32r \|\| MI.getOpcode() == X86::LEA64r \|\| MI.getOpcode() == X86::LEA64_32r \|\| MI.getOpcode() == X86::LEA16r ) && MI.getOperand(1).isReg() && MI.getOperand(1).getReg() != 0 && MI.getOperand(3).isReg() && MI.getOperand(3).getReg() != 0 && ( ( MI.getOperand(4).isImm() && MI.getOperand(4).getImm() != 0 ) \|\| (MI.getOperand(4).isGlobal()) ) ); } ``` A similar method is generated in the X86_MC namespace, and included into X86MCTargetDesc.cpp (the declaration lives in X86MCTargetDesc.h). Back to the BtVer2 scheduling model: A new scheduling predicate named JSlowLEAPredicate now checks if either the instruction is a three-operands LEA, or it is an LEA with a Scale value different than 1. A variant scheduling class uses that new predicate to correctly select the appropriate latency profile. Differential Revision: https://reviews.llvm.org/D49436 llvm-svn: 337469	2018-07-19 16:42:15 +00:00
Tim Northover	a7c18ee8c4	ARM: switch armv7em MachO triple to hard-float defaults and libcalls. We were emitting incorrect calls to libm functions that LLVM had decided it knew about because the default is soft-float. Recommitted without breaking ELF this time. llvm-svn: 337450	2018-07-19 12:44:51 +00:00
Chandler Carruth	4b0028a3d1	[x86/SLH] Major refactoring of SLH implementaiton. There are two big changes that are intertwined here: 1) Extracting the tracing of predicate state through the CFG to its own function. 2) Creating a struct to manage the predicate state used throughout the pass. Doing #1 necessitates and motivates the particular approach for #2 as now the predicate management is spread across different functions focused on different aspects of it. A number of simplifications then fell out as a direct consequence. I went with an Optional to make it more natural to construct the MachineSSAUpdater object. This is probably the single largest outstanding refactoring step I have. Things get a bit more surgical from here. My current goal, beyond generally making this maintainable long-term, is to implement several improvements to how we do interprocedural tracking of predicate state. But I don't want to do that until the predicate state management and tracing is in reasonably clear state. Differential Revision: https://reviews.llvm.org/D49427 llvm-svn: 337446	2018-07-19 11:13:58 +00:00
Simon Pilgrim	dd7bf598cc	Fix spelling mistake in comments. NFCI. llvm-svn: 337442	2018-07-19 09:14:39 +00:00
Heejin Ahn	47068a42d2	[WebAssembly] Add missing -mattr=+exception-handling guards Summary: The use of exception handling instructions should only be enabled with `-mattr=+exception-handling` option. Reviewers: jgravelle-google Subscribers: dschuff, sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49391 llvm-svn: 337425	2018-07-18 21:42:22 +00:00
Tim Northover	da142d10d9	Revert "ARM: switch armv7em triple to hard-float defaults and libcalls." This reverts commit r337385 until it can be targeted at MachO only. llvm-svn: 337424	2018-07-18 21:32:49 +00:00
Simon Pilgrim	d4b82da113	[X86][SSE] Canonicalize scalar fp arithmetic shuffle patterns As discussed on PR38197, this canonicalizes MOVS(N0, OP(N0, N1)) --> MOVS(N0, SCALAR_TO_VECTOR(OP(N0[0], N1[0]))) This returns the scalar-fp codegen lost by rL336971. Additionally it handles the OP(N1, N0)) case for commutable (FADD/FMUL) ops. Differential Revision: https://reviews.llvm.org/D49474 llvm-svn: 337419	2018-07-18 19:55:19 +00:00
Simon Atanasyan	e9d7b3198a	[mips] Fix predicate for the MipsTruncIntFP pattern This is a follow-up to the rL337171. This patch fixes regression introduced by the r337171 and enables MipsTruncIntFP pattern. Differential revision: https://reviews.llvm.org/D49469 llvm-svn: 337392	2018-07-18 14:11:22 +00:00
Simon Pilgrim	3a45369b9e	[X86][SSE] Remove BLENDPD canonicalization from combineTargetShuffle When rL336971 removed the scalar-fp isel patterns, we lost the need for this canonicalization - commutation/folding can handle everything else. llvm-svn: 337387	2018-07-18 13:01:20 +00:00
Tim Northover	e00cf4fc68	ARM: stop explicitly marking armv7k libcalls as hard-float. NFC. Since the triple's default is hard float, the libcalls will already use VFP registers. llvm-svn: 337386	2018-07-18 12:37:43 +00:00
Tim Northover	d4abd14c1b	ARM: switch armv7em triple to hard-float defaults and libcalls. We were emitting incorrect calls to libm functions that LLVM had decided it knew about because the default is soft-float. llvm-svn: 337385	2018-07-18 12:37:04 +00:00
Tim Northover	097a3e3d95	ARM: deduplicate hard-float detection code. NFC. ARMSubtarget had a copy/pasted block to determine whether the target was hard-float, but it just delegated to triple features anyway so it's better at the TargetMachine level. llvm-svn: 337384	2018-07-18 12:36:25 +00:00
Sander de Smalen	330d887d72	[AArch64][SVE] Asm: Support for unpredicated FP operations. This patch adds support for the following unpredicated floating-point instructions: FADD Floating point add FSUB Floating point subtract FMUL Floating point multiplication FTSMUL Floating point trigonometric starting value FRECPS Floating point reciprocal step FRSQRTS Floating point reciprocal square root step The instructions have the following assembly format: fadd z0.h, z1.h, z2.h and have variants for 16, 32 and 64-bit FP elements. llvm-svn: 337383	2018-07-18 11:59:12 +00:00
Daniel Cederman	959c8bf51c	Revert "[Sparc] Use the IntPair reg class for r constraints with value type f64" This reverts commit 55222c9183c6e07f53a54c4061677734f54feac1. I missed that this patch has a dependency on https://reviews.llvm.org/D49219 that has not been approved yet. llvm-svn: 337373	2018-07-18 10:05:30 +00:00
Sander de Smalen	ccdc7ebc1d	[AArch64][SVE] Asm: Support for UDOT/SDOT instructions. The signed/unsigned DOT instructions perform a dot-product on quadtuplets from two source vectors and accumulate the result in the destination register. The instructions come in two forms: Vector form, e.g. sdot z0.s, z1.b, z2.b - signed dot product on four 8-bit quad-tuplets, accumulating results in 32-bit elements. udot z0.d, z1.h, z2.h - unsigned dot product on four 16-bit quad-tuplets, accumulating results in 64-bit elements. Indexed form, e.g. sdot z0.s, z1.b, z2.b[3] - signed dot product on four 8-bit quad-tuplets with specified quadtuplet from second source vector, accumulating results in 32-bit elements. udot z0.d, z1.h, z2.h[1] - dot product on four 16-bit quad-tuplets with specified quadtuplet from second source vector, accumulating results in 64-bit elements. llvm-svn: 337372	2018-07-18 09:37:51 +00:00
Daniel Cederman	4e38df18ea	[Sparc] Use the IntPair reg class for r constraints with value type f64 Summary: This is how it appears to be handled in GCC and it prevents a "Unknown mismatch" error in the SelectionDAGBuilder. Reviewers: venkatra, jyknight, jrtc27 Reviewed By: jyknight, jrtc27 Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D49218 llvm-svn: 337370	2018-07-18 09:25:33 +00:00
Sander de Smalen	889fe81ce5	[AArch64][SVE] Asm: Integer divide instructions. This patch adds the following predicated instructions: UDIV Unsigned divide active elements UDIVR Unsigned divide active elements, reverse form. SDIV Signed divide active elements SDIVR Signed divide active elements, reverse form. e.g. udiv z0.s, p0/m, z0.s, z1.s (unsigned divide active elements in z0 by z1, store result in z0) sdivr z0.s, p0/m, z0.s, z1.s (signed divide active elements in z1 by z0, store result in z0) llvm-svn: 337369	2018-07-18 09:17:29 +00:00
Sander de Smalen	ac0cb5bf75	[AArch64][SVE] Asm: Support for integer MUL instructions. This patch adds the following instructions: MUL - multiply vectors, e.g. mul z0.h, p0/m, z0.h, z1.h - multiply with immediate, e.g. mul z0.h, z0.h, #127 SMULH - signed multiply returning high half, e.g. smulh z0.h, p0/m, z0.h, z1.h UMULH - unsigned multiply returning high half, e.g. umulh z0.h, p0/m, z0.h, z1.h llvm-svn: 337358	2018-07-18 08:10:03 +00:00
Craig Topper	92ea7a7b48	[X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by using VMOVLPS with a modified address. This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable. llvm-svn: 337357	2018-07-18 07:31:32 +00:00
Hiroshi Inoue	cd83d459bc	[NFC] fix trivial typos in comments llvm-svn: 337351	2018-07-18 06:04:43 +00:00
Justin Hibbits	22e939a15b	Fix build failures from r337347, found by clang * Delete a no-longer-used override, and mark the other getRegisterTypeForCallingConv() as override. * SPE only supports i32, not i64, as the internal type, so simply remove the type check, so that DestReg and Opc are provably always set. GCC 6.4 did not warn about either of the above. llvm-svn: 337350	2018-07-18 05:19:25 +00:00
Craig Topper	95063a45b8	[X86] Remove patterns that mix X86ISD::MOVLHPS/MOVHLPS with v2i64/v2f64 types. The X86ISD::MOVLHPS/MOVHLPS should now only be emitted in SSE1 only. This means that the v2i64/v2f64 types would be illegal thus we don't need these patterns. llvm-svn: 337349	2018-07-18 05:10:53 +00:00
Craig Topper	1425e10cc6	[X86] Generate v2f64 X86ISD::UNPCKL/UNPCKH instead of X86ISD::MOVLHPS/MOVHLPS for unary v2f64 {0,0} and {1,1} shuffles with SSE2. I'm trying to restrict the MOVLHPS/MOVHLPS ISD nodes to SSE1 only. With SSE2 we can use unpcks. I believe this will allow some patterns to be cleaned up to require fewer bitcasts. I've put in an odd isel hack to still select MOVHLPS instruction from the unpckh node to avoid changing tests and because movhlps is a shorter encoding. Ideally we'd do execution domain switching on this, but the operands are in the wrong order and are tied. We might be able to try a commute in the domain switching using custom code. We already support domain switching for UNPCKLPD and MOVLHPS. llvm-svn: 337348	2018-07-18 05:10:51 +00:00
Justin Hibbits	d52990c71b	Introduce codegen for the Signal Processing Engine Summary: The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1, e500v2, and several e200 cores. This adds support targeting the e500v2, as this is more common than the e500v1, and is in SoCs still on the market. This patch is very intrusive because the SPE is binary incompatible with the traditional FPU. After discussing with others, the cleanest solution was to make both SPE and FPU features on top of a base PowerPC subset, so all FPU instructions are now wrapped with HasFPU predicates. Supported by this are: * Code generation following the SPE ABI at the LLVM IR level (calling conventions) * Single- and Double-precision math at the level supported by the APU. Still to do: * Vector operations * SPE intrinsics As this changes the Callee-saved register list order, one test, which tests the precise generated code, was updated to account for the new register order. Reviewed by: nemanjai Differential Revision: https://reviews.llvm.org/D44830 llvm-svn: 337347	2018-07-18 04:25:10 +00:00
Justin Hibbits	4fa4fa6a73	Complete the SPE instruction set patterns This is the lead-up to having SPE codegen. Add the rest of the instructions, along with MC tests. Differential Revision: https://reviews.llvm.org/D44829 llvm-svn: 337346	2018-07-18 04:24:57 +00:00
Justin Hibbits	ceb3cd96f7	Add PowerPC e500(v2) core scheduler and directives. Differential Revision: https://reviews.llvm.org/D44828 llvm-svn: 337345	2018-07-18 04:24:49 +00:00
Craig Topper	a29f58dc31	[X86] Remove the vector alignment requirement from the patterns added in r337320. The resulting instruction will only load 64 bits so alignment isn't required. llvm-svn: 337334	2018-07-17 23:26:20 +00:00
Craig Topper	9ef92865ec	[X86] Add patterns for folding full vector load into MOVHPS and MOVLPS with SSE1 only. llvm-svn: 337320	2018-07-17 20:16:18 +00:00
Chandler Carruth	c0cb5731fc	[x86/SLH] Flesh out the data-invariant instruction table a bit based on feedback from Craig. Summary: The only thing he suggested that I've skipped here is the double-wide multiply instructions. Multiply is an area I'm nervous about there being some hidden data-dependent behavior, and it doesn't seem important for any benchmarks I have, so skipping it and sticking with the minimal multiply support that matches what I know is widely used in existing crypto libraries. We can always add double-wide multiply when we have clarity from vendors about its behavior and guarantees. I've tried to at least cover the fundamentals here with tests, although I've not tried to cover every width or permutation. I can add more tests where folks think it would be helpful. Reviewers: craig.topper Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D49413 llvm-svn: 337308	2018-07-17 18:07:59 +00:00
Sam Clegg	28b3e99482	[WebAssembly] Update WebAssemblyLowerEmscriptenEHSjLj to handle separate compilation Previously we were assuming whole program compilation. Now that separate compilation is a thing we need to update this pass. Firstly, it can no longer assert on the existence of malloc and free. This functions might not be in the current translation unit. If we need them then we will generate not imports for them. Secondly the global helper function we create should be marked as weak since we will be generating a separate copy in each translation unit. Finally the names of the symbols used must be unique and fixed since they need to agree across translation units. Differential Revision: https://reviews.llvm.org/D49263 llvm-svn: 337301	2018-07-17 16:40:03 +00:00
Craig Topper	9187bca71b	[X86] Remove some standalone patterns in favor of the patterns in the MOVLPD instruction definitions. Previously we passed 'null_frag' into the instruction definition. The multiclass is shared with MOVHPD which doesn't use null_frag. It turns out by passing X86Movsd it produces patterns equivalent to some standalone patterns. llvm-svn: 337299	2018-07-17 16:24:33 +00:00
Sander de Smalen	7b33faaf38	[AArch64][SVE]: Integer multiply-add/subtract instructions. This patch adds support for the following instructions: MLA mul-add, writing addend (Zda = Zda + Zn * Zm) MLS mul-sub, writing addend (Zda = Zda + -Zn * Zm) MAD mul-add, writing multiplicant (Zdn = Za + Zdn * Zm) MSB mul-sub, writing multiplicant (Zdn = Za + -Zdn * Zm) llvm-svn: 337293	2018-07-17 15:41:58 +00:00
Petar Jovanovic	f10e4798b4	[Mips][FastISel] Fix handling of icmp with i1 type The Mips FastISel back-end does not extend i1 values while lowering icmp. Ensure that we bail into DAG ISel when handling this case. Patch by Dragan Mladjenovic. Differential Revision: https://reviews.llvm.org/D49290 llvm-svn: 337288	2018-07-17 14:57:46 +00:00
Sander de Smalen	f41dd122d6	[AArch64][SVE] Asm: FP fused multiply-add/subtract instructions. This patch adds support for the following instructions: FMLA mul-add, writing addend (Zda = Zda + Zn * Zm) FNMLA negated mul-add, writing addend (Zda = -Zda + -Zn * Zm) FMLS mul-sub, writing addend (Zda = Zda + -Zn * Zm) FNMLS negated mul-sub, writing addend (Zda = -Zda + Zn * Zm) FMAD mul-add, writing multiplicant (Zdn = Za + Zdn * Zm) FNMAD negated mul-add, writing multiplicant (Zdn = -Za + -Zdn * Zm) FMSB mul-sub, writing multiplicant (Zdn = Za + -Zdn * Zm) FNMSB negated mul-sub, writing multiplicant (Zdn = -Za + Zdn * Zm) llvm-svn: 337282	2018-07-17 13:58:46 +00:00
Sander de Smalen	5dabcf887b	[AArch64][SVE] Asm: Support for predicated FP operations (FP immediate) This patch completes support for the following floating point instructions that take FP immediates: FADD* (addition) FSUB (subtract) FSUBR (subtract reverse form) FMUL* (multiplication) FMAX* (maximum) FMAXNM (maximum number) FMIN (maximum) FMINNM (maximum number) All operations are predicated and take a FP immediate operand, e.g. fadd z0.h, p0/m, z0.h, #0.5 fmin z0.s, p0/m, z0.s, #1.0 ^___________^ (tied) * Instructions added in a previous patch. llvm-svn: 337272	2018-07-17 12:36:08 +00:00
whitequark	7c4a074505	[LLVM-C] Add target triple normalization to the C API. rL333307 was introduced to remove automatic target triple normalization when calling sys::getDefaultTargetTriple(), arguing that users of the latter already called Triple::normalize() if necessary. However, users of the C API currently have no way of doing target triple normalization. This patch introduces an LLVMNormalizeTargetTriple function to the C API which wraps Triple::normalize() and can be used on the result of LLVMGetDefaultTargetTriple to achieve the same effect. Differential Revision: https://reviews.llvm.org/D49414 Reviewed By: whitequark llvm-svn: 337263	2018-07-17 10:57:39 +00:00
Sander de Smalen	3b9e342ae1	[AArch64][SVE] Asm: Support for predicated FP operations. This patch adds support for the following floating point instructions: FABD (absolute difference) FADD (addition) FSUB (subtract) FSUBR (subtract reverse form) FDIV (divide) FDIVR (divide reverse form) FMAX (maximum) FMAXNM (maximum number) FMIN (minimum) FMINNM (minimum number) FSCALE (adjust exponent) FMULX (multiply extended) All operations are predicated and binary form, e.g. fadd z0.h, p0/m, z0.h, z1.h ^___________^ (tied) Supporting 16, 32 and 64-bit FP elements. llvm-svn: 337259	2018-07-17 09:48:57 +00:00
Simon Pilgrim	e4d12bb2d6	[DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops. Differential Revision: https://reviews.llvm.org/D49262 llvm-svn: 337258	2018-07-17 09:45:35 +00:00
Sander de Smalen	ec229abb9b	[AArch64][SVE] Asm: Support for SPLICE instruction. The SPLICE instruction splices two vectors into one vector using a predicate. It copies the active elements from the first vector, and then fills the remaining elements with the low-numbered elements from the second vector. The instruction has the following form, e.g. splice z0.b, p0, z0.b, z1.b for 8-bit elements. It also supports 16, 32 and 64-bit elements. llvm-svn: 337253	2018-07-17 08:52:45 +00:00
Sander de Smalen	78fe30a662	[AArch64][SVE] Asm: Support for EXT instruction. This patch adds an instruction that allows extracting a vector from a pair of vectors, given an immediate index that describes the element position to extract from. The instruction has the following assembly: ext z0.b, z0.b, z1.b, #imm where #imm is an immediate between 0 and 255. llvm-svn: 337251	2018-07-17 08:39:48 +00:00
Craig Topper	880f92ad71	[X86] Properly qualify some MOVSS/MOVSD patterns with OptSize. These are integer versions of patterns that I already fixed for floating point. llvm-svn: 337240	2018-07-17 06:24:16 +00:00
Daniel Cederman	c812dcc3db	[Sparc] Do not depend on icc for ta 1 The ta instruction will always trap, regardless of the value of the integer condition codes. TRAPri is marked as using icc, so we cannot use a pattern for TRAPri to implement ta 1, as verify-machineinstrs can complain that icc is not defined. Instead we implement ta 1 the same way as ta 5. llvm-svn: 337236	2018-07-17 05:49:33 +00:00
Craig Topper	c376a1916b	[X86] Add full set of patterns for turning ceil/floor/trunc/rint/nearbyint into rndscale with loads, broadcast, and masking. This amounts to pretty ridiculous number of patterns. Ideally we'd canonicalize the X86ISD::VRNDSCALE earlier to reuse those patterns. I briefly looked into doing that, but some strict FP operations could still get converted to rint and nearbyint during isel. It's probably still worthwhile to look into. This patch is meant as a starting point to work from. llvm-svn: 337234	2018-07-17 05:48:48 +00:00
Craig Topper	6751727d76	[X86] Add a missing FMA3 scalar intrinsic pattern. This allows us to use 231 form to fold an insertelement on the add input to the fma. There is technically no software intrinsic that can use this until AVX512F, but it can be manually built up from other intrinsics. llvm-svn: 337223	2018-07-16 23:10:58 +00:00
Sam Clegg	cf2a9e28b1	[WebAssembly] Remove ELF file support. This support was partial and temporary. Now that we have wasm object file support its no longer needed. Differential Revision: https://reviews.llvm.org/D48744 llvm-svn: 337222	2018-07-16 23:09:29 +00:00
Farhana Aleen	c370d7b33d	[AMDGPU] [AMDGPU] Support a fdot2 pattern. Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 llvm-svn: 337198	2018-07-16 18:19:59 +00:00
Chandler Carruth	fa065aa75c	[x86/SLH] Completely rework how we sink post-load hardening past data invariant instructions to be both more correct and much more powerful. While testing, I continued to find issues with sinking post-load hardening. Unfortunately, it was amazingly hard to create any useful tests of this because we were mostly sinking across copies and other loading instructions. The fact that we couldn't sink past normal arithmetic was really a big oversight. So first, I've ported roughly the same set of instructions from the data invariant loads to also have their non-loading varieties understood to be data invariant. I've also added a few instructions that came up so often it again made testing complicated: inc, dec, and lea. With this, I was able to shake out a few nasty bugs in the validity checking. We need to restrict to hardening single-def instructions with defined registers that match a particular form: GPRs that don't have a NOREX constraint directly attached to their register class. The (tiny!) test case included catches all of the issues I was seeing (once we can sink the hardening at all) except for the NOREX issue. The only test I have there is horrible. It is large, inexplicable, and doesn't even produce an error unless you try to emit encodings. I can keep looking for a way to test it, but I'm out of ideas really. Thanks to Ben for giving me at least a sanity-check review. I'll follow up with Craig to go over this more thoroughly post-commit, but without it SLH crashes everywhere so landing it for now. Differential Revision: https://reviews.llvm.org/D49378 llvm-svn: 337177	2018-07-16 14:58:32 +00:00
Simon Atanasyan	738a6e449e	[mips] Eliminate the usage of hasStdEnc in MipsPat. Instead, the pattern is tagged with the correct predicate when it is declared. Some patterns have been duplicated as necessary. Patch by Simon Dardis. Differential revision: https://reviews.llvm.org/D48365 llvm-svn: 337171	2018-07-16 13:52:41 +00:00
Petar Jovanovic	021e4c82eb	[MIPS GlobalISel] Select instructions to load and store i32 on stack Add code for selection of G_LOAD, G_STORE, G_GEP, G_FRAMEINDEX and G_CONSTANT. Support loads and stores of i32 values. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D48957 llvm-svn: 337168	2018-07-16 13:29:32 +00:00
Roman Lebedev	de506632aa	[X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. But the IR-optimal patter does not lower efficiently, so we want to undo it.. This handles the simple pattern. There is a second pattern with predicate and constants inverted. NOTE: we do not check uses here. we always do the transform. Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49266 llvm-svn: 337166	2018-07-16 12:44:10 +00:00
Daniel Cederman	92a700a215	[Sparc] Use the correct encoding for ta 3 Summary: The old encoding generated a "tn %g1 + 3" instruction instead of the expected "ta 3". Reviewers: venkatra, jyknight Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D49171 llvm-svn: 337165	2018-07-16 12:28:26 +00:00
Daniel Cederman	ab09da7b57	[Sparc] Use the names .rem and .urem instead of __modsi3 and __umodsi3 Summary: These are the names used in libgcc. Reviewers: venkatra, jyknight, ekedaigle Reviewed By: jyknight Subscribers: joerg, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48915 llvm-svn: 337164	2018-07-16 12:22:08 +00:00
Daniel Cederman	68765757d4	[Sparc] Generate ta 1 for the @llvm.debugtrap intrinsic Summary: Software trap number one is the trap used for breakpoints in the Sparc ABI. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48637 llvm-svn: 337163	2018-07-16 12:16:53 +00:00
Chandler Carruth	e66a6f48e3	[x86/SLH] Fix a bug where we would try to post-load harden non-GPRs. Found cases that hit the assert I added. This patch factors the validity checking into a nice helper routine and calls it when deciding to harden post-load, and asserts it when doing so later. I've added tests for the various ways of loading a floating point type, as well as loading all vector permutations. Even though many of these go to identical instructions, it seems good to somewhat comprehensively test them. I'm confident there will be more fixes needed here, I'll try to add tests each time as I get this predicate adjusted. llvm-svn: 337160	2018-07-16 11:38:48 +00:00
Chandler Carruth	3620b9957a	[x86/SLH] Extract another small helper function, add better comments and use better terminology. NFC. llvm-svn: 337157	2018-07-16 10:46:16 +00:00
Mark Searles	f4e7025299	[AMDGPU][Waitcnt] Re-apply fix "comparison of integers of different signs" build error" Re-apply "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error"" ( fe0a456510131f268e388c4a18a92f575c0db183 ), which was inadvertantly reverted via 2b2ee080f0164485562593b1b87291a48cea4a9a . llvm-svn: 337156	2018-07-16 10:21:36 +00:00
Mark Searles	72da47df25	run post-RA hazard recognizer pass late Memory legalizer, waitcnt, and shrink passes can perturb the instructions, which means that the post-RA hazard recognizer pass should run after them. Otherwise, one of those passes may invalidate the work done by the hazard recognizer. Note that this has adverse side-effect that any consecutive S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a single S_NOP <N>. This should be addressed in a follow-on patch. Differential Revision: https://reviews.llvm.org/D49288 llvm-svn: 337154	2018-07-16 10:02:41 +00:00
Mark Searles	c2d5d9adb5	Revert "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error" This reverts commit fe0a456510131f268e388c4a18a92f575c0db183. llvm-svn: 337153	2018-07-16 10:02:40 +00:00
Craig Topper	07a1787501	[X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics. This unfortunately requires a bunch of bitcasts to be added added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid. The test changes are because we were previously mixing fr128 and vr128 due to contrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well. llvm-svn: 337147	2018-07-16 06:56:09 +00:00
Chandler Carruth	bc46bca99e	[x86/SLH] Fix an unused variable warning in release builds after r337144. llvm-svn: 337145	2018-07-16 04:42:27 +00:00

... 9 10 11 12 13 ...

49400 Commits