llvm-project

Commit Graph

Author	SHA1	Message	Date
Alina Sbirlea	cb4ed8a7bc	[MemorySSA] When applying updates, clean unnecessary Phis. Summary: After applying a set of insert updates, there may be trivial Phis left over. Clean them up. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63033 llvm-svn: 363094	2019-06-11 19:09:34 +00:00
Alina Sbirlea	3cef1f7d64	Only passes that preserve MemorySSA must mark it as preserved. Summary: The method `getLoopPassPreservedAnalyses` should not mark MemorySSA as preserved, because it's being called in a lot of passes that do not preserve MemorySSA. Instead, mark the MemorySSA analysis as preserved by each pass that does preserve it. These changes only affect the new pass mananger. Reviewers: chandlerc Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62536 llvm-svn: 363091	2019-06-11 18:27:49 +00:00
Amy Huang	9970817c57	Deduplicate S_CONSTANTs in LLD. Summary: Deduplicate S_CONSTANTS when linking, if they have the same value. Reviewers: rnk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63151 llvm-svn: 363089	2019-06-11 18:02:39 +00:00
Jinsong Ji	ef2d6d99c0	[PowerPC] Enable MachinePipeliner for P9 with -ppc-enable-pipeliner Implement necessary target hooks to enable MachinePipeliner for P9 only. The pass is off by default, can be enabled with -ppc-enable-pipeliner for P9. Differential Revision: https://reviews.llvm.org/D62164 llvm-svn: 363085	2019-06-11 17:40:39 +00:00
Jonas Devlieghere	a6fe345ac9	[Path] Set FD to -1 in moved-from TempFile When moving a temp file, explicitly set the file descriptor to -1 so we can never accidentally close the moved-from TempFile. Differential revision: https://reviews.llvm.org/D63087 llvm-svn: 363083	2019-06-11 16:42:42 +00:00
Cameron McInally	08200d6d26	[InstCombine] Handle -(X-Y) --> (Y-X) for unary fneg when NSZ Differential Revision: https://reviews.llvm.org/D62612 llvm-svn: 363082	2019-06-11 16:21:21 +00:00
Cameron McInally	796de11331	[InstCombine] Update fptrunc (fneg x)) -> (fneg (fptrunc x) for unary FNeg Differential Revision: https://reviews.llvm.org/D62629 llvm-svn: 363080	2019-06-11 15:45:41 +00:00
Nico Weber	af6bc65ddf	lld-link: Reject more than one resource .obj file Users are exepcted to pass all .res files to the linker, which then merges all the resource in all .res files into a tree structure and then converts the final tree structure to a .obj file with .rsrc$01 and .rsrc$02 sections and then links that. If the user instead passes several .obj files containing such resources, the correct thing to do would be to have custom code to merge the trees in the resource sections instead of doing normal section merging -- but link.exe rejects if multiple resource obj files are passed in with LNK4078, so let lld-link do that too instead of silently writing broken .rsrc sections in that case. The only real way to run into this is if users manually convert .res files to .obj files by running cvtres and then handing the resulting .obj files to lld-link instead, which in practice likely never happens. (lld-link is slightly stricter than link.exe now: If link.exe is passed one .obj file created by cvtres, and a .res file, for some reason it just emits a warning instead of an error and outputs strange looking data. lld-link now errors out on mixed input like this.) One way users could accidentally run into this is the following scenario: If a .res file is passed to lib.exe, then lib.exe calls cvtres.exe on the .res file before putting it in the output .lib. (llvm-lib currently doesn't do this.) link.exe's /wholearchive seems to only add obj files referenced from the static library index, but lld-link current really adds all files in the archive. So if lld-link /wholearchive is used with .lib files produced by lib.exe and .res files were among the files handed to lib.exe, we previously silently produced invalid output, but now we error out. link.exe's /wholearchive semantics on the other hand mean that it wouldn't load the resource object files from the .lib file at all. Since this scenario is probably still an unlikely corner case, the difference in behavior here seems fine -- and lld-link might have to change to use link.exe's /wholearchive semantics in the future anyways. Vaguely related to PR42180. Differential Revision: https://reviews.llvm.org/D63109 llvm-svn: 363078	2019-06-11 15:22:28 +00:00
Lewis Revill	a5240361dd	[RISCV] Add lowering of addressing sequences for PIC This patch allows lowering of PIC addresses by using PC-relative addressing for DSO-local symbols and accessing the address through the global offset table for non-DSO-local symbols. Differential Revision: https://reviews.llvm.org/D55303 llvm-svn: 363058	2019-06-11 12:57:47 +00:00
Lewis Revill	28a5cadb3a	[RISCV] Lower inline asm constraints I, J & K for RISC-V This validates and lowers arguments to inline asm nodes which have the constraints I, J & K, with the following semantics (equivalent to GCC): I: Any 12-bit signed immediate. J: Immediate integer zero only. K: Any 5-bit unsigned immediate. Differential Revision: https://reviews.llvm.org/D54093 llvm-svn: 363054	2019-06-11 12:42:13 +00:00
Mikhail Maltsev	7bd5c55cad	[ARM] First MVE instructions: scalar shifts. This introduces a new decoding table for MVE instructions, and starts by adding the family of scalar shift instructions that are part of the MVE architecture extension: saturating shifts within a single GPR, and long shifts across a pair of GPRs (both saturating and normal). Some of these shift instructions have only 3-bit register fields in the encoding, with the low bit fixed. So they can only address an odd or even numbered GPR (depending on the operand), and therefore I add two new register classes, GPREven and GPROdd. Differential Revision: https://reviews.llvm.org/D62668 Change-Id: Iad95d5f83d26aef70c674027a184a6b1e0098d33 llvm-svn: 363051	2019-06-11 12:04:32 +00:00
Nico Weber	dd6019526d	Let writeWindowsResourceCOFF() take a TimeStamp parameter For lld, pass in Config->Timestamp (which is set based on lld's /timestamp: and /Brepro flags). Since the writeWindowsResourceCOFF() data is only used in-memory by LLD and the obj's timestamp isn't used for anything in the output, this doesn't change behavior. For llvm-cvtres, add an optional /timestamp: parameter, and use the current behavior of calling time() if the parameter is not passed in. This doesn't really change observable behavior (unless someone passes /timestamp: to llvm-cvtres, which wasn't possible before), but it removes the last unqualified call to time() from llvm/lib, which seems like a good thing. Differential Revision: https://reviews.llvm.org/D63116 llvm-svn: 363050	2019-06-11 11:26:50 +00:00
Simon Pilgrim	266f43964e	[TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI. As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048	2019-06-11 11:00:23 +00:00
Orlando Cazalet-Hyams	1a0f7a2077	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 llvm-svn: 363046	2019-06-11 10:37:20 +00:00
Simon Tatham	14241378d3	[ARM] Fix unused-variable warning in rL363039. The variable `OffsetMask` is currently only used in an assertion, so if assertions are compiled out and -Werror is enabled, it becomes a build failure. llvm-svn: 363043	2019-06-11 10:09:12 +00:00
Simon Pilgrim	287e78c82b	[DAGCombine] GetNegatedExpression - constant float vector support (PR42105) Add support for negation of constant build vectors. Differential Revision: https://reviews.llvm.org/D62963 llvm-svn: 363040	2019-06-11 09:44:33 +00:00
Simon Tatham	8c865cacda	[ARM] Add the non-MVE instructions in Arm v8.1-M. This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need a new addressing mode. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 363039	2019-06-11 09:29:18 +00:00
Sander de Smalen	cbeb563cfb	Change semantics of fadd/fmul vector reductions. This patch changes how LLVM handles the accumulator/start value in the reduction, by never ignoring it regardless of the presence of fast-math flags on callsites. This change introduces the following new intrinsics to replace the existing ones: llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul and adds functionality to auto-upgrade existing LLVM IR and bitcode. Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D60261 llvm-svn: 363035	2019-06-11 08:22:10 +00:00
Craig Topper	627d8168e7	[X86] Add load folding isel patterns to scalar_math_patterns and AVX512_scalar_math_fp_patterns. Also add a FIXME for the peephole pass not being able to handle this. llvm-svn: 363032	2019-06-11 04:30:53 +00:00
Tom Stellard	4b0b26199b	Revert CMake: Make most target symbols hidden by default This reverts r362990 (git commit `374571301d`) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. llvm-svn: 363028	2019-06-11 03:21:13 +00:00
Peter Collingbourne	e5bdedac9d	Symbolize: Make DWPName a symbolizer option instead of an argument to symbolize{,Inlined}Code. This makes the interface simpler and more consistent with the interface for .dSYM files and fixes a bug where llvm-symbolizer would not read the dwp if it was asked to symbolize data before symbolizing code. Differential Revision: https://reviews.llvm.org/D63114 llvm-svn: 363025	2019-06-11 02:32:27 +00:00
Matt Arsenault	c5830f5f05	AtomicExpand: Don't crash on non-0 alloca This now produces garbage on AMDGPU with a call to an nonexistent, anonymous libcall but won't assert. llvm-svn: 363022	2019-06-11 01:35:07 +00:00
Matt Arsenault	383e72fcfe	AMDGPU: Expand < 32-bit atomics Also fix AtomicExpand asserting on atomicrmw fadd/fsub. llvm-svn: 363021	2019-06-11 01:35:00 +00:00
Nico Weber	b941fa8821	llvm-lib: Implement /machine: argument And share some code with lld-link. While here, also add a FIXME about PR42180 and merge r360150 to llvm-lib. Differential Revision: https://reviews.llvm.org/D63021 llvm-svn: 363016	2019-06-11 01:13:41 +00:00
Yi Kong	432f48fcd4	[AArch64] Add more CPUs to host detection Returns "cortex-a73" for 3rd and 4th gen Kryo; not precisely correct, but close enough. Differential Revision: https://reviews.llvm.org/D63099 llvm-svn: 363013	2019-06-11 00:05:36 +00:00
Puyan Lotfi	4d89462a1c	[MIR-Canon] Fixing non-determinism that was breaking bots (NFC). An earlier fix of a subtle iterator invalidation bug had uncovered a nondeterminism that was present in the MultiUsers bag. Problem was that MultiUsers was being looked up using pointers. This patch is an NFC change that numbers each multiuser and processes each in numbered order. This fixes the test failure on netbsd and will likely fix the green-dragon bot too. llvm-svn: 363012	2019-06-11 00:00:25 +00:00
Shoaib Meenai	5062cf599c	[Support] Explicitly detect recursive response files Previous detection relied upon an arbitrary hard coded limit of 21 response files, which some code bases were running up against. The new detection maintains a stack of processing response files and explicitly checks if a newly encountered file is in the current stack. Some bookkeeping data is necessary in order to detect when to pop the stack. Patch by Chris Glover. Differential Revision: https://reviews.llvm.org/D62798 llvm-svn: 363005	2019-06-10 23:24:02 +00:00
Rong Xu	7ea131c20c	[PGO] Fix the buildbot failure in r362995 Fixed one unused variable warning. llvm-svn: 363004	2019-06-10 23:20:04 +00:00
Rong Xu	e44fa83c37	[PGO] Handle cases of non-instrument BBs As shown in PR41279, some basic blocks (such as catchswitch) cannot be instrumented. This patch filters out these BBs in PGO instrumentation. It also sets the profile count to the fail-to-instrument edge, so that we can propagate the counts in the CFG. Differential Revision: https://reviews.llvm.org/D62700 llvm-svn: 362995	2019-06-10 22:36:27 +00:00
Tom Stellard	374571301d	CMake: Make most target symbols hidden by default Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 36221 nm after/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990	2019-06-10 22:12:56 +00:00
Jessica Paquette	b22954384e	[GlobalISel] Translate memset/memmove/memcpy from undef ptrs into nops If the source is undef, then just don't do anything. This matches SelectionDAG's behaviour in SelectionDAG.cpp. Also add a test showing that we do the right thing here. (irtranslator-memfunc-undef.ll) Differential Revision: https://reviews.llvm.org/D63095 llvm-svn: 362989	2019-06-10 21:53:56 +00:00
Philip Reames	4bf1c23990	Factor out a helper function for readability and reuse in a future patch [NFC] llvm-svn: 362980	2019-06-10 20:41:27 +00:00
Philip Reames	a9633d5f0b	[LFTR] Use recomputed BE count This was discussed as part of D62880. The basic thought is that computing BE taken count after widening should produce (on average) an equally good backedge taken count as the one before widening. Since there's only one test in the suite which is impacted by this change, and it's essentially equivelent codegen, that seems to be a reasonable assertion. This change was separated from r362971 so that if this turns out to be problematic, the triggering piece is obvious and easily revertable. For the nestedIV example from elim-extend.ll, we end up with the following BE counts: BEFORE: (-2 + (-1 * %innercount) + %limit) AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>) Note that before is an i32 type, and the after is an i64. Truncating the i64 produces the i32. llvm-svn: 362975	2019-06-10 19:18:53 +00:00
Jinsong Ji	9c7f93e914	[PowerPC][HTM]Fix $zero is not a GPRC register for builtin_ttest This was found during HTM cleanup. Adding a test for builtin_ttest would expose following issue. * Bad machine code: Illegal physical register for instruction * - function: test10 - basic block: %bb.0 entry (0xf0e57497b58) - instruction: %5:crrc0 = TABORTWCI 0, $zero, 0 - operand 2: $zero $zero is not a GPRC register. LLVM ERROR: Found 1 machine code errors. Differential Revision: https://reviews.llvm.org/D63079 llvm-svn: 362974	2019-06-10 19:04:14 +00:00
Philip Reames	5d84ccb230	Prepare for multi-exit LFTR [NFC] This change does the plumbing to wire an ExitingBB parameter through the LFTR implementation, and reorganizes the code to work in terms of a set of individual loop exits. Most of it is fairly obvious, but there's one key complexity which makes it worthy of consideration. The actual multi-exit LFTR patch is in D62625 for context. Specifically, it turns out the existing code uses the backedge taken count from before a IV is widened. Oddly, we can end up with a different (more expensive, but semantically equivelent) BE count for the loop when requerying after widening. For the nestedIV example from elim-extend, we end up with the following BE counts: BEFORE: (-2 + (-1 * %innercount) + %limit) AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>) This is the only test in tree which seems sensitive to this difference. The actual result of using the wider BETC on this example is that we actually produce slightly better code. :) In review, we decided to accept that test change. This patch is structured to preserve the old behavior, but a separate change will immediate follow with the behavior change. (I wanted it separate for problem attribution purposes.) Differential Revision: https://reviews.llvm.org/D62880 llvm-svn: 362971	2019-06-10 17:51:13 +00:00
Francis Visoiu Mistrih	a438432acc	[FastISel] Skip creating unnecessary vregs for arguments This behavior was added in r130928 for both FastISel and SD, and then disabled in r131156 for FastISel. This re-enables it for FastISel with the corresponding fix. This is triggered only when FastISel can't lower the arguments and falls back to SelectionDAG for it. FastISel contains a map of "register fixups" where at the end of the selection phase it replaces all uses of a register with another register that FastISel sometimes pre-assigned. Code at the end of SelectionDAGISel::runOnMachineFunction is doing the replacement at the very end of the function, while other pieces that come in before that look through the MachineFunction and assume everything is done. In this case, the real issue is that the code emitting COPY instructions for the liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg assigned to the physreg is used, and if it's not, it will skip the COPY. If a register wasn't replaced with its assigned fixup yet, the copy will be skipped and we'll end up with uses of undefined registers. This fix moves the replacement of registers before the emission of copies for the live-ins. The initial motivation for this fix is to enable tail calls for swiftself functions, which were blocked because we couldn't prove that the swiftself argument (which is callee-save) comes from a function argument (live-in), because there was an extra copy (vreg to vreg). A few tests are affected by this: * llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21 (callee-save) but never reload it because it's attached to the return. We now don't even spill it anymore. * llvm/test/CodeGen//swiftself.ll: we tail-call now. llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this test was not really testing the right thing, but it worked because the same registers were re-used. * llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes * llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy * llvm/test/CodeGen/Mips/: get rid of spills and copies llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack * llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack * llvm/test/CodeGen/X86/swifterror.ll: same as AArch64 * llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed Differential Revision: https://reviews.llvm.org/D62361 llvm-svn: 362963	2019-06-10 16:53:37 +00:00
Cameron McInally	670d0f478b	[ExecutionEngine] Fix rL362941: Add UnaryOperator visitor to the interpreter Missed break statements. This was D62881. llvm-svn: 362958	2019-06-10 16:05:25 +00:00
Piotr Sobczak	9b11e93d90	[AMDGPU] Optimize image_[load\|store]_mip Summary: Replace image_load_mip/image_store_mip with image_load/image_store if lod is 0. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63073 llvm-svn: 362957	2019-06-10 15:58:51 +00:00
Simon Tatham	67065c5c70	Revert rL362953 and its followup rL362955. These caused a build failure because I managed not to notice they depended on a later unpushed commit in my current stack. Sorry about that. llvm-svn: 362956	2019-06-10 15:58:19 +00:00
Simon Tatham	42078d41d5	[ARM] Add the non-MVE instructions in Arm v8.1-M. This should have been part of r362953, but I had a finger-trouble incident and committed the old rather than new version of the patch. Sorry. llvm-svn: 362955	2019-06-10 15:41:58 +00:00
Sanjay Patel	9650c95b7e	[InstCombine] allow unordered preds when canonicalizing to fabs() We have a known-never-nan value via 'nnan', so an unordered predicate is the same as its ordered sibling. Similar to: rL362937 llvm-svn: 362954	2019-06-10 15:39:00 +00:00
Simon Tatham	baeea91933	[ARM] Add the non-MVE instructions in Arm v8.1-M. This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need some new addressing modes. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 362953	2019-06-10 15:36:34 +00:00
Jeremy Morse	bcff417292	[DebugInfo] Terminate all location-lists at end of block This commit reapplies r359426 (which was reverted in r360301 due to performance problems) and rolls in D61940 to address the performance problem. I've combined the two to avoid creating a span of slow-performance, and to ease reverting if more problems crop up. The summary of D61940: This patch removes the "ChangingRegs" facility in DbgEntityHistoryCalculator, as its overapproximate nature can produce incorrect variable locations. An unchanging register doesn't mean a variable doesn't change its location. The patch kills off everything that calculates the ChangingRegs vector. Previously ChangingRegs spotted epilogues and marked registers as unchanging if they weren't modified outside the epilogue, increasing the chance that we can emit a single-location variable record. Without this feature, debug-loc-offset.mir and pr19307.mir become temporarily XFAIL. They'll be re-enabled by D62314, using the FrameDestroy flag to identify epilogues, I've split this into two steps as FrameDestroy isn't necessarily supported by all backends. The logic for terminating variable locations at the end of a basic block now becomes much more enjoyably simple: we just terminate them all. Other test changes: inlined-argument.ll becomes XFAIL, but for a longer term. The current algorithm for detecting that a variable has a single-location doesn't work in this scenario (inlined function in multiple blocks), only other bugs were making this test work. fission-ranges.ll gets slightly refreshed too, as the location of "p" is now correctly determined to be a single location. Differential Revision: https://reviews.llvm.org/D61940 llvm-svn: 362951	2019-06-10 15:23:46 +00:00
Sanjay Patel	85de9634e6	[InstCombine] fix bug in canonicalization to fabs() Forgot to translate the predicate clauses in rL362943. llvm-svn: 362945	2019-06-10 14:57:45 +00:00
Sanjay Patel	8b6d9f60ed	[InstCombine] change canonicalization to fabs() to use FMF on fsub Similar to rL362909: This isn't the ideal fix (use FMF on the select), but it's still an improvement until we have better FMF propagation to selects and other FP math operators. I don't think there's much risk of regression from this change by not including the FMF on the fcmp any more. The nsz/nnan FMF should be the same on the fcmp and the fsub because they have the same operand. llvm-svn: 362943	2019-06-10 14:46:36 +00:00
Simon Tatham	b87669f166	[ARM] Disallow PC, and optionally SP, in VMOVRH and VMOVHR. Arm v8.1-M supports the VMOV instructions that move a half-precision value to and from a GPR, but not if the GPR is SP or PC. To fix this, I've changed those instructions to use the rGPR register class instead of GPR. rGPR always excludes PC, and it excludes SP except in the presence of the HasV8Ops target feature (i.e. Arm v8-A). So the effect is that VMOV.F16 to and from PC is now illegal everywhere, but VMOV.F16 to and from SP is illegal only on non-v8-A cores (which I believe is all as it should be). Reviewers: dmgreen, samparker, SjoerdMeijer, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60704 llvm-svn: 362942	2019-06-10 14:43:55 +00:00
Cameron McInally	ce49e2231b	[ExecutionEngine] Add UnaryOperator visitor to the interpreter This is to support the unary FNeg instruction. Differential Revision: https://reviews.llvm.org/D62881 llvm-svn: 362941	2019-06-10 14:38:48 +00:00
Sanjay Patel	8cd8c5784b	[InstCombine] allow unordered preds when canonicalizing to fabs() PR42179: https://bugs.llvm.org/show_bug.cgi?id=42179 llvm-svn: 362937	2019-06-10 14:14:51 +00:00
Andrea Di Biagio	47db08dbb1	[MCA] Further refactor the bottleneck analysis view. NFCI. llvm-svn: 362933	2019-06-10 12:50:08 +00:00
George Rimar	1e41007aeb	[yaml2obj/obj2yaml] - Make RawContentSection::Content and RawContentSection::Size optional This is a follow-up for D62809. Content and Size fields should be optional as was discussed in comments of the D62809's thread. With that, we can describe a specific string table and symbol table sections in a more correct way and also show appropriate errors. The patch adds lots of test cases where the behavior is described in details. Differential revision: https://reviews.llvm.org/D62957 llvm-svn: 362931	2019-06-10 12:43:18 +00:00
David Green	d847aa573b	[ARM] Enable Unroll UpperBound This option allows loops with small max trip counts to be fully unrolled. This can help with code like the remainder loops from manually unrolled loops like those that appear in the cmsis dsp library. We would apparently previously runtime unroll them with the default unroll count (4). Differential Revision: https://reviews.llvm.org/D63064 llvm-svn: 362928	2019-06-10 10:22:14 +00:00
Nikola Prica	abc1dff7e4	[DebugInfo] More strict debug range for stack variables Variable's stack location can stretch longer than it should. If a variable is placed at the stack in a some nested basic block its range can be calculated to be up to the next occurrence of the variable's DBG_VALUE, or up to the end of the function, thus covering a basic blocks that should not be included in the variable’s location range. This happens because the DbgEntityHistoryCalculator ends register locations at the end of a basic block only if the variable’s location register has been changed throughout the function, which is not the case for the register used to reference stack objects. This patch also tries to produce a single value location if the location list builder managed to merge all the locations into one. Reviewers: aprantl, dstenb, jmorse Reviewed By: aprantl, dstenb, jmorse Subscribers: djtodoro, ivanbaev, asowda Tags: #debug-info Differential Revision: https://reviews.llvm.org/D61600 llvm-svn: 362923	2019-06-10 08:41:06 +00:00
QingShan Zhang	ab846da7e8	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D62897 llvm-svn: 362921	2019-06-10 05:40:21 +00:00
Craig Topper	9000a72a4b	[X86] When promoting i16 compare with immediate to i32, try to use sign_extend for eq/ne if the input is truncated from a type with enough sign its. Summary: Our default behavior is to use sign_extend for signed comparisons and zero_extend for everything else. But for equality we have the freedom to use either extension. If we can prove the input has been truncated from something with enough sign bits, we can use sign_extend instead and let DAG combine optimize it out. A similar rule is used by type legalization in LegalizeIntegerTypes. This gets rid of the movzx in PR42189. The immediate will still take 4 bytes instead of the 2 bytes plus 0x66 prefix a cmp di, 32767 would get, but it avoids a length changing prefix. Reviewers: RKSimon, spatel, xbolva00 Reviewed By: xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63032 llvm-svn: 362920	2019-06-10 04:50:12 +00:00
Craig Topper	ceb807bbbc	[X86] Disable f32->f64 extload when sse2 is enabled Summary: We can only use the memory form of cvtss2sd under optsize due to a partial register update. So previously we were emitting 2 instructions for extload when optimizing for speed. Also due to a late optimization in preprocessiseldag we had to handle (fpextend (loadf32)) under optsize. This patch forces extload to expand so that it will always be in the (fpextend (loadf32)) form during isel. And when optimizing for speed we can just let each of those pieces select an instruction independently. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62710 llvm-svn: 362919	2019-06-10 04:37:16 +00:00
Vivek Pandya	11cb15f8ed	Do not derive no-recurse attribute if function does not have exact definition. This is fix for https://bugs.llvm.org/show_bug.cgi?id=41336 Reviewers: jdoerfert Reviewed by: jdoerfert Differential Revision: https://reviews.llvm.org/D63045 llvm-svn: 362918	2019-06-10 04:16:04 +00:00
Craig Topper	dd10099d5c	[X86] Use EVEX instructions for f128 FAND/FOR/FXOR when avx512vl is enabled. llvm-svn: 362915	2019-06-10 01:18:55 +00:00
Craig Topper	f7ba8b808a	[X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and scalar_to_vector/extract_vector_elts to reduce isel patterns. Previously we did the equivalent operation in isel patterns with COPY_TO_REGCLASS operations to transition. By inserting scalar_to_vetors and extract_vector_elts before isel we can allow each piece to be selected individually and accomplish the same final result. I ideally we'd use vector operations earlier in lowering/combine, but that looks to be more difficult. The scalar-fp-to-i64.ll changes are because we have a pattern for using movlpd for store+extract_vector_elt. While an f64 store uses movsd. The encoding sizes are the same. llvm-svn: 362914	2019-06-10 00:41:07 +00:00
Nico Weber	80fee25776	Revert r361953 "[SVE][IR] Scalable Vector IR Type" This reverts commit `f4fc01f8dd`. It caused a 3-4x slowdown when doing thinlto links, PR42210. llvm-svn: 362913	2019-06-09 19:27:50 +00:00
David Bolvansky	dcf5e6abdf	[TargetLowering] Simplify (ctpop x) == 1 Reviewers: craig.topper, spatel, RKSimon, bkramer Reviewed By: spatel Subscribers: javed.absar, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63004 llvm-svn: 362912	2019-06-09 18:18:57 +00:00
Roman Lebedev	d669758d84	[InstCombine] foldICmpWithLowBitMaskedVal(): 'icmp sgt/sle': avoid miscompiles A precondition 'x != 0' was forgotten by me: https://rise4fun.com/Alive/JFNP https://rise4fun.com/Alive/jHvL These 4 folds with non-constants could be re-enabled, but for now let's go for the simplest solution. https://bugs.llvm.org/show_bug.cgi?id=42198 llvm-svn: 362911	2019-06-09 16:30:42 +00:00
Sanjay Patel	87cd16a86e	[InstCombine] change canonicalization to fabs() to use FMF on fneg This isn't the ideal fix (use FMF on the select), but it's still an improvement until we have better FMF propagation to selects and other FP math operators. I don't think there's much risk of regression from this change by not including the FMF on the fcmp any more. The nsz/nnan FMF should be the same on the fcmp and the fneg (fsub) because they have the same operand. This works around the most glaring FMF logical inconsistency cited in PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 llvm-svn: 362909	2019-06-09 16:22:01 +00:00
Sanjay Patel	866db10228	[InstSimplify] reduce code duplication for fcmp folds; NFC llvm-svn: 362904	2019-06-09 13:58:46 +00:00
Sanjay Patel	73f5a855b3	[InstSimplify] enhance fcmp fold with never-nan operand This is another step towards correcting our usage of fast-math-flags when applied on an fcmp. In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of the fcmp. But I'm leaving that clause in until we're more confident that we can stop relying on fcmp's FMF. By using the more general "isKnownNeverNaN()", we gain a simplification shown on the tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN). On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction in addition to the FMF on the fcmp. This is a continuation of D62979 / rL362879. llvm-svn: 362903	2019-06-09 13:48:59 +00:00
Anton Afanasyev	623d9ba068	[MIR] Add simple PRE pass to MachineCSE This is the second part of the commit fixing PR38917 (hoisting partitially redundant machine instruction). Most of PRE (partitial redundancy elimination) and CSE work is done on LLVM IR, but some of redundancy arises during DAG legalization. Machine CSE is not enough to deal with it. This simple PRE implementation works a little bit intricately: it passes before CSE, looking for partitial redundancy and transforming it to fully redundancy, anticipating that the next CSE step will eliminate this created redundancy. If CSE doesn't eliminate this, than created instruction will remain dead and eliminated later by Remove Dead Machine Instructions pass. The third part of the commit is supposed to refactor MachineCSE, to make it more clear and to merge MachinePRE with MachineCSE, so one need no rely on further Remove Dead pass to clear instrs not eliminated by CSE. First step: https://reviews.llvm.org/D54839 Fixes llvm.org/PR38917 This is fixed recommit of r361356 after PowerPC64 multistage build failure. llvm-svn: 362901	2019-06-09 12:15:47 +00:00
Ayke van Laethem	f18cf230e4	[CaptureTracking] Don't let comparisons against null escape inbounds pointers Pointers that are in-bounds (either through dereferenceable_or_null or thorough a getelementptr inbounds) cannot be captured with a comparison against null. There is no way to construct a pointer that is still in bounds but also NULL. This helps safe languages that insert null checks before load/store instructions. Without this patch, almost all pointers would be considered captured even for simple loads. With this patch, an icmp with null will not be seen as escaping as long as certain conditions are met. There was a lot of discussion about this patch. See the Phabricator thread for detals. Differential Revision: https://reviews.llvm.org/D60047 llvm-svn: 362900	2019-06-09 10:20:33 +00:00
Jatin Bhateja	2a30aeb010	[X86] NFCI : Comment updation for EVEX to VEX translation. Reviewers: llvm-commits, jbhateja Reviewed By: jbhateja Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63055 llvm-svn: 362898	2019-06-09 09:59:26 +00:00
Simon Pilgrim	5f337149fa	Use for-range loop. NFCI. llvm-svn: 362897	2019-06-09 09:07:30 +00:00
Amara Emerson	0d20969dea	[AArch64][GlobalISel] Select immediate forms of cmp instructions. A simple re-use of the immediate operand matcher and renderer functions. rdar://43795178 llvm-svn: 362896	2019-06-09 07:31:25 +00:00
Craig Topper	2ba0e2518b	[X86] Remove (store (f32 (extractelt (v4f32))) isel patterns which is redundant. We emit a MOVSSmr and a COPY_TO_REGCLASS, but that's what we would get from selecting the store and extractelt independently. llvm-svn: 362895	2019-06-09 03:21:33 +00:00
Craig Topper	7d8494c41c	[X86] Mutate scalar fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on number of isel patterns. Similar was done for vectors in r362535. Removes about 1200 bytes from the isel table. llvm-svn: 362894	2019-06-08 23:53:31 +00:00
Simon Pilgrim	6bae6d5a5d	[DAGCombine] visitAND - merge (zext_inreg ((s)extload x)) -> (zextload x) combines. NFCI. Same codegen, only differ by the oneuse limit for the sextload case. llvm-svn: 362880	2019-06-08 17:02:00 +00:00
Sanjay Patel	4329c15f11	[InstSimplify] enhance fcmp fold with never-nan operand This is 1 step towards correcting our usage of fast-math-flags when applied on an fcmp. In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of the fcmp. But I'm leaving that clause in until we're more confident that we can stop relying on fcmp's FMF. By using the more general "isKnownNeverNaN()", we gain a simplification shown on the tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN). On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction in addition to the FMF on the fcmp. I'll update the 'ult' case below here as a follow-up assuming no problems here. Differential Revision: https://reviews.llvm.org/D62979 llvm-svn: 362879	2019-06-08 15:12:33 +00:00
David Green	c5471c2a57	[ARM] Adjust isLegalT1AddressImmediate for non-legal types Types such as float and i64's do not have legal loads in Thumb1, but will still be loaded with a LDR (or potentially multiple LDR's). As such we can treat the cost of addressing mode calculations the same as an i32 and get some optimisation benefits. Differential Revision: https://reviews.llvm.org/D62968 llvm-svn: 362874	2019-06-08 10:32:53 +00:00
David Green	342d1b81a3	[ARM] Add MVE addressing to isLegalT2AddressImmediate Now with MVE being added, we can add the vector addressing mode costs for it. These are generally imm7 multiplied by the size of the type being loaded / stored. Differential Revision: https://reviews.llvm.org/D62967 llvm-svn: 362873	2019-06-08 10:18:23 +00:00
David Green	4ecce205d5	[ARM] Add fp16 addressing to isLegalT2AddressImmediate The fp16 version of VLDR takes a imm8 multiplied by 2. This updates the costs to account for those, and adds extra testing. It is dependant upon hasFPRegs16 as this is what the load/store instructions require. Differential Revision: https://reviews.llvm.org/D62966 llvm-svn: 362872	2019-06-08 10:09:02 +00:00
David Green	10fbaa96c5	[ARM] Add HasNEON for all Neon patterns in ARMInstrNEON.td. NFCI We are starting to add an entirely separate vector architecture to the ARM backend. To do that we need at least some separation between the existing NEON and the new MVE code. This patch just goes through the Neon patterns and ensures that they are predicated on HasNEON, giving MVE a stable place to start from. No tests yet as this is largely an NFC, and we don't have the other target that will treat any of these intructions as legal. Differential Revision: https://reviews.llvm.org/D62945 llvm-svn: 362870	2019-06-08 09:36:49 +00:00
Jonas Paulsson	bca56ab073	[SystemZ] Fix CMakeLists.txt for alphabetical order (NFC). llvm-svn: 362869	2019-06-08 06:42:02 +00:00
Jonas Paulsson	fdc4ea34e3	[SystemZ, RegAlloc] Favor 3-address instructions during instruction selection. This patch aims to reduce spilling and register moves by using the 3-address versions of instructions per default instead of the 2-address equivalent ones. It seems that both spilling and register moves are improved noticeably generally. Regalloc hints are passed to increase conversions to 2-address instructions which are done in SystemZShortenInst.cpp (after regalloc). Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are the same), foldMemoryOperandImpl() can no longer trivially fold a spilled source register since the reg/reg instruction is now 3-address. In order to remedy this, new 3-address pseudo memory instructions are used to perform the folding only when the dst and lhs virtual registers are known to be allocated to the same physreg. In order to not let MachineCopyPropagation run and change registers on these transformed instructions (making it 3-address), a new target pass called SystemZPostRewrite.cpp is run just after VirtRegRewriter, that immediately lowers the pseudo to a target instruction. If it would have been possibe to insert a COPY instruction and change a register operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that the caller (e.g. InlineSpiller) would update/repair the involved LiveIntervals, the solution involving pseudo instructions would not have been needed. This is perhaps a potential improvement (see Phabricator post). Common code changes: * A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a target pass immediately before MachineCopyPropagation. * VirtRegMap is passed as an argument to foldMemoryOperand(). Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D60888 llvm-svn: 362868	2019-06-08 06:19:15 +00:00
Amara Emerson	829037a914	Factor out SelectionDAG's switch analysis and lowering into a separate component. In order for GlobalISel to re-use the significant amount of analysis and optimization code in SDAG's switch lowering, we first have to extract it and create an interface to be used by both frameworks. No test changes as it's NFC. Differential Revision: https://reviews.llvm.org/D62745 llvm-svn: 362857	2019-06-08 00:05:17 +00:00
Keno Fischer	eb4a561fa3	[GVN] non-functional code movement Summary: Move some code around, in preparation for later fixes to the non-integral addrspace handling (D59661) Patch By Jameson Nash <jameson@juliacomputing.com> Reviewed By: reames, loladiro Differential Revision: https://reviews.llvm.org/D59729 llvm-svn: 362853	2019-06-07 23:08:38 +00:00
Matt Arsenault	ddd2c9ac86	AMDGPU: Force skips around traps llvm-svn: 362852	2019-06-07 23:02:52 +00:00
Alina Sbirlea	eaea538d18	[DomTreeUpdater] Add all insert before all delete updates to reduce compile time. Summary: The cleanup in D62751 introduced a compile-time regression due to the way DT updates are performed. Add all insert edges then all delete edges in DTU to match the previous compile time. Compile time on the test provided by @mstorsjo before and after this patch on my machine: 113.046s vs 35.649s Repro: clang -target x86_64-w64-mingw32 -c -O3 glew-preproc.c; on https://martin.st/temp/glew-preproc.c. Reviewers: kuhar, NutshellySima, mstorsjo Subscribers: jlebar, mstorsjo, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62981 llvm-svn: 362839	2019-06-07 20:43:55 +00:00
Craig Topper	bd03230cb0	[X86] Remove unnecessary new line escape from the end of a macro. NFC llvm-svn: 362837	2019-06-07 20:30:40 +00:00
Volkan Keles	97204a6788	[GlobalISel] IRTranslator: Translate the intrinsics ignored by CodeGen Summary: Translate `llvm.assume`, `llvm.var.annotation` and `llvm.sideeffect` to nothing as they have no effect on CodeGen. Reviewers: qcolombet, aditya_nandakumar, dsanders, paquette, aemerson, arsenm Reviewed By: arsenm Subscribers: hiraditya, wdng, rovka, kristof.beyls, javed.absar, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63022 llvm-svn: 362834	2019-06-07 20:19:27 +00:00
Nick Desaulniers	7ddd694d36	[APFloat] APFloat::Storage::Storage - refix use after move Summary: Re-land r360675 after it was reverted in r360770. This was reported in: https://llvm.org/reports/scan-build/ Based on feedback in: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190513/652286.html Reviewers: RKSimon, efriedma Reviewed By: RKSimon, efriedma Subscribers: eli.friedman, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62767 llvm-svn: 362833	2019-06-07 19:51:22 +00:00
Lang Hames	d4a8089f03	[ORC] Update symbol lookup to use a single callback with a required symbol state rather than two callbacks. The asynchronous lookup API (which the synchronous lookup API wraps for convenience) used to take two callbacks: OnResolved (called once all requested symbols had an address assigned) and OnReady to be called once all requested symbols were safe to access). This patch updates the asynchronous lookup API to take a single 'OnComplete' callback and a required state (SymbolState) to determine when the callback should be made. This simplifies the common use case (where the client is interested in a specific state) and will generalize neatly as new states are introduced to track runtime initialization of symbols. Clients who were making use of both callbacks in a single query will now need to issue two queries (one for SymbolState::Resolved and another for SymbolState::Ready). Synchronous lookup API clients who were explicitly passing the WaitOnReady argument will now need neeed to pass a SymbolState instead (for 'WaitOnReady == true' use SymbolState::Ready, for 'WaitOnReady == false' use SymbolState::Resolved). Synchronous lookup API clients who were using default arugment values should see no change. llvm-svn: 362832	2019-06-07 19:33:51 +00:00
Simon Pilgrim	f0240ee76d	[DAGCombine] visitAND - fix local shadow variable warnings. NFCI. llvm-svn: 362825	2019-06-07 18:36:43 +00:00
Simon Pilgrim	4c9db2045a	[DAGCombine] Use APInt::extractBits in "sub-splat" constant mask detection. NFCI. llvm-svn: 362820	2019-06-07 18:07:06 +00:00
Sanjay Patel	e490e4a0e7	[Analysis] simplify code for getSplatValue(); NFC AFAIK, this is only currently called by TTI, but it could be used from instcombine or CGP to help solve problems like: https://bugs.llvm.org/show_bug.cgi?id=37428 https://bugs.llvm.org/show_bug.cgi?id=42174 llvm-svn: 362810	2019-06-07 16:09:54 +00:00
Jinsong Ji	7aafdef627	[MachineScheduler] checkResourceLimit boundary condition update When we call checkResourceLimit in bumpCycle or bumpNode, and we know the resource count has just reached the limit (the equations are equal). We should return true to mark that we are resource limited for next schedule, or else we might continue to schedule in favor of latency for 1 more schedule and create a schedule that actually overbook the resource. When we call checkResourceLimit to estimate the resource limite before scheduling, we don't need to return true even if the equations are equal, as it shouldn't limit the schedule for it . Differential Revision: https://reviews.llvm.org/D62345 llvm-svn: 362805	2019-06-07 14:54:47 +00:00
Stefan Stipanovic	128e8e8fb9	test-commit llvm-svn: 362802	2019-06-07 14:18:02 +00:00
Matt Arsenault	94a609e343	TailDuplicator: Remove no-op analyzeBranch call This could fail, which looked concerning. However nothing was actually using the results of this. I assume this was intended to use the anti-feature of analyzeBranch of removing instructions, but wasn't actually calling it with AllowModify = true. Fixes bug 42162. llvm-svn: 362800	2019-06-07 13:33:34 +00:00
Joerg Sonnenberger	b2e96169b0	[NFC] Don't export helpers of ConstantFoldCall llvm-svn: 362799	2019-06-07 13:28:52 +00:00
Nico Weber	d546b5052b	llvm-lib: Disallow mixing object files with different machine types lib.exe doesn't allow creating .lib files with object files that have differing machine types. Update llvm-lib to match. The motivation is to make it possible to infer the machine type of a .lib file in lld, so that it can warn when e.g. a 32-bit .lib file is passed to a 64-bit link (PR38965). Fixes PR38782. Differential Revision: https://reviews.llvm.org/D62913 llvm-svn: 362798	2019-06-07 13:24:34 +00:00
Sanjay Patel	6880bceda2	[x86] narrow extract subvector of vector select This is a potentially large perf win for AVX1 targets because of the way we auto-vectorize to 256-bit but then expect the backend to legalize/optimize for the half-implemented AVX1 ISA. On the motivating example from PR37428 (even though this patch doesn't solve the vector shift issue): https://bugs.llvm.org/show_bug.cgi?id=37428 ...there's a 16% speedup when compiling with "-mavx" (perf tested on Haswell) because we eliminate the remaining 256-bit vblendv ops. I added comments on a couple of tests that require further work. If we have 256-bit logic ops separating the vselect and extract, we should probably narrow everything to 128-bit, but that requires a larger pattern match. Differential Revision: https://reviews.llvm.org/D62969 llvm-svn: 362797	2019-06-07 13:17:46 +00:00
Simon Tatham	5d66f2b0af	[ARM] Fix bugs introduced by the fp64/d32 rework. Change D60691 caused some knock-on failures that weren't caught by the existing tests. Firstly, selecting a CPU that should have had a restricted FPU (e.g. `-mcpu=cortex-m4`, which should have 16 d-regs and no double precision) could give the unrestricted version, because `ARM::getFPUFeatures` returned a list of features including subtracted ones (here `-fp64`,`-d32`), but `ARMTargetInfo::initFeatureMap` threw away all the ones that didn't start with `+`. Secondly, the preprocessor macros didn't reliably match the actual compilation settings: for example, `-mfpu=softvfp` could still set `__ARM_FP` as if hardware FP was available, because the list of features on the cc1 command line would include things like `+vfp4`,`-vfp4d16` and clang didn't realise that one of those cancelled out the other. I've fixed both of these issues by rewriting `ARM::getFPUFeatures` so that it returns a list that enables every FP-related feature compatible with the selected FPU and disables every feature not compatible, which is more verbose but means clang doesn't have to understand the dependency relationships between the backend features. Meanwhile, `ARMTargetInfo::handleTargetFeatures` is testing for all the various forms of the FP feature names, so that it won't miss cases where it should have set `HW_FP` to feed into feature test macros. That in turn caused an ordering problem when handling `-mcpu=foo+bar` together with `-mfpu=something_that_turns_off_bar`. To fix that, I've arranged that the `+bar` suffixes on the end of `-mcpu` and `-march` cause feature names to be put into a separate vector which is concatenated after the output of `getFPUFeatures`. Another side effect of all this is to fix a bug where `clang -target armv8-eabi` by itself would fail to set `__ARM_FEATURE_FMA`, even though `armv8` (aka Arm v8-A) implies FP-Armv8 which has FMA. That was because `HW_FP` was being set to a value including only the `FPARMV8` bit, but that feature test macro was testing only the `VFP4FPU` bit. Now `HW_FP` ends up with all the bits set, so it gives the right answer. Changes to tests included in this patch: * `arm-target-features.c`: I had to change basically all the expected results. (The Cortex-M4 test in there should function as a regression test for the accidental double-precision bug.) * `arm-mfpu.c`, `armv8.1m.main.c`: switched to using `CHECK-DAG` everywhere so that those tests are no longer sensitive to the order of cc1 feature options on the command line. * `arm-acle-6.5.c`: been updated to expect the right answer to that FMA test. * `Preprocessor/arm-target-features.c`: added a regression test for the `mfpu=softvfp` issue. Reviewers: SjoerdMeijer, dmgreen, ostannard, samparker, JamesNagurne Reviewed By: ostannard Subscribers: srhines, javed.absar, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62998 llvm-svn: 362791	2019-06-07 12:42:54 +00:00
Sam Elliott	f720647ddd	[RISCV] Support Bit-Preserving FP in F/D Extensions Summary: This allows some integer bitwise operations to instead be performed by hardware fp instructions. This is correct because the RISC-V spec requires the F and D extensions to use the IEEE-754 standard representation, and fp register loads and stores to be bit-preserving. This is tested against the soft-float ABI, but with hardware float extensions enabled, so that the tests also ensure the optimisation also fires in this case. Reviewers: asb, luismarques Reviewed By: asb Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62900 llvm-svn: 362790	2019-06-07 12:20:14 +00:00
Valery Pykhtin	cb8de55f47	[AMDGPU] Constrain the AMDGPU inliner on maximum number of basic blocks in a caller function (compile time performance) Differential revision: https://reviews.llvm.org/D62917 llvm-svn: 362789	2019-06-07 12:16:46 +00:00
Cullen Rhodes	1f0d251244	[AArch64][AsmParser] error on unexpected SVE predicate type suffix Summary: This patch fixes a bug in the assembler that permitted a type suffix on predicate registers when not expected. For instance, the following was previously valid: faddv h0, p0.q, z1.h This bug was present in all SVE instructions containing predicates with no type suffix and no predication form qualifier, i.e. /z or /m. The latter instructions are already caught with an appropiate error message by the assembler, e.g.: .text <stdin>:1:13: error: not expecting size suffix cmpne p1.s, p0.b/z, z2.s, 0 ^ A similar issue for SVE vector registers was fixed in: https://reviews.llvm.org/D59636 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62942 llvm-svn: 362780	2019-06-07 08:46:56 +00:00
Cullen Rhodes	f730548484	[AArch64][AsmParser] Provide better diagnostics for SVE predicates Patch by Sander de Smalen (sdesmalen) Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62941 llvm-svn: 362779	2019-06-07 08:37:00 +00:00
Pengfei Wang	f8b28931a7	[X86] -march=cooperlake (llvm) Support intel -march=cooperlake in llvm Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62836 llvm-svn: 362776	2019-06-07 08:31:35 +00:00
Sam Parker	67f9dc60b8	Fix for lld buildbot Removed unused (in non-debug builds) variable. llvm-svn: 362775	2019-06-07 08:04:18 +00:00
Sam Parker	c5ef502ee8	[CodeGen] Generic Hardware Loop Support Patch which introduces a target-independent framework for generating hardware loops at the IR level. Most of the code has been taken from PowerPC CTRLoops and PowerPC has been ported over to use this generic pass. The target dependent parts have been moved into TargetTransformInfo, via isHardwareLoopProfitable, with HardwareLoopInfo introduced to transfer information from the backend. Three generic intrinsics have been introduced: - void @llvm.set_loop_iterations Takes as a single operand, the number of iterations to be executed. - i1 @llvm.loop_decrement(anyint) Takes the maximum number of elements processed in an iteration of the loop body and subtracts this from the total count. Returns false when the loop should exit. - anyint @llvm.loop_decrement_reg(anyint, anyint) Takes the number of elements remaining to be processed as well as the maximum numbe of elements processed in an iteration of the loop body. Returns the updated number of elements remaining. llvm-svn: 362774	2019-06-07 07:35:30 +00:00
Dylan McKay	04b418f246	[AVR] Expand 16-bit rotations during the legalization stage In r356860, the legalization logic for BSWAP was modified to ISD::ROTL, rather than the old ISD::{SHL, SRL, OR} nodes. This works fine on AVR for 8-bit rotations, but 16-bit rotations are currently unimplemented - they always trigger an assertion error in the AVRExpandPseudoInsts pass ("RORW unimplemented"). This patch instructions the legalizer to expand 16-bit rotations into the previous SHL, SRL, OR pattern it did previously. This fixes the 'issue-cannot-select-bswap.ll' test. Interestingly, this test failure seems flaky - it passes successfully on the avr-build-01 buildbot, but fails locally on my Arch Linux install. llvm-svn: 362773	2019-06-07 06:55:00 +00:00
Fangrui Song	c841b9abf0	[MC][ELF] Don't create relocations with section symbols for STB_LOCAL ifunc We should keep the symbol type (STT_GNU_IFUNC) for a local ifunc because it may result in an IRELATIVE reloc that the dynamic loader will use to resolve the address at startup time. There is another problem that is not fixed by this patch: a PC relative relocation should also create a relocation with the ifunc symbol. llvm-svn: 362767	2019-06-07 03:47:22 +00:00
Fangrui Song	19189993c9	[LV] Fix -Wunused-function after r362736 llvm-svn: 362762	2019-06-07 01:48:26 +00:00
Matt Arsenault	c0edb8f5cf	AMDGPU: Don't count mask branch pseudo towards skip threshold llvm-svn: 362761	2019-06-07 00:14:55 +00:00
Matt Arsenault	99ee81b183	AMDGPU: Insert skips for blocks with FLAT This already forced a skip for VMEM, so it should also be done for flat. I'm somewhat skeptical about the benefit of this though. llvm-svn: 362760	2019-06-07 00:14:45 +00:00
Nemanja Ivanovic	ef4a3aa549	[PowerPC] Exploit the vector min/max instructions Use the PPC vector min/max instructions for computing the corresponding operation as these should be faster than the compare/select sequences we currently emit. Differential revision: https://reviews.llvm.org/D47332 llvm-svn: 362759	2019-06-06 23:49:01 +00:00
Matt Arsenault	b6cfa129cc	AMDGPU: Insert skip branches over return blocks SIInsertSkips really doesn't understand the control flow, and makes very stupid assumptions about the block layout. This was able to get away with not skipping return blocks, since usually after structurization there is only one placed at the end of the function. Tail duplication can break this assumption. llvm-svn: 362754	2019-06-06 22:51:51 +00:00
Alexey Lapshin	b9f1e7b16e	[DebugInfo] Incorrect debug info record generated for loop counter. Incorrect Debug Variable Range was calculated while "COMPUTING LIVE DEBUG VARIABLES" stage. Range for Debug Variable("i") computed according to current state of instructions inside of basic block. But Register Allocator creates new instructions which were not taken into account when Live Debug Variables computed. In the result DBG_VALUE instruction for the "i" variable was put after these newly inserted instructions. This is incorrect. Debug Value for the loop counter should be inserted before any loop instruction. Differential Revision: https://reviews.llvm.org/D62650 llvm-svn: 362750	2019-06-06 21:19:39 +00:00
Alexander Timofeev	37bd9bd137	[AMDGPU] Partial revert for the `ba447bae74` "Divergence driven ISel. Assign register class for cross block values according to the divergence." that discovered the design flaw leading to several issues that required to be solved before. This change reverts AMDGPU specific changes and keeps common part unaffected. llvm-svn: 362749	2019-06-06 21:13:02 +00:00
Craig Topper	f320f26716	[X86] Make a bunch of merge masked binops commutable for loading folding. This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg. We already commuted the unmasked and zero masked versions. I've added 512-bit stack folding tests for most of the instructions affected. I've tested needing commuting and not commuting across unmasked, merged masked, and zero masked. The 128/256 bit instructions should behave similarly. llvm-svn: 362746	2019-06-06 21:00:04 +00:00
Craig Topper	ca541b20d0	[CFLGraph] Add support for unary fneg instruction. Differential Revision: https://reviews.llvm.org/D62791 llvm-svn: 362737	2019-06-06 19:21:23 +00:00
Renato Golin	9e97caf594	[LV] Wrap LV illegality reporting in a function. NFC. A function for loop vectorization illegality reporting has been introduced: void LoopVectorizationLegality::reportVectorizationFailure( const StringRef DebugMsg, const StringRef OREMsg, const StringRef ORETag, Instruction * const I) const; The function prints a debug message when the debug for the compilation unit is enabled as well as invokes the optimization report emitter to generate a message with a specified tag. The function doesn't cover any complicated logic when a custom lambda should be passed to the emitter, only generating a message with a tag is supported. The function always prints the instruction `I` after the debug message whenever the instruction is specified, otherwise the debug message ends with a dot: 'LV: Not vectorizing: Disabled/already vectorized.' Patch by Pavel Samolysov <samolisov@gmail.com> llvm-svn: 362736	2019-06-06 19:15:52 +00:00
Jason Liu	60ec248148	[AIX] Implement function descriptor on SDAG Summary: (1) Function descriptor on AIX On AIX, a called routine may have 2 distinct symbols associated with it: * A function descriptor (Name) * A function entry point (.Name) The descriptor structure on AIX is the same as those in the ELF V1 ABI: * The address of the entry point of the function. * The TOC base address for the function. * The environment pointer. The descriptor symbol uses the same name as the source level function in C. The function entry point is analogous to the symbol we would generate for a function in a non-descriptor-based ABI, except that it is renamed by prepending a ".". Which symbol gets referenced depends on the context: * Taking the address of the function references the descriptor symbol. * Calling the function references the entry point symbol. (2) Speaking of implementation on AIX, for direct function call target, we create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to replace original TargetGlobalAddress SDNode. Then down the path, we can take advantage of this MCSymbol. Patch by: Xiangling_L Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara Differential Revision: https://reviews.llvm.org/D62532 llvm-svn: 362735	2019-06-06 19:13:36 +00:00
Craig Topper	6cda33ba36	[InlineCost] Add support for unary fneg. This adds support for unary fneg based on the implementation of BinaryOperator without the soft float FP cost. Previously we would just delegate to visitUnaryInstruction. I think the only real change is that we will pass the FastMath flags to SimplifyFNeg now. Differential Revision: https://reviews.llvm.org/D62699 llvm-svn: 362732	2019-06-06 19:02:18 +00:00
Philip Reames	101915cfda	[LoopPred] Fix a bug in unconditional latch bailout introduced in r362284 This is a really silly bug that even a simple test w/an unconditional latch would have caught. I tried to guard against the case, but put it in the wrong if check. Oops. llvm-svn: 362727	2019-06-06 18:02:36 +00:00
Simon Pilgrim	842c7792aa	[DAGCombine] MergeConsecutiveStores - improve non-temporal load\store handling (PR42123) This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores: 1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another. 2 - The merged load\store node must retain the non-temporal flag. Differential Revision: https://reviews.llvm.org/D62910 llvm-svn: 362723	2019-06-06 17:04:13 +00:00
Dmitri Gribenko	5438cc6910	Remove unused PPC.h includes under llvm/lib/Target/PowerPC. llvm-svn: 362718	2019-06-06 16:47:06 +00:00
Craig Topper	6b67dfa54c	[X86] Make masked floating point equality/ordered compares commutable for load folding purposes. Same as what is supported for the unmasked form. llvm-svn: 362717	2019-06-06 16:39:04 +00:00
Whitney Tsang	03e8369a72	[DA] Add an option to control delinearization validity checks Summary: Dependence Analysis performs static checks to confirm validity of delinearization. These checks often fail for 64-bit targets due to type conversions and integer wrapping that prevent simplification of the SCEV expressions. These checks would also fail at compile-time if the lower bound of the loops are compile-time unknown. For example: void foo(int n, int m, int a[][m]) { for (int i = 0; i < n; ++i) for (int j = 0; j < m; ++j) { a[i][j] = a[i+1][j-2]; } } opt -mem2reg -instcombine -indvars -loop-simplify -loop-rotate -inline -pass-remarks=.* -debug-pass=Arguments -da-permissive-validity-checks=false k3.ll -analyze -da will produce the following by default: da analyze - anti [* *\|<]! but will produce the following expected dependence vector if the validity checks are disabled: da analyze - consistent anti [1 -2]! This revision will introduce a debug option that will leave the validity checks in place by default, but allow them to be turned off. New tests are added for cases where it cannot be proven at compile-time that the individual subscripts stay in-bound with respect to a particular dimension of an array. These tests enable the option to provide user guarantee that the subscripts do not over/under-flow into other dimensions, thereby producing more accurate dependence vectors. For prior discussion on this topic, leading to this change, please see the following thread: http://lists.llvm.org/pipermail/llvm-dev/2019-May/132372.html Reviewers: Meinersbur, jdoerfert, kbarton, dmgreen, fhahn Reviewed By: Meinersbur, jdoerfert, dmgreen Subscribers: fhahn, hiraditya, javed.absar, llvm-commits, Whitney, etiotto Tag: LLVM Differential Revision: https://reviews.llvm.org/D62610 llvm-svn: 362711	2019-06-06 15:12:49 +00:00
Jason Liu	0338b88861	[AIX] Implement call lowering with parameters could pass onto GPRs Summary: This patch implements SDAG call lowering on AIX for functions which only have parameters that could fit into GPRs. Reviewers: hubert.reinterpretcast, syzaara Differential Revision: https://reviews.llvm.org/D62823 llvm-svn: 362708	2019-06-06 14:36:43 +00:00
Thomas Preud'homme	71d3f227a7	FileCheck [6/12]: Introduce numeric variable definition Summary: This patch is part of a patch series to add support for FileCheck numeric expressions. This specific patch introduces support for defining numeric variable in a CHECK directive. This commit introduces support for defining numeric variable from a litteral value in the input text. Numeric expressions can then use the variable provided it is on a later line. Copyright: - Linaro (changes up to diff 183612 of revision D55940) - GraphCore (changes in later versions of revision D55940 and in new revision created off D55940) Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield Tags: #llvm Differential Revision: https://reviews.llvm.org/D60386 llvm-svn: 362705	2019-06-06 13:21:06 +00:00
Adhemerval Zanella	559e69a821	AArch64] Handle ISD::LRINT and ISD::LLRINT for float16 This patch is a follow up for D62018 to add lrint/llrint support for float16. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62863 llvm-svn: 362700	2019-06-06 12:38:11 +00:00
Benjamin Kramer	f1249442cf	Revert "[SCEV] Use wrap flags in InsertBinop" This reverts commit r362687. Miscompiles llvm-profdata during selfhost. llvm-svn: 362699	2019-06-06 12:35:46 +00:00
Adhemerval Zanella	bce9e11a7b	[AArch64] Handle ISD::LROUND and ISD::LLROUND for float16 This patch is a follow up for D61391 to add lround/llround support for float16. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62861 llvm-svn: 362698	2019-06-06 11:53:26 +00:00
Dmitri Gribenko	8c2c072582	Include what you use in LanaiAsmParser.cpp llvm-svn: 362696	2019-06-06 10:37:06 +00:00
Simon Pilgrim	da993d08c8	[DAGCombine] Cleanup isNegatibleForFree/GetNegatedExpression. NFCI. Prep work for PR42105 - clang-format, use auto for cast and merge nested if()s llvm-svn: 362695	2019-06-06 10:21:18 +00:00
Petar Avramovic	81132ce0e9	[MIPS GlobalISel] Select sqrt Select G_FSQRT for MIPS32. Differential Revision: https://reviews.llvm.org/D62905 llvm-svn: 362692	2019-06-06 10:00:41 +00:00
Petar Avramovic	0a1fd355b2	[MIPS GlobalISel] Select fabs Select G_FABS for MIPS32. Differential Revision: https://reviews.llvm.org/D62903 llvm-svn: 362690	2019-06-06 09:22:37 +00:00
Petar Avramovic	a7d0006447	[MIPS GlobalISel] Select fpext and fptrunc Select G_FPEXT and G_FPTRUNC for MIPS32. Differential Revision: https://reviews.llvm.org/D62902 llvm-svn: 362689	2019-06-06 09:16:58 +00:00
Petar Avramovic	faaa2b5d21	[MIPS GlobalISel] Select floor and ceil Select G_FFLOOR and G_FCEIL for MIPS32. Differential Revision: https://reviews.llvm.org/D62901 llvm-svn: 362688	2019-06-06 09:02:24 +00:00
Sam Parker	7cc580f5e9	[SCEV] Use wrap flags in InsertBinop If the given SCEVExpr has no (un)signed flags attached to it, transfer these to the resulting instruction or use them to find an existing instruction. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 362687	2019-06-06 08:56:26 +00:00
Amara Emerson	d3144a4abc	[AArch64][GlobalISel] Add manual selection support for G_ZEXTLOADs to s64. We already get support for G_ZEXTLOAD to s32 from the importer, but it can't deal with the SUBREG_TO_REG in the pattern. Tweaking the existing manual selection code for G_LOAD to handle an additional SUBREG_TO_REG when dealing with G_ZEXTLOAD isn't much work. Also add tests to check the imported pattern selections to s32 work. llvm-svn: 362681	2019-06-06 07:58:37 +00:00
Amara Emerson	d940e20051	[AArch64][GlobalISel] Add the new changes to fix PR42129 that were supposed to go into r362666. The changes weren't staged so ended up just re-commiting the unmodified reverted change. llvm-svn: 362677	2019-06-06 07:33:47 +00:00
Craig Topper	9226ba6b37	[X86] Don't turn avx masked.load with constant mask into masked.load+vselect when passthru value is all zeroes. This is intended to enable the use of an immediate blend or more optimal instruction. But if the passthru is zero we don't need any additional instructions. llvm-svn: 362675	2019-06-06 05:41:27 +00:00
Amara Emerson	c37ff0d138	Revert "Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp"" When looking through copies, make sure to not try to find the vreg def of a physreg. Normally getVRegDef will return nullptr in this case, but if there happens to be multiple defs then it will assert. This fixes PR42129. llvm-svn: 362666	2019-06-05 23:46:16 +00:00
Matt Arsenault	34c8b835b1	AMDGPU: Don't fix emergency stack slot at offset 0 This forced the caller to be aware of this, which is an ugly ABI feature. Partially reverts r295877. The original reasons for doing this are mostly fixed. Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels. Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset. Restrict to callable functions for now to split this into 2 steps to limit thte number of test updates and in case anything breaks. llvm-svn: 362665	2019-06-05 22:37:50 +00:00
Cameron McInally	c72fbe5dc1	[MSAN] Add unary FNeg visitor to the MemorySanitizer Differential Revision: https://reviews.llvm.org/D62909 llvm-svn: 362664	2019-06-05 22:37:05 +00:00
Ulrich Weigand	6c5d5ce551	Allow target to handle STRICT floating-point nodes The ISD::STRICT_ nodes used to implement the constrained floating-point intrinsics are currently never passed to the target back-end, which makes it impossible to handle them correctly (e.g. mark instructions are depending on a floating-point status and control register, or mark instructions as possibly trapping). This patch allows the target to use setOperationAction to switch the action on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code will stop converting the STRICT nodes to regular floating-point nodes, but instead pass the STRICT nodes to the target using normal SelectionDAG matching rules. To avoid having the back-end duplicate all the floating-point instruction patterns to handle both strict and non-strict variants, we make the MI codegen explicitly aware of the floating-point exceptions by introducing two new concepts: - A new MCID flag "mayRaiseFPException" that the target should set on any instruction that possibly can raise FP exception according to the architecture definition. - A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI instruction resulting from expansion of any constrained FP intrinsic. Any MI instruction that is both marked as mayRaiseFPException and FPExcept then needs to be considered as raising exceptions by MI-level codegen (e.g. scheduling). Setting those two new flags is straightforward. The mayRaiseFPException flag is simply set via TableGen by marking all relevant instruction patterns in the .td files. The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes in the SelectionDAG, and gets inherited in the MachineSDNode nodes created from it during instruction selection. The flag is then transfered to an MIFlag when creating the MI from the MachineSDNode. This is handled just like fast-math flags like no-nans are handled today. This patch includes both common code changes required to implement the new features, and the SystemZ implementation. Reviewed By: andrew.w.kaylor Differential Revision: https://reviews.llvm.org/D55506 llvm-svn: 362663	2019-06-05 22:33:10 +00:00
Petr Hosek	2f94203e23	Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp" This reverts commit r362435 as this triggers ICE, see PR42129 for details. llvm-svn: 362662	2019-06-05 22:27:31 +00:00
Matt Arsenault	b812b7a45e	AMDGPU: Invert frame index offset interpretation Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP. Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions. The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination. Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide. llvm-svn: 362661	2019-06-05 22:20:47 +00:00
Mircea Trofin	e3eeacd70a	[CallSite removal] Refactoring llvm::InlineFunction APIs Summary: This change only unifies the API previous API pair accepting CallInst and InvokeInst, thus making it easier to refactor inliner pass ode to CallBase. The implementation of the unified API still relies on the CallSite implementation. Reviewers: eraman, chandlerc, jdoerfert Reviewed By: jdoerfert Subscribers: jdoerfert, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62283 llvm-svn: 362656	2019-06-05 21:28:13 +00:00
Sanjay Patel	ac111e526d	[InstCombine] simplify code for bitcast of insertelement; NFC llvm-svn: 362655	2019-06-05 21:26:52 +00:00
Matt Arsenault	663d762c9a	NewGVN: Handle addrspacecast The AllConstant check needs to be moved out of the if/else if chain to avoid a test regression. The "there is no SimplifyZExt" comment puzzles me, since there is SimplifyCastInst. Additionally, the Simplify* calls seem to not see the operand as constant, so this needs to be tried if the simplify failed. llvm-svn: 362653	2019-06-05 21:15:52 +00:00
Craig Topper	3975b15dba	[X86] Fix mistake that marked VADDSSrrb_Int/VADDSDrrb_Int/VMULSSrrb_Int/VMULSDrrb_Int as commutable. One of the sources controls the pass through value for the upper bits of the result so we can't really commute it. In practice this problem isn't a functional issue because we would only try to commute this instruction in order to fold a load. But we can't do embedded rounding and fold a load at the same time. So the load fold would never succeed so I don't think we would ever commute or at least keep the version after commuting. llvm-svn: 362647	2019-06-05 21:00:31 +00:00
Whitney Tsang	2d0896c1cb	[LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, and loop induction variable. Summary: This PR extends the loop object with more utilities to get loop bounds, step, and loop induction variable. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information. /// Example: /// for (int i = lb; i < ub; i+=step) /// <loop body> /// --- pseudo LLVMIR --- /// beforeloop: /// guardcmp = (lb < ub) /// if (guardcmp) goto preheader; else goto afterloop /// preheader: /// loop: /// i1 = phi[{lb, preheader}, {i2, latch}] /// <loop body> /// i2 = i1 + step /// latch: /// cmp = (i2 < ub) /// if (cmp) goto loop /// exit: /// afterloop: /// /// getBounds /// getInitialIVValue --> lb /// getStepInst --> i2 = i1 + step /// getStepValue --> step /// getFinalIVValue --> ub /// getCanonicalPredicate --> '<' /// getDirection --> Increasing /// getInductionVariable --> i1 /// getAuxiliaryInductionVariable --> {i1} /// isCanonical --> false Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara, fhahn Reviewed By: kbarton Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D60565 llvm-svn: 362644	2019-06-05 20:42:47 +00:00
Tim Northover	8d7f118ab2	InstCombine: correctly change byval type attribute alongside call args. When the byval attribute has a type, it must match the pointee type of any parameter; but InstCombine was not updating the attribute when folding casts of various kinds away. llvm-svn: 362643	2019-06-05 20:38:17 +00:00
Tim Northover	607c8a9d14	IR: make getParamByValType Just Work. NFC. Most parts of LLVM don't care whether the byval type is derived from an explicit Attribute or from the parameter's pointee type, so it makes sense for the main access function to just return the right value. The very few users who do care (only BitcodeReader so far) can find out how it's specified by accessing the Attribute directly. llvm-svn: 362642	2019-06-05 20:37:47 +00:00
Matt Arsenault	4fb580c314	AMDGPU: Remove amdgpu-max-work-group-size attribute This has been deprecated for a long time, and mesa recently switched to amdgpu-flat-work-group-size. llvm-svn: 362641	2019-06-05 20:32:32 +00:00
Matt Arsenault	0f8a764e8f	AMDGPU: Fix using 2 different enums for same operand flags These enums are really for the same namespace of flags set on arbitrary MachineOperands, so merge them to avoid value collisions. llvm-svn: 362640	2019-06-05 20:32:25 +00:00
Dan Gohman	53572d0470	[WebAssembly] Limit PIC support to the Emscripten target The current PIC support currently only works with Emscripten, so disable it for other targets. This is the PIC portion of https://reviews.llvm.org/D62542. Reviewed By: dschuff, sbc100 llvm-svn: 362638	2019-06-05 20:01:01 +00:00
Craig Topper	d0fff89b81	[X86] Add the vector integer min/max instructions to isAssociativeAndCommutative. As far as I know these should be freely reassociatable just like the floating point MAXC/MINC instructions. The reduce test changes are largely regressions and caused by the "generic" CPU we default to not having a scheduler model. The machine-combiner-int-vec.ll test shows the positive benefits of this change. Differential Revision: https://reviews.llvm.org/D62787 llvm-svn: 362629	2019-06-05 18:25:09 +00:00
Simon Pilgrim	77d6adc491	Fix shadow local variable warning. NFCI. llvm-svn: 362622	2019-06-05 17:26:29 +00:00
Sanjay Patel	2bf82879bd	[x86] split more 256-bit stores of concatenated vectors As suggested in D62498 - collectConcatOps() matches both concat_vectors and insert_subvector patterns, and we see more test improvements by using the more general match. llvm-svn: 362620	2019-06-05 16:40:57 +00:00
Simon Pilgrim	de586bd1fd	[X86][AVX] Generalize split256BitStore to splitVectorStore. NFCI. Enables us to use this to split 512-bit vectors in future patches. llvm-svn: 362617	2019-06-05 16:14:14 +00:00
Whitney Tsang	590b1aee60	Revert "Title: [LOOPINFO] Extend Loop object to add utilities to get the loop" This reverts commit `d34797dfc2`. llvm-svn: 362615	2019-06-05 15:32:56 +00:00
Dinar Temirbulatov	15c657d13d	[SLP] Fix regression in broadcasts caused by operand reordering patch D59973. This patch fixes a regression caused by the operand reordering refactoring patch https://reviews.llvm.org/D59973 . The fix changes the strategy to Splat instead of Opcode, if broadcast opportunities are found. Please see the lit test for some examples. Committed on behalf of @vporpo (Vasileios Porpodas) Differential Revision: https://reviews.llvm.org/D62427 llvm-svn: 362613	2019-06-05 15:26:28 +00:00
Sanjay Patel	ad62a3a299	[LoopUtils][SLPVectorizer] clean up management of fast-math-flags Instead of passing around fast-math-flags as a parameter, we can set those using an IRBuilder guard object. This is no-functional-change-intended. The motivation is to eventually fix the vectorizers to use and set the correct fast-math-flags for reductions. Examples of that not behaving as expected are: https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast') https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0) D61802 (should be able to reduce with IR-level FMF) Differential Revision: https://reviews.llvm.org/D62272 llvm-svn: 362612	2019-06-05 14:58:04 +00:00
Benjamin Kramer	b90b354798	[LoopInfo] Fix unused variable warning. NFC. llvm-svn: 362610	2019-06-05 14:43:58 +00:00
Whitney Tsang	d34797dfc2	Title: [LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, and loop induction variable. Summary: This PR extends the loop object with more utilities to get loop bounds, step, and loop induction variable. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information. /// Example: /// for (int i = lb; i < ub; i+=step) /// <loop body> /// --- pseudo LLVMIR --- /// beforeloop: /// guardcmp = (lb < ub) /// if (guardcmp) goto preheader; else goto afterloop /// preheader: /// loop: /// i1 = phi[{lb, preheader}, {i2, latch}] /// <loop body> /// i2 = i1 + step /// latch: /// cmp = (i2 < ub) /// if (cmp) goto loop /// exit: /// afterloop: /// /// getBounds /// getInitialIVValue --> lb /// getStepInst --> i2 = i1 + step /// getStepValue --> step /// getFinalIVValue --> ub /// getCanonicalPredicate --> '<' /// getDirection --> Increasing /// getInductionVariable --> i1 /// getAuxiliaryInductionVariable --> {i1} /// isCanonical --> false Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara, fhahn Reviewed By: kbarton Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D60565 llvm-svn: 362609	2019-06-05 14:34:12 +00:00
Petar Avramovic	22e99c434f	[MIPS GlobalISel] Select fcmp Select floating point compare for MIPS32. Differential Revision: https://reviews.llvm.org/D62721 llvm-svn: 362603	2019-06-05 14:03:13 +00:00
Sjoerd Meijer	a1bb4fb79d	[ARM] Allow "-march=foo+fp" to vary with foo This is the LLVM part of this change, the Clang part contains the full description in its commit message. Differential Revision: https://reviews.llvm.org/D60697 llvm-svn: 362600	2019-06-05 13:11:51 +00:00
Simon Pilgrim	886a55eaa0	[X86][AVX] combineX86ShuffleChain - combine shuffle(extractsubvector(x),extractsubvector(y)) We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)), this relaxes the requirement to permit different sources as long as they have the same value type. This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast so we don't see any increase in constant size. llvm-svn: 362599	2019-06-05 12:56:53 +00:00
Simon Pilgrim	5a81af547c	[TargetLowering] SimplifyDemandedBits - pull out shift value type. NFCI. Will be used more in an upcoming patch. llvm-svn: 362595	2019-06-05 10:59:04 +00:00
Simon Pilgrim	db134aaec2	[IPO] Disabled 'default only' switch statements to fix MSVC warnings. @jdoerfert Looks like these are placeholders for incoming abstract attributes patches so I've just commented the code out, even though this is usually frowned upon. llvm-svn: 362592	2019-06-05 10:04:05 +00:00
Dmitri Gribenko	6fc4c1cc54	Include what you use in PPCFrameLowering.h llvm-svn: 362590	2019-06-05 08:58:00 +00:00
Yevgeny Rouban	a3e16719c4	Resubmit "[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst" This reverts commit `5b32f60ec3`. The fix is in commit `4f9e68148b`. This patch fixes the CorrelatedValuePropagation pass to keep prof branch_weights metadata of SwitchInst consistent. It makes use of SwitchInstProfUpdateWrapper. New tests are added. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D62126 llvm-svn: 362583	2019-06-05 05:46:40 +00:00
Michael Liao	fa449a9bb2	Suppress false-positive GCC -Wreturn-type warning. llvm-svn: 362582	2019-06-05 04:18:12 +00:00
Johannes Doerfert	aade782a98	[Attributor] Pass infrastructure and fixpoint framework NOTE: Note that no attributes are derived yet. This patch will not go in alone but only with others that derive attributes. The framework is split for review purposes. This commit introduces the Attributor pass infrastructure and fixpoint iteration framework. Further patches will introduce abstract attributes into this framework. In a nutshell, the Attributor will update instances of abstract arguments until a fixpoint, or a "timeout", is reached. Communication between the Attributor and the abstract attributes that are derived is restricted to the AbstractState and AbstractAttribute interfaces. Please see the file comment in Attributor.h for detailed information including design decisions and typical use case. Also consider the class documentation for Attributor, AbstractState, and AbstractAttribute. Reviewers: chandlerc, homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes, nicholas, reames Subscribers: mehdi_amini, mgorny, hiraditya, bollu, steven_wu, dexonsmith, dang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59918 llvm-svn: 362578	2019-06-05 03:02:24 +00:00
Nemanja Ivanovic	7c842fadf1	[PowerPC] Collapse RLDICL/RLDICR into RLDIC when possible Generally speaking, we lower to an optimal rotate sequence for nodes visible in the SDAG. However, there are instances where the two rotates are not visible at ISEL time - most notably those in a very common sequence when lowering switch statements to jump tables. A common situation is a switch on a 32-bit integer. This value has to have the upper 32 bits cleared and because jump table offsets are word offsets, the value needs to be shifted left by 2 bits. We currently emit the clear and the left shift as two separate instructions, but this is not needed as we can lower it to a single RLDIC. This patch just cleans that up. Differential revision: https://reviews.llvm.org/D60402 llvm-svn: 362576	2019-06-05 02:36:40 +00:00
Fangrui Song	f090e6f7b6	[llvm-objdump/llvm-readobj/obj2yaml/yaml2obj] Support DT_PPC_GOT and DT_PPC_OPT In glibc, DT_PPC_GOT indicates that PowerPC32 Secure PLT ABI is used. I plan to use it in D62464. DT_PPC_OPT currently indicates if a TLSDESC inspired TLS optimization is enabled. Reviewed By: grimar, jhenderson, rupprecht Differential Revision: https://reviews.llvm.org/D62851 llvm-svn: 362569	2019-06-05 01:36:48 +00:00
Nemanja Ivanovic	fe97754acf	Initial support for IBM MASS vector library This is the LLVM portion of patch https://reviews.llvm.org/D59881. The clang portion is to follow. llvm-svn: 362568	2019-06-05 01:31:43 +00:00
Craig Topper	78fdce25a1	[X86] Cleanup convertIntLogicToFPLogic a little. NFCI -Use early returns to reduce indentation -Replace multipe ifs with a switch. -Replace an assert with an llvm_unreachable default in the switch. -Check that the FP type we're going to use for the X86ISD::FAND/FOR/FXOR is legal rather than checking that the integer type matches the width of a legal scalar fp type. This all runs after legalization so it shouldn't really matter, but making sure we're using a valid type in the X86ISD node is really whats important. llvm-svn: 362565	2019-06-05 01:00:34 +00:00
Cameron McInally	5c7245b830	[Scalarizer] Add UnaryOperator visitor to scalarization pass Differential Revision: https://reviews.llvm.org/D62858 llvm-svn: 362558	2019-06-04 23:01:36 +00:00
Amara Emerson	2d37cb82f0	[AArch64][GlobalISel] Make extloads to i64 legal. Although we had the support in the prelegalizer combiner to generate the G_SEXTLOAD or G_ZEXTLOAD ops, the legalizer definitions for arm64 had them as lowering back to separate ops. llvm-svn: 362553	2019-06-04 21:51:34 +00:00
Thomas Lively	3d9ca00e74	[WebAssembly] Fix ISel crash on sext_inreg/extract type mismatch Summary: Adjusts the index and adds a bitcast around the vector operand of EXTRACT_VECTOR_ELT so that its lane type matches the source type of its parent sext_inreg. Without this bitcast the ISel patterns do not match and ISel fails. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62646 llvm-svn: 362547	2019-06-04 21:08:20 +00:00
Johannes Doerfert	6b432dca5d	[SelectionDAG][FIX] Allow "returned" arguments to be bit-casted Summary: An argument that is return by a function but bit-casted before can still be annotated as "returned". Make sure we do not crash for this case. Reviewers: sunfish, stephenwlin, niravd, arsenm Subscribers: wdng, hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59917 llvm-svn: 362546	2019-06-04 20:34:43 +00:00
Johannes Doerfert	40107ce753	Introduce Value::stripPointerCastsSameRepresentation This patch allows current users of Value::stripPointerCasts() to force the result of the function to have the same representation as the value it was called on. This is useful in various cases, e.g., (non-)null checks. In this patch only a single call site was adjusted to fix an existing misuse that would cause nonnull where they may be wrong. Uses in attribute deduction and other areas, e.g., D60047, are to be expected. For a discussion on this topic, please see [0]. [0] http://lists.llvm.org/pipermail/llvm-dev/2018-December/128423.html Reviewers: hfinkel, arsenm, reames Subscribers: wdng, hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61607 llvm-svn: 362545	2019-06-04 20:21:46 +00:00
Nico Weber	1dce82636c	llvm-undname: Correctly demangle vararg parameters FunctionSignatureNode already had an IsVariadic field, but it wasn't used anywhere yet. Set it and use it. llvm-svn: 362541	2019-06-04 19:10:08 +00:00
Nico Weber	4638548468	llvm-undname: More coverage-related cleanups - The loop in demangleFunctionParameterList() only exits on Error, @, and Z. All 3 cases were handled, so the rest of the function is DEMANGLE_UNREACHABLE. - The loop in demangleTemplateParameterList() always returns on Error, so there's no need to check for that in the loop header and after the loop. - Add test cases for invalid function parameter manglings. - Add a (redundant) test case for a simple template parameter list mangling. - Add a test case pointing out that varargs functions aren't demangled correctly. llvm-svn: 362540	2019-06-04 18:49:05 +00:00
Nemanja Ivanovic	aed7227b71	Revert r362472 as it is breaking PPC build bots The patch https://reviews.llvm.org/rL362472 broke PPC LNT buildbots. Reverting it to bring the bots back to green. llvm-svn: 362539	2019-06-04 18:48:43 +00:00
Alina Sbirlea	bfceed49ce	[Utils] Clean another duplicated util method. Summary: Following the cleanup in D48202, method foldBlockIntoPredecessor has the same behavior. Replace its uses with MergeBlockIntoPredecessor. Remove foldBlockIntoPredecessor. Reviewers: chandlerc, dmgreen Subscribers: jlebar, javed.absar, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62751 llvm-svn: 362538	2019-06-04 18:45:15 +00:00
Nico Weber	878df1c2a9	llvm-undname: Add test coverage for demangleInitFiniStub() llvm-svn: 362536	2019-06-04 18:06:28 +00:00
Craig Topper	137de38009	[X86] Mutate fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on pattern permutations We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics. But we currently have 5 sets of patterns for the 5 rounding operations. For of these 6 patterns we have to support 3 vectors widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table. This patch adds code to PreProcessIselDAG to morph the fceil/ffloor/ftrunc/fnearbyint/frint to X86ISD::RNDSCALE. This way we can remove everything, but the intrinsic pattern while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel by somewhere between 9K and 10K. There is one complication to this, the STRICT versions of these nodes are currently mutated to their none strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future. Which will take us back to needing 2 sets of patterns for strict and non-strict, but that's still better than the 11 or 12 sets of patterns we'd need. We can probably do something similar for scalar, but I haven't looked at it yet. Differential Revision: https://reviews.llvm.org/D62757 llvm-svn: 362535	2019-06-04 18:03:07 +00:00
Benjamin Kramer	03ff1b3c30	[X86] Fold single-use variable into assert. NFC. Avoids an unused variable warning in Release builds. llvm-svn: 362534	2019-06-04 18:01:07 +00:00
Craig Topper	09a4415803	[DAGCombiner][X86] Fold (not (neg X)) -> (add X, -1) This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial. Fixes PR42118 Differential Revision: https://reviews.llvm.org/D62828 llvm-svn: 362533	2019-06-04 17:44:18 +00:00
Alex Brachet	c33944832c	[MACHO] Replaced calls to getStruct with getStructOrErr in functions returning Error or Expected or similar llvm-svn: 362526	2019-06-04 16:55:30 +00:00
Sanjay Patel	606eb2367f	[x86] split 256-bit store of concatenated vectors This shows up as a side issue to the main problem for the AVX target example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3 But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway. We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is a reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. Differential Revision: https://reviews.llvm.org/D62498 llvm-svn: 362524	2019-06-04 16:40:04 +00:00
Peter Smith	f15e3d856f	[AArch64][ELF] Add support for PLT decoding with BTI instructions present Arm Architecture v8.5a introduces Branch Target Identification (BTI). When enabled all indirect branches must target a bti instruction of the appropriate form. As PLT sequences may sometimes be the target of an indirect branch and PLT[0] always is, a static linker may need to generate PLT sequences that contain "bti c" as the first instruction. In effect: bti c adrp x16, page offset to .got.plt ... Instead of: adrp x16, page offset to .got.plt ... At present the PLT decoding assumes the adrp will always be the first instruction. This patch adds support for a single "bti c" to prefix it. A test binary has been uploaded with such a PLT sequence. A forthcoming LLD patch will make heavy use of the PLT decoding code. Differential Revision: https://reviews.llvm.org/D62598 llvm-svn: 362523	2019-06-04 16:35:40 +00:00
Nico Weber	d98a0a362f	llvm-undname: Yet more coverage for error paths - For error returns in demangleSpecialTableNode(), demangleLocalStaticGuard(), RTTITypeDescriptor, demangleRttiBaseClassDescriptorNode(), demangleUnsigned(), demangleUntypedVariable() (via RttiBaseClassArray) - For ?_A and ?_P which are handled at early levels of the demangler but are not implemented in a later stage; this is now more obvious - Replace a "default:" with an explicit list of cases, to get -Wswitch check we list all cases llvm-svn: 362520	2019-06-04 16:25:28 +00:00
Nikita Popov	df621bdfc8	[LVI][CVP] Add support for urem, srem and sdiv The underlying ConstantRange functionality has been added in D60952, D61207 and D61238, this just exposes it for LVI. I'm switching the code from using a whitelist to a blacklist, as we're down to one unsupported operation here (xor) and writing it this way seems more obvious :) Differential Revision: https://reviews.llvm.org/D62822 llvm-svn: 362519	2019-06-04 16:24:09 +00:00
Nico Weber	c1a0e6fe6b	llvm-undname: More no-op changes to increase test coverage - Add test coverage around invalid anon namespaces and for error paths in demanglePrimitiveType() and in demangleFullyQualifiedTypeName() - Use DEMANGLE_UNREACHABLE in two more unreachable places llvm-svn: 362514	2019-06-04 15:38:00 +00:00
Jinsong Ji	3144d7a2da	[PowerPC] P9 Scheduling Model: dispatching rule fixes This is to address some of the problems in existing P9 resource modeling, especially about the dispatching rules. Instead of using a hypothetical DISPATCHER , we try to use the number of actual dispatch slots, and define SchedWriteRes to model dispatch rules, then update instruction classes according to dispatch rules. All the dispatch rules and instruction classes update are made according to POWER9 User Manual. Differential Revision: https://reviews.llvm.org/D61873 llvm-svn: 362509	2019-06-04 15:22:23 +00:00
Sanjay Patel	1e63dd0b44	[SelectionDAG][x86] limit post-legalization store merging by type The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86. llvm-svn: 362507	2019-06-04 15:15:59 +00:00
Nico Weber	880d21d3cb	llvm-undname: Several behavior-preserving changes to increase coverage - Replace `Error = true` in a few branches that are truly unreachable with DEMANGLE_UNREACHABLE - Remove early return early in startsWithLocalScopePattern() because it's redundant with the next two early returns - Remove unreachable `case '0'` (it's handled in the branch below) - Remove an unused bool return - Add test coverage for several early error returns, mostly in array type parsing llvm-svn: 362506	2019-06-04 15:13:30 +00:00
Simon Pilgrim	a6e289e9f8	[X86][SSE] Pulled out (sub (xor X, M), M) 'ConditionalNegate' out pattern match code. NFCI. As discussed on D62777 - we should be able to use this in more SSE41+ cases as well but that requires us to separate it from the OR(AND(),ANDN()) matcher. llvm-svn: 362504	2019-06-04 15:02:33 +00:00
Shawn Landden	669775f9db	[Support] make countLeadingZeros() countTrailingZeros() countLeadingOnes() and countTrailingOnes() return unsigned This matches APInt's versions of these functions, and there is no need for these to be size_t. (as well as __builtin_clzll()) Differential Revision: https://reviews.llvm.org/D60823 llvm-svn: 362503	2019-06-04 14:51:15 +00:00
Dmitri Gribenko	454fc77872	Include what you use in PPCRegisterInfo.cpp llvm-svn: 362495	2019-06-04 12:55:00 +00:00
Peter Smith	49d7221f71	[AArch64][ELF][llvm-readobj] Add support for BTI and PAC dynamic tags ELF for the 64-bit Arm Architecture defines two processor-specific dynamic tags: DT_AARCH64_BTI_PLT 0x70000001, d_val DT_AARCH64_PAC_PLT 0x70000003, d_val These presence of these tags indicate that PLT sequences have been protected using Branch Target Identification and Pointer Authentication respectively. The presence of both indicates that the PLT sequences have been protected with both Branch Target Identification and Pointer Authentication. This patch adds the tags and tests for llvm-readobj and yaml2obj. As some of the processor specific dynamic tags overlap, this patch splits them up, keeping their original default value if they were not previously mentioned explicitly in a switch case. Differential Revision: https://reviews.llvm.org/D62596 llvm-svn: 362493	2019-06-04 11:44:33 +00:00
Roman Lebedev	3dce0326fe	[DAGCombine][X86][AArch64][MIPS][LANAI] (C - x) - y -> C - (x + y) fold (PR41952) Summary: This might be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet. As far as i can tell, there are no regressions here (ignoring x86-32), all changes are either good or neutral. This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`) `@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/vMd3 Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma Reviewed By: RKSimon Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62774 llvm-svn: 362488	2019-06-04 11:06:21 +00:00
Roman Lebedev	be6ce7b3f2	[DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold Summary: All changes except ARM look great. https://rise4fun.com/Alive/R2M The regression `test/CodeGen/ARM/addsubcarry-promotion.ll` is recovered fully by D62392 + D62450. Reviewers: RKSimon, craig.topper, spatel, rogfer01, efriedma Reviewed By: efriedma Subscribers: dmgreen, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62266 llvm-svn: 362487	2019-06-04 11:06:08 +00:00
Simon Pilgrim	ad298f86b7	[SelectionDAG] ComputeNumSignBits - support constant pool values from target As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits. The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select. It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup. Differential Revision: https://reviews.llvm.org/D62777 llvm-svn: 362486	2019-06-04 10:49:06 +00:00
Simon Pilgrim	3178546a27	[SelectionDAG] ComputeNumSignBits - clang-format + improve *EXTLOAD comments. NFCI. Pre-commit requested for D62777. llvm-svn: 362485	2019-06-04 10:17:56 +00:00
Owen Reynolds	5d5078e341	[llvm-ar] Reapply Fix relative thin archive path handling Includes a fix for an introduced build failure due to a post c++11 use of std::mismatch. This fixes some thin archive relative path issues, paths are shortened where possible and paths are output correctly when using the display table command. Differential Revision: https://reviews.llvm.org/D59491 llvm-svn: 362484	2019-06-04 10:13:03 +00:00
Simon Pilgrim	3018d505a3	[SelectionDAG] Add fpto[us]i(undef) --> undef constant fold Follow up to D62807. Differential Revision: https://reviews.llvm.org/D62811 llvm-svn: 362483	2019-06-04 10:04:55 +00:00
Mikhail Maltsev	08da01b496	[ARM] Add FP16 vector insert/extract patterns This change adds two FP16 extraction and two insertion patterns (one per possible vector length). Extractions are handled by copying a Q/D register into one of VFP2 class registers, where single FP32 sub-registers can be accessed. Then the extraction of even lanes are simple sub-register extractions (because we don't care about the top parts of registers for FP16 operations). Odd lanes need an additional VMOVX instruction. Unfortunately, insertions cannot be handled in the same way, because: * There is no instruction to insert FP16 into an even lane (VINS only works with odd lanes) * The patterns for odd lanes will have a form of a DAG (not a tree), and will not be implementable in pure tablegen Because of this insertions are handled in the same way as 16-bit integer insertions (with conversions between FP registers and GPRs using VMOVHR instructions). Without these patterns the ARM backend would sometimes fail during instruction selection. This patch also adds patterns which combine: * an FP16 element extraction and a store into a single VST1 instruction * an FP16 load and insertion into a single VLD1 instruction Differential Revision: https://reviews.llvm.org/D62651 llvm-svn: 362482	2019-06-04 09:39:55 +00:00
Dmitri Gribenko	63846039f5	Silenced a warning "implicit conversion turns string literal into bool" introduced in r362473 llvm-svn: 362480	2019-06-04 09:31:07 +00:00
Dmitri Gribenko	73a15d4b78	Include what you use in PPC.h llvm-svn: 362477	2019-06-04 09:16:35 +00:00
Dmitri Gribenko	067a17b51d	Include what you use in PPCMachineScheduler.cpp llvm-svn: 362476	2019-06-04 09:16:31 +00:00
Dmitri Gribenko	9d1c5ea165	Include what you use in PPCRegisterInfo.h llvm-svn: 362475	2019-06-04 09:13:08 +00:00
Yevgeny Rouban	4f9e68148b	Make SwitchInstProfUpdateWrapper safer While prof branch_weights inconsistencies are being fixed patch by patch (pass by pass) we need SwitchInstProfUpdateWrapper to be safe with respect to inconsistent metadata that can come from passes that have not been fixed yet. See the bug found by @nikic in https://reviews.llvm.org/D62126. This patch introduces one more state (called Invalid) to the wrapper class that allows users to work with the underlying SwitchInst ignoring the prof metadata changes. Created a unit test for the SwitchInstProfUpdateWrapper class. Reviewers: davidx, nikic, eraman, reames, chandlerc Reviewed By: davidx Differential Revision: https://reviews.llvm.org/D62656 llvm-svn: 362473	2019-06-04 09:03:39 +00:00
QingShan Zhang	11de0e71b0	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D61843 llvm-svn: 362472	2019-06-04 08:53:53 +00:00
Simon Tatham	ac02445524	[ARM] Turn some undefined encoding bits into 0s. The family of 32-bit Thumb instruction encodings that include t2ORR, t2AND and t2EOR are all listed in the ArmARM as having (0) in bit 15. The Tablegen descriptions of those instructions listed them as ?. This change tightens that up by making them into 0 + Unpredictable. In the specific case of t2ORR, we tighten it up still further by making the zero bit mandatory. This change comes from Arm v8.1-M, in which encodings with that bit equal to 1 will now be used for different instructions. Reviewers: dmgreen, samparker, SjoerdMeijer, efriedma Reviewed By: dmgreen, efriedma Subscribers: efriedma, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60705 llvm-svn: 362470	2019-06-04 08:28:48 +00:00
Cameron McInally	89f9af5487	[SCCP] Add UnaryOperator visitor to SCCP for unary FNeg Differential Revision: https://reviews.llvm.org/D62819 llvm-svn: 362449	2019-06-03 21:53:56 +00:00
Michael Berg	6ff978ee05	Propagate fmf for setcc in SDAG for select folds llvm-svn: 362448	2019-06-03 21:53:26 +00:00
Matt Arsenault	0ceda9fb5c	AMDGPU: Disable stack realignment for kernels This is something of a workaround, and the state of stack realignment controls is kind of a mess. Ideally, we would be able to specify the stack is infinitely aligned on entry to a kernel. TargetFrameLowering provides multiple controls which apply at different points. The StackRealignable field is used during SelectionDAG, and for some reason distinct from this hook. StackAlignment is a single field not dependent on the function. It would probably be better to make that dependent on the calling convention, and the maximum value for kernels. Currently this doesn't really change anything, since the frame lowering mostly does its own thing. This helps avoid regressions in a future change which will rely more heavily on hasFP. llvm-svn: 362447	2019-06-03 21:33:22 +00:00
Jessica Paquette	7500c97ce4	[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp Instead of emitting all of the test stuff for a compare when it's only used by a select, instead, just emit the compare + select. The select will use the value of NZCV correctly, so we don't need to emit all of the test instructions etc. For now, only support fp selects which use G_FCMP. Also only support condition codes which will only require one select to represent. Also add a test. Differential Revision: https://reviews.llvm.org/D62695 llvm-svn: 362446	2019-06-03 20:47:20 +00:00
George Burgess IV	c24a2f4ad9	CFLAA: reflow comments; NFC llvm-svn: 362442	2019-06-03 19:56:22 +00:00
Craig Topper	7a4eabef39	[CFLGraph] Add FAdd to visitConstantExpr. This looks like an oversight as all the other binary operators are present. Accidentally noticed while auditing places that need FNeg handling. No test because as noted in the review it would be contrived and amount to "don't crash" Differential Revision: https://reviews.llvm.org/D62790 llvm-svn: 362441	2019-06-03 19:35:52 +00:00
Craig Topper	dcf865f0ca	[X86] Fix the pattern for merge masked vcvtps2pd. r362199 fixed it for zero masking, but not zero masking. The load folding in the peephole pass hid the bug. This patch turns off the peephole pass on the relevant test to ensure coverage. llvm-svn: 362440	2019-06-03 19:29:14 +00:00
Michael Berg	0b7f98da65	Propagate fmf for setcc/select folds Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG. Reviewers: qcolombet, spatel Reviewed By: qcolombet Subscribers: nemanjai, jsji Differential Revision: https://reviews.llvm.org/D62552 llvm-svn: 362439	2019-06-03 19:12:15 +00:00
Nemanja Ivanovic	bad43d8f49	[PowerPC] Look through copies for compare elimination We currently miss the opportunities for optmizing comparisons in the peephole optimizer if the input is the result of a COPY since we look for record-form versions of the producing instruction. This patch simply lets the optimization peek through copies. Differential revision: https://reviews.llvm.org/D59633 llvm-svn: 362438	2019-06-03 19:09:15 +00:00
Matt Arsenault	8dbeb9256c	TTI: Improve default costs for addrspacecast For some reason multiple places need to do this, and the variant the loop unroller and inliner use was not handling it. Also, introduce a new wrapper to be slightly more precise, since on AMDGPU some addrspacecasts are free, but not no-ops. llvm-svn: 362436	2019-06-03 18:41:34 +00:00
Nikita Popov	c061b99c5b	[ConstantRange] Add sdiv() support The implementation is conceptually simple: We separate the LHS and RHS into positive and negative components and then also compute the positive and negative components of the result, taking into account that e.g. only pos/pos and neg/neg will give a positive result. However, there's one significant complication: SignedMin / -1 is UB for sdiv, and we can't just ignore it, because the APInt result of SignedMin would break the sign segregation. Instead we drop SignedMin or -1 from the corresponding ranges, taking into account some edge cases with wrapped ranges. Because of the sign segregation, the implementation ends up being nearly fully precise even for wrapped ranges (the remaining imprecision is due to ranges that are both signed and unsigned wrapping and are divided by a trivial divisor like 1). This means that the testing cannot just check the signed envelope as we usually do. Instead we collect all possible results in a bitvector and construct a better sign wrapped range (than the full envelope). Differential Revision: https://reviews.llvm.org/D61238 llvm-svn: 362430	2019-06-03 18:19:54 +00:00
Andrew Kaylor	4172dbab5d	Fix a crash when the default of a switch is removed This patch fixes a problem that occurs in LowerSwitch when a switch statement has a PHI node as its condition, and the PHI node only has two incoming blocks, and one of those incoming blocks is through an unreachable default in the switch statement. When this condition occurs, LowerSwitch holds a pointer to the condition value, but removes the switch block as a predecessor of the PHI block, causing the PHI node to be replaced. LowerSwitch then tries to use its stale pointer to the original condition value, causing a crash. Differential Revision: https://reviews.llvm.org/D62560 llvm-svn: 362427	2019-06-03 17:54:15 +00:00
Dmitri Gribenko	26c43d0ef8	Include what you use in Lanai.h Other files were not relying on these transitive includes, so I'm submitting this change separately. llvm-svn: 362423	2019-06-03 17:02:15 +00:00
Dmitri Gribenko	b8aeaf882e	Include what you use in LanaiAsmPrinter.cpp llvm-svn: 362422	2019-06-03 17:02:07 +00:00
Dmitri Gribenko	dc136847e3	Include what you use in LanaiMemAluCombiner.cpp llvm-svn: 362421	2019-06-03 17:02:02 +00:00
Dmitri Gribenko	f4d22bd0b4	Include what you use in LanaiISelDAGToDAG.cpp llvm-svn: 362420	2019-06-03 17:01:57 +00:00
Dmitri Gribenko	179154f6b9	Include what you use in LanaiFrameLowering.{cpp,h} llvm-svn: 362419	2019-06-03 17:01:52 +00:00
Dmitri Gribenko	8e317e29da	Include what you use in LanaiRegisterInfo.cpp llvm-svn: 362416	2019-06-03 16:31:37 +00:00
Philip Reames	9ed1673703	[LoopPred] Convert a second member function to a static helper [NFC] (And remember to actually mark the first one static.) llvm-svn: 362415	2019-06-03 16:23:20 +00:00
Dmitri Gribenko	857de979a7	Revert "[llvm-ar] Fix relative thin archive path handling" This reverts commit r362407. It broke compilation of llvm/lib/Object/ArchiveWriter.cpp: error: type 'llvm::sys::path::const_iterator' does not provide a call operator llvm-svn: 362413	2019-06-03 16:21:37 +00:00
Nemanja Ivanovic	009d08f313	[PowerPC] Set PROT_READ flag for MF_EXEC to prevent segfaults on PPC machines The big endian PPC buildbots are all failing now due to calls to cache invalidation in unit tests on data that has only the PROT_EXEC flag set. This has been an issue all along on FreeBSD but it can affect Linux machines depending on configuration. This patch mitigates the issue the same way it is mitigated on FreeBSD. Since this is needed to bring the buildbots back to green, I plan to commit this and allow for post-commit review, but I thought I would also post it here for ease of access/readability. Differential revision: https://reviews.llvm.org/D62741 llvm-svn: 362412	2019-06-03 16:20:59 +00:00
Philip Reames	0912b06f78	[LoopPred] Convert member function to free helper function [NFC] llvm-svn: 362411	2019-06-03 16:17:14 +00:00
Dmitri Gribenko	bedcaea99a	Include what you use in LanaiInstrInfo.cpp llvm-svn: 362408	2019-06-03 15:26:25 +00:00
Owen Reynolds	fade9cbed7	[llvm-ar] Fix relative thin archive path handling This fixes some thin archive relative path issues, paths are shortened where possible and paths are output correctly when using the display table command. Differential Revision: https://reviews.llvm.org/D59491 llvm-svn: 362407	2019-06-03 15:26:07 +00:00
Dmitri Gribenko	b3bd866c7f	Include what you use in PPCInstrInfo.h llvm-svn: 362405	2019-06-03 15:04:05 +00:00
Dmitri Gribenko	2b369f83c5	Include what you use in NVPTX.h Other files were not relying on these transitive includes, so I'm submitting this change separately. llvm-svn: 362403	2019-06-03 14:37:26 +00:00
Dmitri Gribenko	14c69fefe6	Include what you use in NVPTX.h I also fixed all other files that were including NVPTX.h and were relying on transitive includes. llvm-svn: 362402	2019-06-03 14:26:50 +00:00
Dmitry Preobrazhensky	9111f35f02	[AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292 Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D62660 llvm-svn: 362400	2019-06-03 13:51:24 +00:00
Simon Pilgrim	cb7e4e8193	[SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205) We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction Differential Revision: https://reviews.llvm.org/D62807 llvm-svn: 362397	2019-06-03 13:02:07 +00:00
Dmitri Gribenko	7a3e4ab286	Include what you use in LanaiInstPrinter.cpp llvm-svn: 362395	2019-06-03 12:53:05 +00:00
Dmitri Gribenko	2f66316c96	Include what you use in LanaiMCCodeEmitter.cpp LanaiMCCodeEmitter.cpp was not using any APIs from Lanai.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Lanai target library and the MCTargetDesc library). llvm-svn: 362394	2019-06-03 12:42:48 +00:00
Dmitri Gribenko	c69ee63cb9	Include what you use in LanaiDisassembler.cpp llvm-svn: 362392	2019-06-03 12:37:11 +00:00
Nicolai Haehnle	edfa756f3f	AMDGPU/GFX10: V_CMPX_xxx instructions still have an omod operand Summary: Change-Id: If6ee98e4a723b643bc37254fc6ef8b3812db16da Reviewers: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62720 Change-Id: Id547ef152b2f92b24dc1c0efbf7e4467c4fb4b6e llvm-svn: 362390	2019-06-03 12:07:41 +00:00
Dmitri Gribenko	8668fc0102	Include what you use in HexagonInstPrinter.cpp HexagonInstPrinter.cpp was not using any APIs from HexagonAsmPrinter.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362389	2019-06-03 11:41:22 +00:00
Dmitri Gribenko	61b49ccb77	Include what you use in HexagonAsmPrinter.h llvm-svn: 362388	2019-06-03 11:41:18 +00:00
Dmitri Gribenko	03d1b33041	Include what you use in HexagonMCInstrInfo.cpp HexagonMCInstrInfo.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362387	2019-06-03 11:25:37 +00:00
Dmitri Gribenko	970b9f961f	Include what you use in HexagonMCCodeEmitter.cpp HexagonMCCodeEmitter.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362386	2019-06-03 11:20:53 +00:00
Dmitri Gribenko	ebe360edfa	Include what you use in HexagonMCCompound.cpp HexagonMCCompound.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362385	2019-06-03 11:20:48 +00:00
Dmitri Gribenko	6e076a081a	Include what you use in HexagonShuffler.cpp HexagonShuffler.cpp was not using any APIs from Hexagon.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362384	2019-06-03 11:14:20 +00:00
Dmitri Gribenko	6214b577b7	Include what you use in HexagonMCChecker.cpp HexagonMCChecker.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362383	2019-06-03 11:14:15 +00:00
Dmitri Gribenko	bf2a356ec0	Include what you use in HexagonMCTargetDesc.cpp HexagonMCTargetDesc.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362382	2019-06-03 11:14:10 +00:00
Dmitri Gribenko	beb7f48a29	Include what you use in HexagonMCShuffler.cpp HexagonMCShuffler.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362381	2019-06-03 11:14:05 +00:00
Simon Tatham	dc83a3c449	[ARM] Fix recent breakage of -mfpu=none. The recent change D60691 introduced a bug in clang when handling option combinations such as `-mcpu=cortex-m4 -mfpu=none`. Those options together should select Cortex-M4 but disable all use of hardware FP, but in fact, now hardware FP instructions can still be generated in that mode. The reason is because the handling of FPUVersion::NONE disables all the same feature names it used to, of which the base one is `vfp2`. But now there are further features below that, like `vfp2d16fp` and (following D60694) `fpregs`, which also need to be turned off to disable hardware FP completely. Added a tiny test which double-checks that compiling a simple FP function doesn't access the FP registers. Reviewers: SjoerdMeijer, dmgreen Reviewed By: dmgreen Subscribers: lebedev.ri, javed.absar, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62729 llvm-svn: 362380	2019-06-03 11:02:53 +00:00
Dmitri Gribenko	7ebfbebfe1	Include what you use in HexagonELFObjectWriter.cpp HexagonELFObjectWriter.cpp was not using any APIs from Hexagon.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362376	2019-06-03 09:56:40 +00:00
Nikola Prica	2d0106a110	[LiveDebugValues] Close range for previous variable's location when adding newly deduced location When LiveDebugValues deduces new variable's location from spill, restore or register copy instruction it should close old variable's location. Otherwise we can have multiple block output locations for same variable. That could lead to inserting two DBG_VALUEs for same variable to the beginning of the successor block which results to ignoring of first DBG_VALUE. Reviewers: aprantl, jmorse, wolfgangp, dstenb Reviewed By: aprantl Subscribers: probinson, asowda, ivanbaev, petarj, djtodoro Tags: #debug-info Differential Revision: https://reviews.llvm.org/D62196 llvm-svn: 362373	2019-06-03 09:48:29 +00:00
Dmitri Gribenko	0aa374a306	Include what you use in HexagonAsmBackend.cpp HexagonAsmBackend.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362372	2019-06-03 09:43:05 +00:00
Dmitri Gribenko	301f8fd632	Include what you use in HexagonAsmParser.cpp HexagonAsmParser.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the AsmParser library). llvm-svn: 362370	2019-06-03 09:38:48 +00:00
Dmitri Gribenko	c5327ab71d	Include what you use in HexagonShuffler.h HexagonShuffler.h was not using any APIs from Hexagon.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362369	2019-06-03 09:33:48 +00:00
Dmitri Gribenko	3c837201e0	Include what you use in BPFMCTargetDesc.cpp BPFMCTargetDesc.cpp was not using any APIs from BPF.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary BPF target library and the MCTargetDesc library). llvm-svn: 362368	2019-06-03 09:29:51 +00:00
Diogo N. Sampaio	df92f84110	[ARM][FIX] Ran out of registers due tail recursion Summary: - pr42062 When compiling for MinSize, ARMTargetLowering::LowerCall decides to indirect multiple calls to a same function. However, it disconsiders the limitation that thumb1 indirect calls require the callee to be in a register from r0 to r3 (llvm limiation). If all those registers are used by arguments, the compiler dies with "error: run out of registers during register allocation". This patch tells the function IsEligibleForTailCallOptimization if we intend to perform indirect calls, as to avoid tail call optimization. Reviewers: dmgreen, efriedma Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62683 llvm-svn: 362366	2019-06-03 08:58:05 +00:00
Sam Parker	a0bd6f8a1a	[AArch64] Check for simple type in FPToUInt DAGCombiner was hitting a SimpleType assertion when trying to combine a v3f32 before type legalization. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916 Differential Revision: https://reviews.llvm.org/D62734 llvm-svn: 362365	2019-06-03 08:49:17 +00:00
Jim Lin	20b14dacbb	[AVR] Fix incorrect source regclass of LDWRdPtr Summary: LDWRdPtr would be expanded to ld+ldd. ldd only accepts the pointer register is Y or Z. So the register class of pointer of LDWRdPtr should be PTRDISPREGS instead of PTRREGS. Reviewers: dylanmckay Reviewed By: dylanmckay Subscribers: dylanmckay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62300 llvm-svn: 362351	2019-06-03 02:31:07 +00:00
Florian Hahn	e71963c850	Recommit r360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor. If we hit the limit, we do expand the outstanding tokenfactors. Otherwise, we might drop nodes with users in the unexpanded tokenfactors. This fixes the crashes reported by Jordan Rupprecht. Reviewers: niravd, spatel, craig.topper, rupprecht Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D62633 llvm-svn: 362350	2019-06-03 01:30:19 +00:00
Nico Weber	54362477c7	llvm-undname; Add more test coverage for demangleFunctionClass() Also add two FC_Far that seem to be missing, by symmetry from the public and protected cases. (But FC_Far isn't really a thing anymore, so this doesn't really have an observable effect.) llvm-svn: 362344	2019-06-02 23:26:57 +00:00
Craig Topper	50b35caf30	[DAGCombiner][X86] Fold away masked store and scatter with all zeroes mask. Similar to what was done for masked load and gather. llvm-svn: 362342	2019-06-02 22:52:38 +00:00
Craig Topper	5f79d74946	[X86] Add test cases for masked store and masked scatter with an all zeroes mask. Fix bug in ScalarizeMaskedMemIntrin Need to cast only to Constant instead of ConstantVector to allow ConstantAggregateZero. llvm-svn: 362341	2019-06-02 22:52:34 +00:00
Simon Pilgrim	8a32ca381d	[CostModel][X86] Improve masked load/store AVX1/AVX2 costs A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicates that its closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range. e.g. SandyBridge defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>; defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>; defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; e.g. Btver2 defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>; defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>; defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>; defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>; Differential Revision: https://reviews.llvm.org/D61257 llvm-svn: 362338	2019-06-02 20:37:02 +00:00
Craig Topper	a7bc31ebc6	[DAGCombiner] Replace masked loads with a zero mask with the passthru value Similar to what was recently done for gathers in r362015. llvm-svn: 362337	2019-06-02 18:58:46 +00:00
Simon Pilgrim	59a8db628b	[TTI][X86] Cleanup getMaskedMemoryOpCost. NFCI. Prep work before resurrecting D61257. llvm-svn: 362335	2019-06-02 18:06:42 +00:00
Nico Weber	b5cd6163f4	Remove code path that's dead after r358835 llvm-svn: 362333	2019-06-02 17:41:07 +00:00
Simon Pilgrim	71a39bcf68	[X86] isHorizontalBinOp - add extract_subvector(shuffle(x)) handling (PR39921) Let's us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.) llvm-svn: 362327	2019-06-02 15:47:49 +00:00
Simon Pilgrim	7a869e7036	[DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2) Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize. Differential Revision: https://reviews.llvm.org/D59188 llvm-svn: 362324	2019-06-02 14:42:11 +00:00
Simon Pilgrim	ffb4d2bff7	[DAG] isBitwiseNot / isConstOrConstSplat - add support for build vector undefs + truncation (PR41020) Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot. PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where its safe. Differential Revision: https://reviews.llvm.org/D62783 llvm-svn: 362323	2019-06-02 11:56:39 +00:00
Simon Pilgrim	88522ce388	[TargetLowering] SimplifyDemandedBits - don't use OriginalDemanded variables in analysis. These might have been replaced in multiple use cases. llvm-svn: 362322	2019-06-02 10:12:55 +00:00
Simon Pilgrim	30a6caa3e7	[TargetLowering] SimplifyDemandedVectorElts - use same arg names as SimplifyDemandedBits. NFCI. Helps with debugging as we recurse between them. llvm-svn: 362321	2019-06-02 10:03:56 +00:00
Craig Topper	f58ef87bb7	[DAGCombiner] Replace two unchecked dyn_casts with casts. The results of the dyn_casts were immediately dereferenced on the next line so they had better not be null. I don't think there's any way for these dyn_casts to fail, so use a cast of adding null check. llvm-svn: 362315	2019-06-02 03:31:01 +00:00
Craig Topper	78c794a70b	[X86] Fix several places that weren't passing what they though they were to MachineInstr::print Over a year ago, MachineInstr gained a fourth boolean parameter that occurs before the TII pointer. When this happened, several places started accidentally passing TII into this boolean parameter instead of the TII parameter. llvm-svn: 362312	2019-06-02 01:36:48 +00:00
Craig Topper	396a915c26	[X86] Add the SSE versions of PMULLW and PMULLD to isAssociativeAndCommutative. llvm-svn: 362309	2019-06-02 00:42:58 +00:00
Nikita Popov	900578d1c1	[SimplifyIndVar] Refactor overflow check elimination code; NFC Extract a willNotOverflow() helper function that is shared between eliminateOverflowIntrinsic() and strengthenOverflowingOperation(). Use WithOverflowInst for the former. We'll be able to reuse the same code for saturating intrinsics as well. llvm-svn: 362305	2019-06-01 20:21:53 +00:00
Craig Topper	7cebf0af40	[InlineCost] Don't add the soft float function call cost for the fneg idiom, fsub -0.0, %x Summary: Fneg can be implemented with an xor rather than a function call so we don't need to add the function call overhead. This was pointed out in D62699 Reviewers: efriedma, cameron.mcinally Reviewed By: efriedma Subscribers: javed.absar, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62747 llvm-svn: 362304	2019-06-01 19:40:07 +00:00
Andrea Di Biagio	6a989c358c	[MCA][Scheduler] Change how memory instructions are dispatched to the pending set. NFCI llvm-svn: 362302	2019-06-01 15:22:37 +00:00
Simon Atanasyan	25694e0084	[mips] Extend range of register indexes accepted by cfcmsa/ctcmsa The `cfcmsa` and `ctcmsa` instructions accept index of MSA control register. The MIPS64 SIMD Architecture define eight MSA control registers. But register index for `cfcmsa` and `ctcmsa` instructions might be any number in 0..31 range. If the index is greater then 7, `cfcmsa` writes zero to the destination registers and `ctcmsa` does nothing [1]. [1] MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD Architecture Module https://www.mips.com/?do-download=the-mips64-simd-architecture-module Differential Revision: https://reviews.llvm.org/D62597 llvm-svn: 362299	2019-06-01 13:55:18 +00:00
Dylan McKay	45eb4c7e55	[AVR] Disable register coalescing to the PTRDISPREGS class If we would allow register coalescing on PTRDISPREGS class then register allocator can lock Z register to some virtual register. Larger instructions requiring a memory acces then fail during the register allocation phase since there is no available register to hold a pointer if Y register was already taken for a stack frame. This patch prevents it by keeping Z register spillable. It does it by not allowing coalescer to lock it. Original discussion on https://github.com/avr-rust/rust/issues/128. llvm-svn: 362298	2019-06-01 12:38:56 +00:00
Nikita Popov	46d4dba6e6	[IndVarSimplify] Fixup nowrap flags during LFTR (PR31181) Fix for https://bugs.llvm.org/show_bug.cgi?id=31181 and partial fix for LFTR poison handling issues in general. When LFTR moves a condition from pre-inc to post-inc, it may now depend on value that is poison due to nowrap flags. To avoid this, we clear any nowrap flag that SCEV cannot prove for the post-inc addrec. Additionally, LFTR may switch to a different IV that is dynamically dead and as such may be arbitrarily poison. This patch will correct nowrap flags in some but not all cases where this happens. This is related to the adoption of IR nowrap flags for the pre-inc addrec. (See some of the switch_to_different_iv tests, where flags are not dropped or insufficiently dropped.) Finally, there are likely similar issues with the handling of GEP inbounds, but we don't have a test case for this yet. Differential Revision: https://reviews.llvm.org/D60935 llvm-svn: 362292	2019-06-01 09:40:18 +00:00
Dylan McKay	038e3b9f57	Extend the DWARFExpression address handling to support 16-bit addresses This allows the DWARFExpression class to handle addresses without crashing on targets with 16-bit pointers like AVR. This is required in order to generate assembly from clang via the '-S' flag. This fixes an error with the following message: clang: llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h:132: llvm::DWARFExpression::DWARFExpression(llvm::DataExtractor, uint16_t, uint8_t): Assertion `AddressSize == 8 \|\| AddressSize == 4' failed. llvm-svn: 362290	2019-06-01 09:18:26 +00:00
Craig Topper	c288a19bb7	[X86] Add AVX512BF16 and AVX512VP2INTERSECT instructions to the loading folding tables. llvm-svn: 362288	2019-06-01 06:20:59 +00:00
Craig Topper	48fdb61766	[X86] Make the X86FoldTablesEmitter functional again. Fix the spacing in the output to make it easier to diff. Fix a few other formatting issues in the manual table. And remove some old FIXMEs. llvm-svn: 362287	2019-06-01 06:20:55 +00:00
Nick Desaulniers	b380846a12	[RuntimeDyld] fix too-small-bitmask error Summary: This was flagged in https://www.viva64.com/en/b/0629/ under "Snippet No. 33". It seems that this statement is doing the standard bitwise trick for adjusting a value to have a specific alignment. The issue is that getStubAlignment() returns an unsigned, while DataSize is declared a uint64_t. The right hand side of the expression is not extended to 64b before bitwise negation, resulting in the top half of the mask being 0s, which is not correct for realignment. Reviewers: lhames, MaskRay Reviewed By: MaskRay Subscribers: RKSimon, MaskRay, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62227 llvm-svn: 362286	2019-06-01 04:51:26 +00:00
Richard Trieu	4e875464df	Inline variable into assert to fix unused variable warning. llvm-svn: 362285	2019-06-01 03:32:20 +00:00
Philip Reames	19afdf74bb	[LoopPred] Eliminate a redundant/confusing cover function [NFC] llvm-svn: 362284	2019-06-01 03:09:28 +00:00
Philip Reames	099eca832e	[LoopPred] Handle a subset of NE comparison based latches At the moment, LoopPredication completely bails out if it sees a latch of the form: %cmp = icmp ne %iv, %N br i1 %cmp, label %loop, label %exit OR %cmp = icmp ne %iv.next, %NPlus1 br i1 %cmp, label %loop, label %exit This is unfortunate since this is exactly the form that LFTR likes to produce. So, go ahead and recognize simple cases where we can. For pre-increment loops, we leverage the fact that LFTR likes canonical counters (i.e. those starting at zero) and a (presumed) range fact on RHS to discharge the check trivially. For post-increment forms, the key insight is in remembering that LFTR had to insert a (N+1) for the RHS. CVP can hopefully prove that add nsw/nuw (if there's appropriate range on N to start with). This leaves us both with the post-inc IV and the RHS involving an nsw/nuw add, and SCEV can discharge that with no problem. This does still need to be extended to handle non-one steps, or other harder patterns of variable (but range restricted) starting values. That'll come later. Differential Revision: https://reviews.llvm.org/D62748 llvm-svn: 362282	2019-06-01 00:31:58 +00:00
Eli Friedman	d8e8722791	[CodeGen] Fix hashing for MO_ExternalSymbol MachineOperands. We were hashing the string pointer, not the string, so two instructions could be identical (isIdenticalTo), but have different hash codes. This showed up as a very rare, non-deterministic assertion failure rehashing a DenseMap constructed by MachineOutliner. So there's no "real" testcase, just a unittest which checks that the hash function behaves correctly. I'm a little scared fixing this is going to cause a regression in outlining or MachineCSE, but hopefully we won't run into any issues. Differential Revision: https://reviews.llvm.org/D61975 llvm-svn: 362281	2019-06-01 00:08:54 +00:00
Tom Tan	eb4d6142dc	[COFF, ARM64] Add CodeView register mapping CodeView has its own register map which is defined in cvconst.h. Missing this mapping before saving register to CodeView causes debugger to show incorrect value for all register based variables, like variables in register and local variables addressed by register (stack pointer + offset). This change added mapping between LLVM register and CodeView register so the correct register number will be stored to CodeView/PDB, it aso fixed the mapping from CodeView register number to register name based on current CPUType but print PDB to yaml still assumes X86 CPU and needs to be fixed. Differential Revision: https://reviews.llvm.org/D62608 llvm-svn: 362280	2019-05-31 23:43:31 +00:00
Nick Desaulniers	7fcad2f171	[PowerPC] check for INLINEASM_BR along w/ INLINEASM Summary: It looks like since INLINEASM_BR was created off of INLINEASM (r353563), a few checks for INLINEASM needed to be updated to check for either case. pr/41999 Reviewers: hfinkel Reviewed By: hfinkel Subscribers: nemanjai, hiraditya, kbarton, jsji, llvm-commits, craig.topper, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62403 llvm-svn: 362278	2019-05-31 23:02:13 +00:00
Reid Kleckner	eddd6c25b5	[codeview] Revert inline line table change of r362264 Testing with debuggers shows that our previous behavior was correct. The reason I thought MSVC did things differently is that MSVC prefers to use the 0xB combined code offset and code length update opcode when inline sites are discontiguous. Keep the test changes, and update the llvm-pdbutil inline line table dumper to account for this new interpretation of the opcodes. llvm-svn: 362277	2019-05-31 22:55:03 +00:00
Matt Arsenault	302eedcbfa	AMDGPU: Fix not adding ImplicitBufferPtr as a live-in Fixes missing test from r293000. llvm-svn: 362275	2019-05-31 22:47:36 +00:00
Erik Pilkington	abb2a93c53	[SimplifyLibCalls] Fold more fortified functions into non-fortified variants When the object size argument is -1, no checking can be done, so calling the _chk variant is unnecessary. We already did this for a bunch of these functions. rdar://50797197 Differential revision: https://reviews.llvm.org/D62358 llvm-svn: 362272	2019-05-31 22:41:36 +00:00
Erik Pilkington	5234921119	NFC: Pull out a function to reduce some duplication Part of https://reviews.llvm.org/D62358 llvm-svn: 362271	2019-05-31 22:41:31 +00:00
Craig Topper	bc9e04d0c3	[SelectionDAG] Make the code in mutateStrictFPToFP less aware of how many operands each node has. NFCI Just copy all of the operands except the chain and call MorphNode on that. This removes the IsUnary and IsTernary flags. Also always get the result type from the result type of the original nodes. Previously we got it from the operand except for two nodes where that didn't work. llvm-svn: 362269	2019-05-31 22:18:45 +00:00
Nick Desaulniers	103bd108a7	[RegisterCoalescer] fix potential use of undef value. NFC Summary: Fixes a warning produced from scan-build (llvm.org/reports/scan-build/), further warnings found by annotation isMoveInstr [[nodiscard]]. isMoveInstr potentially does not assign to its parameters, so if they were uninitialized, they will potentially stay uninitialized. It seems most call sites pass references to uninitialized values, then use them without checking the return value. Reviewers: wmi Reviewed By: wmi Subscribers: MatzeB, qcolombet, hiraditya, tpr, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62109 llvm-svn: 362265	2019-05-31 21:20:13 +00:00
Reid Kleckner	e98cf5fe47	[codeview] Fix inline line table accuracy for discontiguous segments After improving the inline line table dumper in llvm-pdbutil and looking at MSVC's inline line tables, it is clear that setting the length of the inlined code region does not update the code offset. This means that the delta to the beginning of a new discontiguous inlined code region should be calculated relative to the last code offset, excluding the length. Implementing this is a one line fix for MC: simply don't update LastLabel. While I'm updating these test cases, switch them to use llvm-objdump -d and llvm-pdbutil. This allows us to show offsets of each instruction and correlate the line table offsets to the actual code. llvm-svn: 362264	2019-05-31 20:55:31 +00:00
Nikita Popov	7bafae55c0	Reapply [CVP] Simplify non-overflowing saturating add/sub If we can determine that a saturating add/sub will not overflow based on range analysis, convert it into a simple binary operation. This is a sibling transform to the existing with.overflow handling. Reapplying this with an additional check that the saturating intrinsic has integer type, as LVI currently does not support vector types. Differential Revision: https://reviews.llvm.org/D62703 llvm-svn: 362263	2019-05-31 20:48:26 +00:00
Nikita Popov	23a02f6a5f	[CVP] Fix assertion failure on vector with.overflow Noticed on D62703. LVI only handles plain integers, not vectors of integers. This was previously not an issue, because vector support for with.overflow is only a relatively recent addition. llvm-svn: 362261	2019-05-31 20:42:07 +00:00
Craig Topper	c669629e6c	[X86] Resync Host.cpp with compiler-rt's cpu_model.c to enable 0x55 to be identified as cascadelake when avx512vnni is detected. Some other formatting changes. llvm-svn: 362256	2019-05-31 19:18:07 +00:00
Nikita Popov	ccb63e0bfe	Revert "[CVP] Simplify non-overflowing saturating add/sub" This reverts commit `1e692d1777`. Causes assertion failure in builtins-wasm.c clang test. llvm-svn: 362254	2019-05-31 19:04:47 +00:00
Puyan Lotfi	3ea6b24f41	[MIR-Canon] Don't do vreg skip for independent instructions if there are none. We don't want to create vregs if there is nothing to use them for. That causes verifier errors. Differential Revision: https://reviews.llvm.org/D62740 llvm-svn: 362247	2019-05-31 17:34:25 +00:00
Nikita Popov	1e692d1777	[CVP] Simplify non-overflowing saturating add/sub If we can determine that a saturating add/sub will not overflow based on range analysis, convert it into a simple binary operation. This is a sibling transform to the existing with.overflow handling. Differential Revision: https://reviews.llvm.org/D62703 llvm-svn: 362242	2019-05-31 16:46:05 +00:00
Kevin P. Neal	ac79007205	Revert revert of r362112 with minor SystemZ test file corrections. [FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully. Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Cameron McInally, Kevin P. Neal Approved by: Cameron McInally Differential Revision: https://reviews.llvm.org/D62546 llvm-svn: 362241	2019-05-31 16:32:12 +00:00
Stanislav Mekhanoshin	fbbe5230f4	[AMDGPU] Use InliningThresholdMultiplier for inline hint AMDGPU uses multiplier 9 for the inline cost. It is taken into account everywhere except for inline hint threshold. As a result we are penalizing functions with the inline hint making them less probable to be inlined than those without the hint. Defaults are 225 for a normal function and 325 for a function with an inline hint. Currently we have effective threshold 225 * 9 = 2025 for normal functions and just 325 for those with the hint. That is fixed by this patch. Differential Revision: https://reviews.llvm.org/D62707 llvm-svn: 362239	2019-05-31 16:19:26 +00:00
Guozhi Wei	c3a24e93d5	[PPC] Correctly adjust branch probability in PPCReduceCRLogicals In PPCReduceCRLogicals after splitting the original MBB into 2, the 2 impacted branches still use original branch probability. This is unreasonable. Suppose we have following code, and the probability of each successor is 50%. condc = conda \|\| condb br condc, label %target, label %fallthrough It can be transformed to following, br conda, label %target, label %newbb newbb: br condb, label %target, label %fallthrough Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changed the original profiling data. A more reasonable probability can be set to 70% to the false side for each branch instruction, so the total probability to %fallthrough is close to 50%. This patch assumes the branch target with two incoming edges have same edge frequency and computes new probability fore each target, and keep the total probability to original targets unchanged. Differential Revision: https://reviews.llvm.org/D62430 llvm-svn: 362237	2019-05-31 16:11:17 +00:00
Jinsong Ji	18e7bf5c4d	[MachinePipeliner][NFC] Add some debug log and statistics This is to add some log and statistics for debugging Differential Revision: https://reviews.llvm.org/D62165 llvm-svn: 362233	2019-05-31 15:35:19 +00:00
Russell Gallop	802c9b59d5	ftime-trace: Trace loop passes These can take a significant amount of time in some builds. Suggested by Andrea Di Biagio. Differential Revision: https://reviews.llvm.org/D62666 llvm-svn: 362219	2019-05-31 10:14:04 +00:00
Roman Lebedev	39390d8317	[InstCombine] 'C-(C2-X) --> X+(C-C2)' constant-fold It looks this fold was already partially happening, indirectly via some other folds, but with one-use limitation. No other fold here has that restriction. https://rise4fun.com/Alive/ftR llvm-svn: 362217	2019-05-31 09:47:16 +00:00
Roman Lebedev	886c4ef35a	[InstCombine] 'add (sub C1, X), C2 --> sub (add C1, C2), X' constant-fold https://rise4fun.com/Alive/qJQ llvm-svn: 362216	2019-05-31 09:47:04 +00:00
Cullen Rhodes	0fc3a07398	[AArch64][SVE2] Asm: support WHILE instructions Summary: Patch adds support for the following instructions: * WHILEGE, WHILEGT, WHILEHS, WHILEHI, WHILEWR, WHILERW The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62601 llvm-svn: 362215	2019-05-31 09:13:55 +00:00
Cullen Rhodes	087d1337f8	[AArch64][SVE2] Asm: support TBL/TBX instructions Summary: A three sources variant of the TBL instruction is added to the existing SVE instruction in SVE2. This is implemented with minor changes to the existing TableGen class. TBX is a new instruction with its own definition. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62600 llvm-svn: 362214	2019-05-31 09:06:53 +00:00
Cullen Rhodes	2e870011b6	[AArch64][SVE2] Asm: support SVE2 store instructions Summary: Patch adds support for the following instructions: * STNT1B, STNT1H, STNT1S, STNT1D The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62599 llvm-svn: 362213	2019-05-31 08:59:40 +00:00
Petar Avramovic	efcd3c0009	[MIPS GlobalISel] Handle position independent code Handle position independent code for MIPS32. When callee is global address, lower call will emit callee as G_GLOBAL_VALUE and add target flag if needed. Support $gp in getRegBankFromRegClass(). Select G_GLOBAL_VALUE, specially handle case when there are target flags attached by lowerCall. Differential Revision: https://reviews.llvm.org/D62589 llvm-svn: 362210	2019-05-31 08:27:06 +00:00
Petar Avramovic	9058b50fb2	[mips] Move initGlobalBaseReg to MipsFunctionInfo. NFC Move initGlobalBaseReg from MipsSEDAGToDAGISel to MipsFunctionInfo. This way functions used for handling position independent code during instruction selection, getGlobalBaseReg and initGlobalBaseReg, end up in same class. Differential Revision: https://reviews.llvm.org/D62586 llvm-svn: 362206	2019-05-31 08:15:28 +00:00
Craig Topper	b457e430f3	[InstructionSimplify] Add missing implementation of llvm::SimplifyUnOp. NFC There are no callers currently, but the function is declared so we should at least implement it. llvm-svn: 362205	2019-05-31 08:10:23 +00:00
Petar Avramovic	f4a6dd28b6	[MIPS GlobalISel] Lower call for callee that is register Lower call for callee that is register for MIPS32. Register should contain callee function address. Differential Revision: https://reviews.llvm.org/D62585 llvm-svn: 362204	2019-05-31 08:06:17 +00:00
Craig Topper	31d00d80a2	[X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64. These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits. Similar to PR42079. Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the load pattern used in the instructions. This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe we should use VZMOVL for widened loads even when we don't need the upper bits as zeroes? llvm-svn: 362203	2019-05-31 07:38:26 +00:00
Craig Topper	b79cc5f802	[X86] Remove avx512 isel patterns for fpextend+load. Prefer to only match fp extloads instead. DAG combine will usually fold fpextend+load to an fp extload anyway. So the 256 and 512 patterns were probably unnecessary. The 128 bit pattern was special in that it looked for a v4f32 load, but then used it in an instruction that only loads 64-bits. This is bad if the load happens to be volatile. We could probably make the patterns volatile aware, but that's more work for something that's probably rare. The peephole pass might kick in and save us anyway. We might also be able to fix this with some additional DAG combines. This also adds patterns for vselect+extload to enabled masked vcvtps2pd to be used. Previously we looked for the unlikely vselect+fpextend+load. llvm-svn: 362199	2019-05-31 06:21:53 +00:00
Puyan Lotfi	0d63cef180	[MIR-Canon] Skip the first N vreg names lazily. This consolidates the vreg skip code into one function (SkipVRegs()). SkipVRegs() now knows if it should skip as if it is the first initialization or subsequent skips. The first skip is also done the first time createVirtualRegister is called by the cursor instead of by the cursor's constructor. This prevents verifier errors on machine functions that have no vregs (where the verifier will complain that there are vregs when the function uses none). Differential Revision: https://reviews.llvm.org/D62717 llvm-svn: 362195	2019-05-31 06:02:38 +00:00
Craig Topper	23066033a1	[X86] Correct the ins operand order for MASKPAIR16STORE to match other store instructions. This makes the 5 address operands come first. And the data operand comes last. This matches the operand order the instruction is created with. It's also the expected order in X86MCInstLower. So everything appeared to work, but the operands didn't match their declared type. Fixes a -verify-machineinstrs failure. Also remove the isel patterns from these instructions since they should only be used for stack spills and reloads. I'm not even sure what types the patterns were looking for to match. llvm-svn: 362193	2019-05-31 05:20:27 +00:00
Puyan Lotfi	2a901401fe	[MIR-Canon] Hardening propagateLocalCopies. This is am almost NFC, it does the following: - If there is no register class for a COPY's src or dst, bail. - Fixes uses iterator invalidation bug. Differential Revision: https://reviews.llvm.org/D62713 llvm-svn: 362191	2019-05-31 04:49:58 +00:00
Pengfei Wang	2e67d0c842	[X86] Add VP2INTERSECT instructions Support Intel AVX512 VP2INTERSECT instructions in llvm Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D62366 llvm-svn: 362188	2019-05-31 02:50:41 +00:00
Sam Clegg	9d21f510ee	Fix -DBUILD_SHARED_LIBS=ON build after rL362160 Differential Revision: https://reviews.llvm.org/D62709 llvm-svn: 362180	2019-05-31 01:04:00 +00:00
Craig Topper	70dc2200a2	[X86] Remove result type constraints from the extloadv2f32/extloadv4f32/extloadv8f32 PatFrags. NFC The result types aren't mentioned in the pattern name so really shouldn't be in the PatFrags. The users of these either have their own type constraint or rely on the type constranit system to realize the only legal extend would be to f64. llvm-svn: 362175	2019-05-30 23:35:24 +00:00
Matt Arsenault	18659f84b2	MISched: Fix -misched-regpressure=0 if subreg liveness enabled Test is waiting on fixing several more crashes in the AMDGPU scheduler implementation with this. llvm-svn: 362174	2019-05-30 23:31:36 +00:00
Craig Topper	d6b74cc859	[X86] Remove code that unnecessarily sets EXTLOAD with src type of v2f32/v4f32/v8f32 as Legal for SSE2/AVX/AVX512 respectively. NFC The LoadExt table defaults to all combinations being Legal. For vector types, only src VTs with an i1 element type were ever changed. So we don't need to mark them legal manually. llvm-svn: 362170	2019-05-30 22:29:06 +00:00
Francis Visoiu Mistrih	48998d10e0	[Remarks] Fix usage of enum class Breaks the build on some compilers: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9720/steps/build%20stage%201/logs/stdio llvm-svn: 362165	2019-05-30 22:01:56 +00:00
Francis Visoiu Mistrih	6ada11f134	[Remarks][NFC] Move the serialization to lib/Remarks Separate the remark serialization to YAML from the LLVM Diagnostics. This adds a new serialization abstraction: remarks::Serializer. It's completely independent from lib/IR and it provides an easy way to replace YAML by providing a new remarks::Serializer. Differential Revision: https://reviews.llvm.org/D62632 llvm-svn: 362160	2019-05-30 21:45:59 +00:00
Puyan Lotfi	daaecf98c9	[MIR-Canon] Fixing case where MachineFunction is empty. In cases where the machine function is empty: bail on the RPO traversal. Differential Revision: https://reviews.llvm.org/D62617 llvm-svn: 362158	2019-05-30 21:37:25 +00:00
Roman Lebedev	46511d75b5	[DAGCombine] Limit 'hoist add/sub binop w/ constant op' to non-opaque consts I don't have a test case for these, but there is a test case for D62266 where, even after all the constant-folding patches, we still end up with endless combine loop. Which makes sense, since we don't constant fold for opaque constants. llvm-svn: 362156	2019-05-30 21:10:37 +00:00
Nikita Popov	e906f2a370	[CVP] Generalize willNotOverflow(); NFC Change argument from WithOverflowInst to BinaryOpIntrinsic, so this function can also be used for saturating math intrinsics. llvm-svn: 362152	2019-05-30 21:03:10 +00:00
Lang Hames	a100042b27	[RuntimeDyld] Update reserveAllocationSpace to account for stub padding. This should fix the buildbot failures caused by r362139. llvm-svn: 362151	2019-05-30 20:58:28 +00:00
Martin Storsjo	0fe645c086	[InstCombine] Avoid use after free in DenseMap, when built with GCC Previously, this used a statement like this: Map[A] = Map[B]; This is equivalent to the following: const auto &Src = Map[B]; auto &Dest = Map[A]; Dest = Src; The second statement, "auto &Dest = Map[A];" can insert a new element into the DenseMap, which can potentially grow and reallocate the DenseMap's internal storage, which will invalidate the existing reference to the source. When doing the actual assignment, the Src reference is dereferenced, accessing memory that was freed when the DenseMap grew. This issue hasn't shown up when LLVM was built with Clang, because the right hand side ended up dereferenced before evaulating the left hand side. (If the value type is a larger data type, Clang doesn't do this but behaves like GCC.) With GCC, a cast to Value* isn't enough to make it dereference the right hand side reference before invoking operator[] (while that is enough to make Clang/LLVM do the right thing for larger types), but storing it in an intermediate variable in a separate statement works. This fixes PR42065. Differential Revision: https://reviews.llvm.org/D62624 llvm-svn: 362150	2019-05-30 20:53:21 +00:00
Roman Lebedev	a4e3b50e26	[DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold. Try 2 Summary: Only vector tests are being affected here, since subtraction by scalar constant is rewritten as addition by negated constant. No surprising test changes. https://rise4fun.com/Alive/pbT This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62257 llvm-svn: 362146	2019-05-30 20:37:49 +00:00
Roman Lebedev	57aa36ff91	[DAGCombine] (x - C) - y -> (x - y) - C fold. Try 3 Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 362145	2019-05-30 20:37:39 +00:00
Roman Lebedev	63b4741534	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 3 Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 362144	2019-05-30 20:37:29 +00:00
Roman Lebedev	05ad5fd213	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 3 Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 362143	2019-05-30 20:37:18 +00:00
Roman Lebedev	1d9ec7a81b	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 3 Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 362142	2019-05-30 20:36:54 +00:00
Lang Hames	0e124b37bd	[RuntimeDyld] Apply padding and alignment bumps to all sections with stubs, and increase the MachO/x86-64 stub alignment to 8. Stub alignment should be guaranteed for any section containing RuntimeDyld stubs/GOT-entries. To do this we should pad and align all sections containing stubs, not just code sections. This commit also bumps the MachO/x86-64 stub alignment to 8, so that GOT entries will be aligned. llvm-svn: 362139	2019-05-30 19:59:20 +00:00
Matt Arsenault	e0a4da8c0a	AMDGPU/GlobalISel: Add wave scratch offset argument Avoids crashing in PEI in a future change. llvm-svn: 362136	2019-05-30 19:33:18 +00:00
Roman Lebedev	7eb8b5b5dd	[DAGCombine] ((c1-A)-c2) -> ((c1-c2)-A) constant-fold Summary: https://rise4fun.com/Alive/B0A Reviewers: t.p.northover, RKSimon, spatel, craig.topper Reviewed By: RKSimon Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62691 llvm-svn: 362135	2019-05-30 19:27:51 +00:00
Roman Lebedev	691b5e2ecc	[DAGCombine] (A-C1)-C2 -> A-(C1+C2) constant-fold Summary: https://rise4fun.com/Alive/Mb1M Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62689 llvm-svn: 362134	2019-05-30 19:27:42 +00:00
Roman Lebedev	0a3dbbcdfb	[DAGCombine] (A+C1)-C2 -> A+(C1-C2) constant-fold Summary: Direct sibling of D62662, the root cause of the endless combine loop in D62257 https://rise4fun.com/Alive/d3W Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62664 llvm-svn: 362133	2019-05-30 19:27:32 +00:00
Roman Lebedev	9ff3159b4a	[DAGCombine] Use FoldConstantArithmetic() to perform C2-(A+C1) -> (C2-C1)-A fold Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry. Reviewers: spatel, RKSimon, craig.topper, t.p.northover Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62663 llvm-svn: 362132	2019-05-30 19:27:26 +00:00
Roman Lebedev	cc9a9cf237	[DAGCombine] ((A-c1)+c2) -> (A+(c2-c1)) constant-fold Summary: This was the root cause of the endless combine loop in D62257 https://rise4fun.com/Alive/d3W Reviewers: RKSimon, spatel, craig.topper, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62662 llvm-svn: 362131	2019-05-30 19:27:19 +00:00
Roman Lebedev	ef95679741	[DAGCombine] Use FoldConstantArithmetic() to perform ((c1-A)+c2) -> (c1+c2)-A fold Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry. Reviewers: spatel, RKSimon, craig.topper, t.p.northover Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62661 llvm-svn: 362130	2019-05-30 19:27:10 +00:00
Tim Northover	b7141207a4	Reapply: IR: add optional type to 'byval' function parameters When we switch to opaque pointer types we will need some way to describe how many bytes a 'byval' parameter should occupy on the stack. This adds a (for now) optional extra type parameter. If present, the type must match the pointee type of the argument. The original commit did not remap byval types when linking modules, which broke LTO. This version fixes that. Note to front-end maintainers: if this causes test failures, it's probably because the "byval" attribute is printed after attributes without any parameter after this change. llvm-svn: 362128	2019-05-30 18:48:23 +00:00
Tim Renouf	7fecdf36cc	[AMDGPU] Added target-specific attribute amdgpu-max-memory-clause With LLPC, previous investigation has suggested that si-scheduler interacts badly with SiFormMemoryClauses on an XNACK target in some games. That needs further investigation in the future. In the meantime, this commit adds a target-specific attribute to allow us to disable SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC to use. Differential Revision: https://reviews.llvm.org/D62572 Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92 llvm-svn: 362127	2019-05-30 18:46:34 +00:00
Florian Hahn	9bbdde2598	[LV] Remove the redundant using LoopVectorizationPlanner:VPlanPtr VPlan.h already contains the declaration of VPlanPtr type alias: using VPlanPtr = std::unique_ptr<VPlan>; The LoopVectorizationPlanner class also contains the same declaration of VPlanPtr and therefore LoopVectorize requires a long wording when its methods return VPlanPtr: LoopVectorizationPlanner::VPlanPtr LoopVectorizationPlanner::buildVPlanWithVPRecipes(...) but LoopVectorize.cpp includes VPlan.h (via LoopVectorizationPlanner.h) and can use VPlanPtr from that header. Patch by Pavel Samolysov. Reviewers: hsaito, rengolin, fhahn Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D62576 llvm-svn: 362126	2019-05-30 18:46:13 +00:00
Craig Topper	778e445c58	[LoopVectorize] Add FNeg instruction support Differential Revision: https://reviews.llvm.org/D62510 llvm-svn: 362124	2019-05-30 18:19:35 +00:00
Puyan Lotfi	0f4446b270	[MIR-Canon] Add support for rewriting VRegs that are typed but don't have an RC. There were crashes (addrspace-memoperands.mir was only one of them) in MIR that had operands that came from before register classes were set. With these operands, creating a replacement vreg (for MIR-Canon's renaming) needs to use the vreg type rather than the RegisterClass which is not present. Differential Revision: https://reviews.llvm.org/D62543 llvm-svn: 362122	2019-05-30 18:06:28 +00:00
Kevin P. Neal	51ce0b196a	Correct error in revert of r362112. Differential Revision: http://reviews.llvm.org/D62546 llvm-svn: 362118	2019-05-30 17:21:45 +00:00
Kevin P. Neal	d3db7b40b0	Revert r362112, it broke the bots with the message "Unsupported vector argument or return type" Differential Revision: http://reviews.llvm.org/D62546 llvm-svn: 362117	2019-05-30 17:10:21 +00:00
Kevin P. Neal	2e1807678d	[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully. Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Cameron McInally, Kevin P. Neal Approved by: Cameron McInally Differential Revision: http://reviews.llvm.org/D62546 llvm-svn: 362112	2019-05-30 16:44:47 +00:00
Roman Lebedev	019d270e43	[DAGCombine] Revert of recommit of "binop-with-const hoisting" patches I was looking into an endless combine loop the uncommitted follow-up patch was causing, and it appears even these patches can exibit such an endless loop. The root cause is that we try to hoist one binop (add/sub) with constant operand, and if we get two such binops both of which are eligible for this hoisting, we get stuck. Some cases may highlight missing constant-folds. Reverts r361871,r361872,r361873,r361874. llvm-svn: 362109	2019-05-30 16:07:11 +00:00
Sam Parker	913604a637	[NFC][ARM][ParallelDSP] Refactor narrow sequence Most of the code used for finding a 'narrow' sequence is not used, so I've removed it and simplified the calls from the smlad matcher. llvm-svn: 362104	2019-05-30 15:26:37 +00:00
Sjoerd Meijer	eb072b5a6a	[ARM] Change the MC names for VMAXNM/VMINNM Now the NEON ones have a prefix "NEON_", and the VFP ones have a prefix "VFP_". This is so that the regex in ARMScheduleA57.td can be made to match both of _those_ classes of VMAXNM without also matching the MVE ones that are going to be introduced soon. NFCI. Patch by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60700 llvm-svn: 362097	2019-05-30 14:34:29 +00:00
Simon Pilgrim	5359bb4d31	[ARM] LowerVECTOR_SHUFFLE - fix uninitialized variable warnings. NFCI. llvm-svn: 362094	2019-05-30 14:01:24 +00:00
Roman Lebedev	e8578953ac	[LoopIdiom] Basic OptimizationRemarkEmitter handling Summary: I'm adding ORE to memset/memcpy formation, with tests, but mainly this is split off from D61144. Reviewers: reames, anemet, thegameg, craig.topper Reviewed By: thegameg Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62631 llvm-svn: 362092	2019-05-30 13:02:06 +00:00
Roman Lebedev	fae2e46766	[LoopIdiomRecognize][NFC] Sort includes Split off from D61144 llvm-svn: 362091	2019-05-30 13:01:53 +00:00
Sjoerd Meijer	930dee2c0b	[ARM] add target arch definitions for 8.1-M and MVE This adds: - LLVM subtarget features to make all the new instructions conditional on, - CPU and FPU names for use on clang's command line, with default FPUs set so that "armv8.1-m.main+fp" and "armv8.1-m.main+fp.dp" will select the right FPU features, - architecture extension names "mve" and "mve.fp", - ABI build attribute support for v8.1-M (a new value for Tag_CPU_arch) and MVE (a new actual tag). Patch mostly by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60698 llvm-svn: 362090	2019-05-30 12:57:04 +00:00
Sjoerd Meijer	7eb95d672d	[ARM] Introduce separate features for FP registers The MVE extension in Arm v8.1-M permits the use of some move, load and store isntructions which access the FP registers, even if there's no actual FP support in the processor (in particular, if you have the integer-only version of MVE). Therefore, we need separate subtarget features to condition those instructions on, which are implied by both FP and MVE but are not part of either. Patch mostly by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60694 llvm-svn: 362088	2019-05-30 12:37:05 +00:00
Simon Pilgrim	32aac1727a	[X86][SSE] Improve bool vector extload (PR26091) We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases, this patch uses it for loads of vXi1 types as well - changing the load into a iX integer load, and bitcasting so that combineToExtendBoolVectorInReg can then use it. Differential Revision: https://reviews.llvm.org/D62449 llvm-svn: 362081	2019-05-30 10:25:20 +00:00
Cullen Rhodes	7fad428931	[AArch64][SVE2] Asm: support SVE2 vector splice (constructive) Summary: The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62530 llvm-svn: 362073	2019-05-30 08:51:39 +00:00
Cullen Rhodes	ebe23041f0	[AArch64][SVE2] Asm: support SVE2 load instructions Summary: Patch adds support for the following instructions: * LDNT1SB, LDNT1B, LDNT1SH, LDNT1H, LDNT1SW, LDNT1W, LDNT1D The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62528 llvm-svn: 362072	2019-05-30 08:44:27 +00:00
Cullen Rhodes	455c529f77	[AArch64][SVE2] Asm: support FCVTX/FLOGB instructions Summary: Patch completes SVE2 support for: SVE Floating Point Unary Operations - Predicated Group The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62526 llvm-svn: 362071	2019-05-30 08:35:12 +00:00
Cullen Rhodes	028413f5ae	[AArch64][SVE2] Asm: add ext (immediate offset, constructive) instruction Summary: The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62518 llvm-svn: 362070	2019-05-30 08:25:17 +00:00
Sjoerd Meijer	5857bf5d1e	[ARM] Add an MVE execution domain MVE architecturally specifies a 'beat' system in which a vector instruction executed now will complete its actual operation over the next four cycles, so it can overlap with the execution of the previous and next MVE instruction. This makes it generally an advantage to avoid moving values back and forth between MVE registers and anywhere else, if there's any sensible way to do the same processing in whatever register type the values already occupied. That's just what the 'execution domain' system is supposed to achieve. So here we add a new execution domain which will contain all the MVE vector instructions when they are added. Patch by: Simon Tatham Differential Revision: https://reviews.llvm.org/D60703 llvm-svn: 362068	2019-05-30 08:07:06 +00:00
Florian Hahn	e4cfa89915	[LV] Inform about exactly reason of loop illegality Currently, only the following information is provided by LoopVectorizer in the case when the CF of the loop is not legal for vectorization: LV: Can't vectorize the instructions or CFG LV: Not vectorizing: Cannot prove legality. But this information is not enough for the root cause analysis; what is exactly wrong with the loop should also be printed: LV: Not vectorizing: The exiting block is not the loop latch. Patch by Pavel Samolysov. Reviewers: mkuper, hsaito, rengolin, fhahn Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D62311 llvm-svn: 362056	2019-05-30 05:03:12 +00:00
Pengfei Wang	1f67d94279	[X86] Add ENQCMD instructions For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Patch by Tianqing Wang (tianqing) Differential Revision: https://reviews.llvm.org/D62281 llvm-svn: 362053	2019-05-30 03:59:16 +00:00
Amy Huang	325003be02	CodeView - add static data members to global variable debug info. Summary: Add static data members to IR debug info's list of global variables so that they are emitted as S_CONSTANT records. Related to https://bugs.llvm.org/show_bug.cgi?id=41615. Reviewers: rnk Subscribers: aprantl, cfe-commits, llvm-commits, thakis Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62167 llvm-svn: 362038	2019-05-29 21:45:34 +00:00
Matt Arsenault	79b3ea701c	LoopVersioningLICM: Respect convergent and noduplicate llvm-svn: 362031	2019-05-29 20:47:59 +00:00
Tim Northover	71ee3d0237	Revert "IR: add optional type to 'byval' function parameters" The IRLinker doesn't delve into the new byval attribute when mapping types, and this breaks LTO. llvm-svn: 362029	2019-05-29 20:46:38 +00:00
Roman Lebedev	95dec50a35	[LoopIdiomRecognize][NFC] Use DEBUG_TYPE, add LLVM_DEBUG() to runOnNoncountableLoop() Split off from D61144 llvm-svn: 362022	2019-05-29 20:11:53 +00:00
Pete Couperus	d80024c687	[ARC] Cleanup ARCAsmPrinter. Summary: Remove unused getTargetStreamer. Remove unused headers. Reviewers: dantrushin Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62549 llvm-svn: 362021	2019-05-29 20:07:35 +00:00
Benjamin Kramer	107f8d9873	[DAGCombiner] Replace gathers with a zero mask with the passthru value These can be created by the legalizer when splitting a larger gather. See https://llvm.org/PR42055 for a motivating example. Differential Revision: https://reviews.llvm.org/D62613 llvm-svn: 362015	2019-05-29 19:24:19 +00:00
Tim Northover	6e07f16fae	IR: add optional type to 'byval' function parameters When we switch to opaque pointer types we will need some way to describe how many bytes a 'byval' parameter should occupy on the stack. This adds a (for now) optional extra type parameter. If present, the type must match the pointee type of the argument. Note to front-end maintainers: if this causes test failures, it's probably because the "byval" attribute is printed after attributes without any parameter after this change. llvm-svn: 362012	2019-05-29 19:12:48 +00:00
Nikita Popov	5382803b04	[InstCombine] Optimize always overflowing signed saturating add/sub Based on the overflow direction information added in D62463, we can now fold always overflowing signed saturating add/sub to signed min/max. Differential Revision: https://reviews.llvm.org/D62544 llvm-svn: 362006	2019-05-29 18:37:13 +00:00
Aakanksha Patil	d5443f8c21	AMDGPU: Return address lowering The patch computes the return address for the current function. Differential revision: https://reviews.llvm.org/D59666 llvm-svn: 362001	2019-05-29 18:20:11 +00:00
Matt Arsenault	f80c4241b3	CallSiteSplitting: Respect convergent and noduplicate llvm-svn: 361990	2019-05-29 16:59:48 +00:00
Teresa Johnson	5b2088d1fa	[ThinLTO] Use original alias visibility when importing Summary: When we import an alias, we do so by making a clone of the aliasee. Just as this clone uses the original alias name and linkage, it should also use the same visibility (not the aliasee's visibility). Otherwise, linker behavior is affected (e.g. if the aliasee was hidden, but the alias is not, the resulting imported clone should not be hidden, otherwise the linker will make the final symbol hidden which is incorrect). Reviewers: wmi Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62535 llvm-svn: 361989	2019-05-29 16:50:46 +00:00
Sam McCall	4f09d9fcfa	Qualify use of llvm::empty that's ambiguous with std::empty llvm-svn: 361968	2019-05-29 15:02:16 +00:00
Simon Atanasyan	188162118f	[mips] Iterate over MSACtrlRegClass to reserve all MSA control registers. NFC llvm-svn: 361965	2019-05-29 14:58:56 +00:00
Simon Atanasyan	af7bf2f687	[mips] Use range-based for loops. NFC llvm-svn: 361964	2019-05-29 14:58:50 +00:00
Sjoerd Meijer	24c5629625	[ARM] Split predicates out into their own .td file The new ARMPredicates.td is included from ARM.td, early enough that the predicate definitions are already in scope when ARMSchedule.td is included. This will make it possible to refer to them in UnsupportedFeatures fields of scheduling models. NFC: the chunk of Tablegen being moved here is copied and pasted verbatim. Patch by: Simon Tatham Differential Revision: https://reviews.llvm.org/D60693 llvm-svn: 361958	2019-05-29 13:41:57 +00:00
Matt Arsenault	36e7254441	SpeculateAroundPHIs: Respect convergent llvm-svn: 361957	2019-05-29 13:14:39 +00:00
Graham Hunter	f4fc01f8dd	[SVE][IR] Scalable Vector IR Type * Adds a 'scalable' flag to VectorType * Adds an 'ElementCount' class to VectorType to pass (possibly scalable) vector lengths, with overloaded operators. * Modifies existing helper functions to use ElementCount * Adds support for serializing/deserializing to/from both textual and bitcode IR formats * Extends the verifier to reject global variables of scalable types * Updates documentation See the latest version of the RFC here: http://lists.llvm.org/pipermail/llvm-dev/2018-July/124396.html Reviewers: rengolin, lattner, echristo, chandlerc, hfinkel, rkruppe, samparker, SjoerdMeijer, greened, sebpop Reviewed By: hfinkel, sebpop Differential Revision: https://reviews.llvm.org/D32530 llvm-svn: 361953	2019-05-29 12:22:54 +00:00
Andrea Di Biagio	280ac1fd1d	[MCA] Refactor class LSUnit. NFCI This should be the last bit of refactoring in preparation for a patch that would finally fix PR37494. This patch introduces the concept of memory dependency groups (class MemoryGroup) and "Load/Store Unit token" (LSUToken) to track the status of a memory operation. A MemoryGroup is a node of a memory dependency graph. It is used internally to classify memory operations based on the memory operations they depend on. Let I and J be two memory operations, we say that I and J equivalent (for the purpose of mapping instructions to memory dependency groups) if the set of memory operations they depend depend on is identical. MemoryGroups are identified by so-called LSUToken (a unique group identifier assigned by the LSUnit to every group). When an instruction I is dispatched to the LSUnit, the LSUnit maps I to a group, and then returns a LSUToken. LSUTokens are used by class Scheduler to track memory dependencies. This patch simplifies the LSUnit interface and moves most of the implementation details to its base class (LSUnitBase). There is no user visible change to the output. llvm-svn: 361950	2019-05-29 11:38:27 +00:00
Cullen Rhodes	6c04ef3d48	[AArch64][SVE2] Asm: support SVE Bitwise Logical - Unpredicated Group Summary: Patch adds support for the following instructions: * EOR3, BSL, BCAX, BSL1N, BSL2N, NBSL, XAR Aliases for types .B/.H/.S for EOR3 and BCAX have been added, the preferred disassembly is .D. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62387 llvm-svn: 361936	2019-05-29 09:03:27 +00:00
Cullen Rhodes	75dfbdc2da	[AArch64][SVE2] Asm: support Floating Point Widening Multiply-Add Summary: Patch adds support for the indexed and unpredicated vectors forms of the FMLALB, FMLALT, FMLSLB and FMLSLT instructions. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62386 llvm-svn: 361935	2019-05-29 08:53:06 +00:00
Cullen Rhodes	4f58ad4e72	[AArch64][SVE2] Asm: support SVE2 Floating Point Pairwise Group Summary: Patch adds support for the following instructions: SVE2 floating-point pairwise operations: * FADDP, FMAXNMP, FMINNMP, FMAXP, FMINP The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62383 llvm-svn: 361933	2019-05-29 08:40:33 +00:00
Richard Trieu	c77aff7e17	Inline a variable into debug section to fix unused variable warning. llvm-svn: 361927	2019-05-29 04:09:32 +00:00
Richard Trieu	e8698ead9d	Inline value into debug statement to avoid unused variable warning. llvm-svn: 361924	2019-05-29 03:43:01 +00:00
Peter Collingbourne	31fda09b2d	Add IR support, ELF section and user documentation for partitioning feature. The partitioning feature was proposed here: http://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html This is mostly just documentation. The feature itself will be contributed in subsequent patches. Differential Revision: https://reviews.llvm.org/D60242 llvm-svn: 361923	2019-05-29 03:29:01 +00:00
Peter Collingbourne	10c548cdfa	IR: Give the TypeAllocator a more generic name and start using it for section names as well. NFCI. This prepares us to start using it for partition names. llvm-svn: 361922	2019-05-29 03:28:51 +00:00
Jinsong Ji	f6cb3bcb4c	Support resource tracking with InstrSchedModel The current design use DFA to do resource tracking in SMS, and DFA only support InstrItins, and also has scaling limitation. This patch extend SMS to allow Subtarget to use ProcResource in InstrSchedModel instead. Differential Revision: https://reviews.llvm.org/D62163 llvm-svn: 361919	2019-05-29 03:02:59 +00:00
Pengfei Wang	72e3f9662b	Revert "[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to" This reverts commit c1b3716614bc0a107e6f41a7d3d503baefad8a5b. llvm-svn: 361918	2019-05-29 02:49:59 +00:00
Pengfei Wang	818c652643	[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to avoid static check fail RegClassOrBank is an object of RegClassOrRegBank, which is defined as using llvm::RegClassOrRegBank = typedef PointerUnion<const TargetRegisterClass , const RegisterBank > so control flow can not get here. Use ""llvm_unreachable" here to avoid "null pointer" confusion. Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62006 Signed-off-by: pengfei <pengfei.wang@intel.com> llvm-svn: 361912	2019-05-29 02:20:37 +00:00
Fangrui Song	656afe370d	[X86] Fix x86-64 call foo@tlsdesc(%rax) and support R_386_TLSGOTDESC R_386_TLS_DESC_CALL D18885 emitted 5 bytes for call foo@tlsdesc(%rax). It should use the 2-byte form instead and let R_X86_64_TLSDESC_CALL apply to the beginning of the call instruction. The 2-byte form was deliberately chosen to make ->LE and ->IE relaxation work: 0: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 7 <.text+0x7> 3: R_X86_64_GOTPC32_TLSDESC a-0x4 7: ff 10 callq *(%rax) 7: R_X86_64_TLSDESC_CALL a => 0: 48 c7 c0 fc ff ff ff mov $0xfffffffffffffffc,%rax 7: 66 90 xchg %ax,%ax Also change the symbol type to STT_TLS when VK_TLSCALL or VK_TLSDESC is seen. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D62512 llvm-svn: 361910	2019-05-29 02:02:59 +00:00
Thomas Lively	26d711be6e	[WebAssembly] Add signatures for RINT builtins Reviewers: azakai, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62564 llvm-svn: 361904	2019-05-29 01:06:00 +00:00
Quentin Colombet	a6f57ad2c9	[RegUsageInfoCollector] Don't mark as saved registers that don't have subregister lanes To determine the list of clobbered registers, the RegUsageInfoCollector pass uses the list of callee saved registers provided by the target and then augments it with the list of registers which have all their subregisters saved. It then basically does the difference between all the registers and the saved registers to come up with what is clobbered (plus it checks that the register is defined within that functions). The patch fixes a bug where when register does not have any subregister lane, hence when checking if any of its subregister are not saved, we would find none and think the register is saved as well. That's obviously wrong. The code was actually kind of checking for something like that with the CoveredBySubRegs bit. What this bit says is that a register is completely covered by its subregisters. We required that this bit was set, to check that a register was saved by its subregister lanes, since without this bit, we potentially would miss to check some part of the register. However, this bit is used de facto on registers that don't have any subregisters (e.g., on ARM) and the code was not prepared for that. This patch fixes this by checking that a register has subregisters before declaring it saved when none of its lanes are modified. llvm-svn: 361901	2019-05-28 23:43:12 +00:00
Lang Hames	eb5ee3004f	[ORC] Track JIT symbol states more explicitly. Prior to this patch, JITDylibs inferred symbol states (whether a symbol was newly added, materializing, resolved, or ready to run) via a combination of (1) bits in the JITSymbolFlags member, and (2) the state of some internal JITDylib data structures. This patch explicitly tracks symbol states by adding a new SymbolState member to the symbol table entries, and removing the 'Lazy' and 'Materializing' bits from JITSymbolFlags. This is a first step towards adding additional states representing initialization phases (e.g. eh-frame registration, registration with the language runtime, and static initialization). llvm-svn: 361899	2019-05-28 23:35:44 +00:00
Jessica Paquette	b73ea75b38	[AArch64][GlobalISel] Select FCMPSri/FCMPDri when comparing against 0.0 Add support for selecting FCMPSri and FCMPDri when comparing against 0.0, and factor out opcode selection for G_FCMP into its own function. Add a test to show that we don't do this with other immediates. Differential Revision: https://reviews.llvm.org/D62539 llvm-svn: 361888	2019-05-28 22:52:49 +00:00
Heejin Ahn	5514658591	[WebAssembly] Support for atomic fences Summary: This adds support for translation of LLVM IR fence instruction. We convert a singlethread fence to a pseudo compiler barrier which becomes 0 instructions in final binary, and a thread fence to an idempotent atomicrmw instruction to a memory address. Reviewers: dschuff, jfb, sunfish, tlively Subscribers: sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D50277 llvm-svn: 361884	2019-05-28 22:09:12 +00:00
Rong Xu	e88173abc0	[PGO] Handle cases of failing to split critical edges Fix PR41279 where critical edges to EHPad are not split. The fix is to not instrument those critical edges. We used to be able to know the size of counters right after MST is computed. With this, we have to pre-collect the instrument BBs to know the size, and then instrument them. Differential Revision: https://reviews.llvm.org/D62439 llvm-svn: 361882	2019-05-28 21:45:56 +00:00
Nikita Popov	5b32f60ec3	Revert "[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst" This reverts commit `53f2f32865`. As reported on D62126, this causes assertion failures if the switch has incorrect branch_weights metadata, which may happen as a result of other transforms not handling it correctly yet. llvm-svn: 361881	2019-05-28 21:28:24 +00:00
Konstantin Zhuravlyov	fe23ed2c68	AMDGPU: Temporary drop s_mul_hi_i/u32 patterns It introduces performance regressions in several applications. This has already been submitted downstream. llvm-svn: 361879	2019-05-28 21:18:34 +00:00
Adhemerval Zanella	34d8daae53	[AArch64] Handle ISD::LRINT and ISD::LLRINT This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus fcvtzs. It currently only handles the scalar version. Reviewed By: SjoerdMeijer, mstorsjo Differential Revision: https://reviews.llvm.org/D62018 llvm-svn: 361877	2019-05-28 21:04:29 +00:00
Adhemerval Zanella	6d7bf5e8df	[CodeGen] Add lrint/llrint builtins This patch add the ISD::LRINT and ISD::LLRINT along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lrint/llrint generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D62017 llvm-svn: 361875	2019-05-28 20:47:44 +00:00
Roman Lebedev	dfc34f0211	[DAGCombine] (x - C) - y -> (x - y) - C fold. Try 2 Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq This is a recommit, originally committed in rL361856, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 361874	2019-05-28 20:40:10 +00:00
Roman Lebedev	d485c6bc9f	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 2 Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl This is a recommit, originally committed in rL361855, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 361873	2019-05-28 20:40:03 +00:00
Roman Lebedev	96c9986199	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 2 Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh This is a recommit, originally committed in rL361853, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 361872	2019-05-28 20:39:55 +00:00
Roman Lebedev	2feb7e56e2	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 2 Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs. Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 361871	2019-05-28 20:39:39 +00:00
Michael Liao	5fc1dfa784	[AMDGPU] Correct the handling of inlineasm output registers. Summary: - There's a regression due to the cross-block RC assignment. Use the proper way to derive the output register RC in inline asm. Reviewers: rampitec, alex-t Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62537 llvm-svn: 361868	2019-05-28 19:37:09 +00:00
Roman Lebedev	272d70c366	Revert DAGCombine "hoist binop with const" folds Appear to introduce test-suite compile-time hang. http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/22825 This reverts r361852,r361853,r361854,r361855,r361856 llvm-svn: 361865	2019-05-28 19:04:21 +00:00
Nikita Popov	c51cdacab9	[InstCombine] Clean up saturing math overflow optimizations; NFC Reduce duplication and make it easier to handle signed always-overflows conditions in the future. llvm-svn: 361863	2019-05-28 18:59:21 +00:00
Nikita Popov	332c100562	[ValueTracking][ConstantRange] Distinguish low/high always overflow In order to fold an always overflowing signed saturating add/sub, we need to know in which direction the always overflow occurs. This patch splits up AlwaysOverflows into AlwaysOverflowsLow and AlwaysOverflowsHigh to pass through this information (but it is not used yet). Differential Revision: https://reviews.llvm.org/D62463 llvm-svn: 361858	2019-05-28 18:08:31 +00:00
Nikita Popov	2fb0a820df	[IR] Add SaturatingInst and BinaryOpIntrinsic classes Based on the suggestion in D62447, this adds a SaturatingInst class that represents the saturating add/sub family of intrinsics. It exposes the same interface as WithOverflowInst, for this reason I have also added a common base class BinaryOpIntrinsic that holds the actual implementation code and will be useful in some places handling both overflowing and saturating math. Differential Revision: https://reviews.llvm.org/D62466 llvm-svn: 361857	2019-05-28 18:08:06 +00:00
Roman Lebedev	7669665432	[DAGCombine] (x - C) - y -> (x - y) - C fold Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 361856	2019-05-28 17:54:21 +00:00
Roman Lebedev	8c9b3e4e4a	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 361855	2019-05-28 17:54:13 +00:00
Roman Lebedev	6a24c9b9ab	[DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold Summary: Only vector tests are being affected here, since subtraction by scalar constant is rewritten as addition by negated constant. No surprising test changes. https://rise4fun.com/Alive/pbT Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62257 llvm-svn: 361854	2019-05-28 17:54:04 +00:00
Roman Lebedev	1499f65ac1	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 361853	2019-05-28 17:53:54 +00:00
Roman Lebedev	19f51ec04a	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 361852	2019-05-28 17:53:43 +00:00
Sanjay Patel	f7980e727f	Revert "[x86] split 256-bit store of concatenated vectors" This reverts commit `d5a8637072`. Most likely suspect for this bot failure: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684 llvm-svn: 361850	2019-05-28 17:37:58 +00:00
Matt Arsenault	24e80b8d04	AMDGPU: Don't enable all lanes with non-CSR VGPR spills If the only VGPRs used for SGPR spilling were not CSRs, this was enabling all laness and immediately restoring exec. This is the usual situation in leaf functions. llvm-svn: 361848	2019-05-28 16:46:02 +00:00
Michael Liao	7166843f1e	[AMDGPU] Fix the mis-handling of `vreg_1` copied from scalar register. Summary: - Don't treat the use of a scalar register as `vreg_1` an VGPR usage. Otherwise, that promotes that scalar register into vector one, which breaks the assumption that scalar register holds the lane mask. - The issue is triggered in a complicated case, where if the uses of that (lane mask) scalar register is legalized firstly before its definition, e.g., due to the mismatch block placement and its topological order or loop. In that cases, the legalization of PHI introduces the use of that scalar register as `vreg_1`. Reviewers: rampitec, nhaehnle, arsenm, alex-t Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62492 llvm-svn: 361847	2019-05-28 16:29:39 +00:00
Simon Tatham	760df47b77	[ARM] Replace fp-only-sp and d16 with fp64 and d32. Those two subtarget features were awkward because their semantics are reversed: each one indicates the _lack_ of support for something in the architecture, rather than the presence. As a consequence, you don't get the behavior you want if you combine two sets of feature bits. Each SubtargetFeature for an FP architecture version now comes in four versions, one for each combination of those options. So you can still say (for example) '+vfp2' in a feature string and it will mean what it's always meant, but there's a new string '+vfp2d16sp' meaning the version without those extra options. A lot of this change is just mechanically replacing positive checks for the old features with negative checks for the new ones. But one more interesting change is that I've rearranged getFPUFeatures() so that the main FPU feature is appended to the output list before rather than after the features derived from the Restriction field, so that -fp64 and -d32 can override defaults added by the main feature. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60691 llvm-svn: 361845	2019-05-28 16:13:20 +00:00
Fangrui Song	448a79d123	[AArch64] Delete unused VariantKind in AArch64MCExpr llvm-svn: 361844	2019-05-28 16:11:56 +00:00
David Greene	561fcc0d63	[X86-64] Fix 256-bit SET0 lowering for non-VLX targets If we don't have VLX then 256-bit SET0 should be lowered to VPXOR with ZMM registers. This restores functionality accidentally removed by r309926. Differential Revision: https://reviews.llvm.org/D62415 llvm-svn: 361843	2019-05-28 15:37:01 +00:00
Nico Weber	a2ca6e7803	llvm-undname: Support demangling char8_t Ports clang's mangling support added in r354633 to llvm-undname. llvm-svn: 361839	2019-05-28 15:30:04 +00:00
Nico Weber	88ab281b4d	llvm-undname: Add support for local static thread guards llvm-svn: 361835	2019-05-28 14:54:49 +00:00
Jason Liu	9212206d25	[XCOFF] Implement parsing symbol table for xcoffobjfile and output as yaml format Summary: This patch implement parsing symbol table for xcoffobjfile and output as yaml format. Parsing auxiliary entries of a symbol will be in a separate patch. The XCOFF object file (aix_xcoff.o) used in the test comes from -bash-4.2$ cat test.c extern int i; extern int TestforXcoff; int main() { i++; TestforXcoff--; } Patch by DiggerLin Reviewers: sfertile, hubert.reinterpretcast, MaskRay, daltenty Differential Revision: https://reviews.llvm.org/D61532 llvm-svn: 361832	2019-05-28 14:37:59 +00:00
Sanjay Patel	d5a8637072	[x86] split 256-bit store of concatenated vectors This shows up as a side issue to the main problem for the AVX target example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3 But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway. We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is the reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. Differential Revision: https://reviews.llvm.org/D62498 llvm-svn: 361822	2019-05-28 13:54:17 +00:00
Simon Pilgrim	9cd9624fb6	[DAG] LegalizeVectorTypes - reduce scope of local variables. NFCI. Move the element index/count variables into the block where they are actually used - appeases cppcheck and helps avoid shadow variable warnings. llvm-svn: 361821	2019-05-28 13:46:26 +00:00
David Stenberg	5d0e6b6755	Stop undef fragments from closing non-overlapping fragments Summary: When DwarfDebug::buildLocationList() encountered an undef debug value, it would truncate all open values, regardless if they were overlapping or not. This patch fixes so that it only does that for overlapping fragments. This change unearthed a bug that I had introduced in D57511, which I have fixed in this patch. The code in DebugHandlerBase that changes labels for parameter debug values could break DwarfDebug's assumption that the labels for the entries in the debug value history are monotonically increasing. Before this patch, that bug could result in location list entries whose ending address was lower than the beginning address, and with the changes for undef debug values that this patch introduces it could trigger an assertion, due to attempting to emit location list entries with empty ranges. A reproducer for the bug is added in param-reg-const-mix.mir. Reviewers: aprantl, jmorse, probinson Reviewed By: aprantl Subscribers: javed.absar, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D62379 llvm-svn: 361820	2019-05-28 13:23:25 +00:00
Matt Arsenault	d3ed418ad3	MIR: Fix printer crashing on dead CSR frame indexes llvm-svn: 361819	2019-05-28 13:08:31 +00:00
Sanjay Patel	6bf4ca9d2e	[x86] fix 256-bit vector store splitting to honor 'volatile' Forking this out of the discussion in D62498 (and assuming that will be committed later, so adding the helper function here). The LangRef says: "the backend should never split or merge target-legal volatile load/store instructions." Differential Revision: https://reviews.llvm.org/D62506 llvm-svn: 361815	2019-05-28 12:58:07 +00:00
Benjamin Kramer	57e267a2e9	[X86] Custom lower CONCAT_VECTORS of v2i1 The generic legalizer cannot handle this. Add an assert instead of silently miscompiling vectors with elements smaller than 8 bits. llvm-svn: 361814	2019-05-28 12:52:57 +00:00
Graham Hunter	19e91253c0	[NFC] Test commit, delete trailing whitespace llvm-svn: 361813	2019-05-28 12:36:39 +00:00
Hans Wennborg	d936e40575	Re-commit r357452 (take 2): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)" This was reverted in r360086 as it was supected of causing mysterious test failures internally. However, it was never concluded that this patch was the root cause. > The code was previously checking that candidates for sinking had exactly > one use or were a store instruction (which can't have uses). This meant > we could sink call instructions only if they had a use. > > That limitation seemed a bit arbitrary, so this patch changes it to > "instruction has zero or one use" which seems more natural and removes > the need to special-case stores. > > Differential revision: https://reviews.llvm.org/D59936 llvm-svn: 361811	2019-05-28 12:19:38 +00:00
Yevgeny Rouban	53f2f32865	[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst This patch fixes the CorrelatedValuePropagation pass to keep prof branch_weights metadata of SwitchInst consistent. It makes use of SwitchInstProfUpdateWrapper. New tests are added. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D62126 llvm-svn: 361808	2019-05-28 11:33:50 +00:00
Simon Pilgrim	4b48aa0e30	[X86] X86CmovConverterPass::collectCmovCandidates - fix uninitialized variable warnings. NFCI. llvm-svn: 361804	2019-05-28 10:53:23 +00:00
Cullen Rhodes	f57bd6bd23	[AArch64][SVE2] Asm: support SVE2 Floating Point Convert Group Summary: Patch adds support for the following intructions: SVE2 floating-point convert precision: * FCVTXNT, FCVTNT, FCVTLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62382 llvm-svn: 361801	2019-05-28 09:36:52 +00:00
Cullen Rhodes	8e91dd7934	[AArch64][SVE2] Asm: support SVE2 Crypto Extensions Group Summary: Patch adds support for the following instructions: SVE2 crypto constructive binary operations: * SM4EKEY, RAX1 SVE2 crypto destructive binary operations: * AESE, AESD, SM4E SVE2 crypto unary operations: * AESMC, AESIMC AESE, AESD, AESMC and AESIMC are enabled with +sve2-aes. SM4E and SM4EKEY are enabled with +sve2-sm4. RAX1 is enabled with +sve2-sha3. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62307 llvm-svn: 361797	2019-05-28 09:13:17 +00:00
Cullen Rhodes	c4ed601bd9	[AArch64][SVE2] Asm: support SVE2 Histogram Computation Groups Summary: Patch adds support for the following instructions: SVE2 histogram generation (segment): * HISTSEG SVE2 histogram generation (vector): * HISTCNT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62306 llvm-svn: 361796	2019-05-28 08:51:59 +00:00
Cullen Rhodes	7d9cac5bba	[AArch64][SVE2] Asm: support SVE2 Misc Group Summary: Patch adds support for the following instructions: SVE2 bitwise exclusive-or interleaved: * EORBT, EORTB SVE2 bitwise permute: * BEXT, BDEP, BGRP SVE2 bitwise shift left long: * SSHLLB, SSHLLT, USHLLB, USHLLT SVE2 integer add/subtract interleaved long: * SADDLBT, SSUBLBT, SSUBLTB BDEP, BEXT and BGRP are enabled with SVE2 feature +bitperm, all other instructions in this group are enabled with +sve2. Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62304 llvm-svn: 361795	2019-05-28 08:42:22 +00:00
Craig Topper	ab53c5e5ab	[InlineCost] Fix a couple comments. NFC Replace "unary operator" with "unary instruction" in visitUnaryInstruction since we now have a UnaryOperator class which might needs its own visit function. Fix a copy/paste in visitCastInst that appears to have been copied from visitPtrToInt. llvm-svn: 361794	2019-05-28 07:25:27 +00:00
Craig Topper	50d502826b	[CostModel] Add really basic support for being able to query the cost of the FNeg instruction. Summary: This reuses the getArithmeticInstrCost, but passes dummy values of the second operand flags. The X86 costs are wrong and can be improved in a follow up. I just wanted to stop it from reporting an unknown cost first. Reviewers: RKSimon, spatel, andrew.w.kaylor, cameron.mcinally Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62444 llvm-svn: 361788	2019-05-28 04:09:18 +00:00
Nico Weber	f83c39e53f	llvm-undname: Remove unreachable statement llvm-svn: 361786	2019-05-28 01:20:36 +00:00
Nico Weber	82dc06c340	llvm-undname: Extract demangleMD5Name() method; no behavior change llvm-svn: 361783	2019-05-27 23:10:42 +00:00
Lang Hames	23343c5d90	[RuntimeDyld][ARM] Fix an incorrect assertion condition. Fixes https://llvm.org/PR42036 llvm-svn: 361782	2019-05-27 21:34:31 +00:00
Matt Arsenault	ca84c4be4b	RegAllocFast: Set MayLiveAcrossBlocks when allocating uses Setting mayLiveOut based only on use instructions after allocating the def block did not work if the use block was allocated before the def block, since the virtual register uses were already removed. Fixes bug 41973. llvm-svn: 361781	2019-05-27 20:37:31 +00:00
Sanjay Patel	2f99d009c1	[SelectionDAG] fold concat of extract subvectors This is derived from the related fold for build vectors. We also have a version of this in DAGCombiner. The benefit of having this fold at node creation time is (1) efficiency and (2) preventing infinite looping from creating patterns that should not exist in the first place. Currently, the inf-loop could happen with MergeConsecutiveStores() because it naively creates concat of extracts when forming a wider vector store. That could fight with target-specific store narrowing. llvm-svn: 361780	2019-05-27 20:26:21 +00:00
Sanjay Patel	e13ae3e4d8	[SelectionDAG] fix formatting and redundant comments; NFC There's a possible missing fold here for extracting from the same source vector. It's similar to a check that we use to squash a build vector with all extracted elements from the same source vector. llvm-svn: 361778	2019-05-27 18:26:43 +00:00
Michael Liao	9c70c574b4	[SelectionDAG] Enhance the simplification of `copyto` from `implicit-def`. Summary: - The current implementation simplifies the case where the source of `copyto` is `implicit-def`ed. However, it only works when that `implicit-def` is single-used since it detects that from `implicit-def` and cannot determine which destination vreg should be used if there are multiple uses. - This patch changes that detection when `copyto` is being emitted. If that `copyto`'s source is defined from `implicit-def`, it simplifies it. Hence, it works even that `implicit-def` is multi-used. - Except it simplifies the internal IR, it won't improve the quality of code generation. However, it helps to detect 'implicit-def` in a straight-forward manner in some passes, such as `si-i1-copies`. A test case is added. Reviewers: sunfish, nhaehnle Subscribers: jvesely, hiraditya, asbirlea, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62342 llvm-svn: 361777	2019-05-27 18:26:29 +00:00
Alexander Timofeev	f4040a0dd8	[AMDGPU] Fix for the address sanitizer failure. Fixing typo llvm-svn: 361776	2019-05-27 18:17:21 +00:00
Dmitri Gribenko	5379f1a6c5	Include what you use in AArch64AsmBackend.cpp AArch64AsmBackend.cpp was not using any APIs from AArch64.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary AArch64 target library and the MCTargetDesc library). llvm-svn: 361774	2019-05-27 17:03:57 +00:00
Simon Pilgrim	ebb053b139	[SelectionDAG] GetDemandedBits - add demanded elements wrapper implementation The DemandedElts variable is pretty much inert at the moment - the original GetDemandedBits implementation calls it with an 'all ones' DemandedElts value so the function is active and behaves exactly as it used to. llvm-svn: 361773	2019-05-27 16:39:25 +00:00
Simon Pilgrim	d99f9373d3	[LLParser] Fix uninitialized flag variable warnings. NFCI. Fixes a large number of warnings in the scan-build report on llvm builds. llvm-svn: 361772	2019-05-27 16:33:15 +00:00
Alexander Timofeev	4a7c4069ae	[AMDGPU] Fix for the address sanitizer failure caused by the ifollowing commit: 1a8b2ea611cf4ca7cb09562e0238cfefa27c05b5 Divergence driven ISel. Assign register class for cross block values according to the divergence. llvm-svn: 361770	2019-05-27 15:03:29 +00:00
Dmitry Preobrazhensky	b79af7930c	[AMDGPU][MC] Enabled constant expressions as operands of s_waitcnt See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61017 llvm-svn: 361763	2019-05-27 14:08:43 +00:00
Xing Xue	3860aad6e7	[MustExecute] Improve MustExecute to correctly handle loop nest Summary: for.outer: br for.inner for.inner: LI <loop invariant load instruction> for.inner.latch: br for.inner, for.outer.latch for.outer.latch: br for.outer, for.outer.exit LI is a loop invariant load instruction that post dominate for.outer, so LI should be able to move out of the loop nest. However, there is a bug in allLoopPathsLeadToBlock(). Current algorithm of allLoopPathsLeadToBlock() 1. get all the transitive predecessors of the basic block LI belongs to (for.inner) ==> for.outer, for.inner.latch 2. if any successors of any of the predecessors are not for.inner or for.inner's predecessors, then return false 3. return true Although for.inner.latch is for.inner's predecessor, but for.inner dominates for.inner.latch, which means if for.inner.latch is ever executed, for.inner should be as well. It should not return false for cases like this. Author: Whitney (committed by xingxue) Reviewers: kbarton, jdoerfert, Meinersbur, hfinkel, fhahn Reviewed By: jdoerfert Subscribers: hiraditya, jsji, llvm-commits, etiotto, bmahjour Tags: #LLVM Differential Revision: https://reviews.llvm.org/D62418 llvm-svn: 361762	2019-05-27 13:57:28 +00:00
Nikola Prica	441ad62531	Test commit (NFC) Add blank line. llvm-svn: 361761	2019-05-27 13:51:30 +00:00
Diana Picus	68b20c589c	[ARM GlobalISel] Cleanup CallLowering a bit We never actually use the Offsets produced by ComputeValueVTs, so remove them until we need them. llvm-svn: 361755	2019-05-27 10:30:33 +00:00
David L. Jones	0ff41b8a5a	Revert r361356: "[MIR] Add simple PRE pass to MachineCSE" This is problematic on buildbots, as discussed here: https://reviews.llvm.org/rL361356 It seems like the plan already was to revert, but that hasn't happened yet. llvm-svn: 361746	2019-05-27 06:00:00 +00:00
Nico Weber	cfe08bc7d6	llvm-undname: Make demangling of MD5 names more robust Demangler::parse() for MD5 names would: 1. Put all remaining text into the MD5 name sight unseen 2. Not modify MangledName This meant that if the demangler recursively called parse() (e.g. in demangleLocallyScopedNamePiece()), every recursive call that started on an MD5 name would add all remaining bytes to the output buffer but only advance the input by a byte. For valid inputs, MD5 types are never (well, see comments for 2 exceptions) nested, but for invalid input this could cause memory use quadratic in the input size. llvm-svn: 361744	2019-05-27 00:48:59 +00:00
Florian Hahn	11b2f4fe50	[LoopInterchange] Fix handling of LCSSA nodes defined in headers and latches. The code to preserve LCSSA PHIs currently only properly supports reduction PHIs and PHIs for values defined outside the latches. This patch improves the LCSSA PHI handling to cover PHIs for values defined in the latches. Fixes PR41725. Reviewers: efriedma, mcrosier, davide, jdoerfert Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D61576 llvm-svn: 361743	2019-05-26 23:38:25 +00:00
Yonghong Song	e698958ad8	[BPF] generate R_BPF_NONE relocation for BTF DataSec variables The variables in BTF DataSec type encode in-section offset. R_BPF_NONE should be generated instead of R_BPF_64_32. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D62460 llvm-svn: 361742	2019-05-26 21:26:06 +00:00
Alexander Timofeev	ba447bae74	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of the build failure. The reason was mlformed patch. Build failure fixed. llvm-svn: 361741	2019-05-26 20:33:26 +00:00
Andrea Di Biagio	c2493ce4a4	[MCA][Scheduler] Improved critical memory dependency computation. This fixes a problem where back-pressure increases caused by register dependencies were not correctly notified if execution was also delayed by memory dependencies. llvm-svn: 361740	2019-05-26 19:50:31 +00:00
Simon Pilgrim	06e02856ab	[SelectionDAG] GetDemandedBits - cleanup to more closely match SimplifyDemandedBits. NFCI. Prep work before adding demanded elts support. llvm-svn: 361739	2019-05-26 18:58:14 +00:00
Simon Pilgrim	2916b9e28c	[SelectionDAG] MaskedValueIsZero - add demanded elements implementation Will be used in an upcoming patch but I've updated the original implementation to call this to ensure test coverage. llvm-svn: 361738	2019-05-26 18:43:44 +00:00
Andrea Di Biagio	a549dd2560	[MCA] Refactor the logic that computes the critical memory dependency info. NFCI CriticalRegDep has been renamed CriticalDependency, and it is now used by class Instruction to store information about the critical register dependency and the critical memory dependency. No functional change intendend. llvm-svn: 361737	2019-05-26 18:41:35 +00:00
Shawn Landden	343578759e	[SimplifyCFG] back out all SwitchInst commits They caused the sanitizer builds to fail. My suspicion is the change the countLeadingZeros(). llvm-svn: 361736	2019-05-26 18:15:51 +00:00
Simon Pilgrim	a044410f37	[X86][SSE] Add shuffle combining support for ISD::ANY_EXTEND_VECTOR_INREG Reuses what we already have in place for ISD::ZERO_EXTEND_VECTOR_INREG just with a different sentinel llvm-svn: 361734	2019-05-26 16:00:35 +00:00
Simon Pilgrim	e434368a67	Revert rL361731 : [LLParser] Fix uninitialized variable warnings. NFCI. These 3 variables cause quite a few warnings in the scan-build report on llvm. ........ Revert accidental commit. llvm-svn: 361732	2019-05-26 15:08:45 +00:00
Simon Pilgrim	aabe7781a5	[LLParser] Fix uninitialized variable warnings. NFCI. These 3 variables cause quite a few warnings in the scan-build report on llvm. llvm-svn: 361731	2019-05-26 15:05:12 +00:00
Sanjay Patel	9317963920	[InstCombine] prevent crashing with invalid extractelement index This was found/reduced from a fuzzer report: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14956 llvm-svn: 361729	2019-05-26 14:03:50 +00:00
Shawn Landden	fa91ab85d9	[SimplifyCFG] ReduceSwitchRange: Improve on the case where the SubThreshold doesn't trigger llvm-svn: 361728	2019-05-26 13:55:52 +00:00
Shawn Landden	30111c786f	[SimplifyCFG] Run ReduceSwitchRange unconditionally, generalize Rather than gating on "isSwitchDense" (resulting in necessesarily sparse lookup tables even when they were generated), always run this quite cheap transform. This transform is useful not just for generating tables. LowerSwitch also wants this: read LowerSwitch.cpp:257. Be careful to not generate worse code, by introducing a SubThreshold heuristic. Instead of just sorting by signed, generalize the finding of the best base. And now that it is run unconditionally, do not replicate its functionality in SwitchToLookupTable (which could use a Sub when having a hole is smaller, hence the SubThreshold heuristic located in a single place). This simplifies SwitchToLookupTable, and fixes some ugly corner cases due to the use of signed numbers, such as a table containing i16 32768 and 32769, of which 32769 would be interpreted as -32768, and now the code thinks the table is size 65536. (We still use unconditional subtraction when building a single-register mask, but I think this whole block should go when the more general sparse map is added, which doesn't leave empty holes in the table.) And the reason test4 and test5 did not trigger was documented wrong: it was because they were not considered sufficiently "dense". Also, fix generation of invalid LLVM-IR: shl by bit-width. llvm-svn: 361727	2019-05-26 13:55:14 +00:00
Shawn Landden	444eaaf1cc	[SimpligyCFG] NFC, remove GCD that was only used for powers of two and replace with an equilivent countTrailingZeros. GCD is much more expensive than this, with repeated division. This depends on D60823 llvm-svn: 361726	2019-05-26 13:54:04 +00:00
Shawn Landden	b7cc093db2	[Support] make countLeadingZeros() and countTrailingZeros() return unsigned This matches countLeadingOnes() and countTrailingOnes(), and APInt's countLeadingZeros() and countTrailingZeros(). (as well as __builtin_clzll()) llvm-svn: 361724	2019-05-26 13:49:58 +00:00
Nikita Popov	d0f13e618f	[ValueTracking] Base computeOverflowForUnsignedMul() on ConstantRange code; NFCI The implementation in ValueTracking and ConstantRange are equally powerful, reuse the one in ConstantRange, which will make this easier to extend. llvm-svn: 361723	2019-05-26 13:22:01 +00:00
Nikita Popov	39f2bebf41	[InstCombine] Refactor OptimizeOverflowCheck; NFCI Extract method to compute overflow based on binop and signedness, and then make the result handling code generic. This extends the always-overflow handling to signed muls, but has currently no effect, as we don't compute always overflow for them (thus NFC). llvm-svn: 361721	2019-05-26 11:43:37 +00:00
Nikita Popov	352f598795	[InstCombine] Remove OverflowCheckFlavor; NFC Instead pass binary op and signedness. The extra enum only makes things more complicated in this case. llvm-svn: 361720	2019-05-26 11:43:31 +00:00
David Green	0dbafe191e	[ARM] Select fp16 fma This adds a pattern for fma, similar to the float and double patterns. Differential Revision: https://reviews.llvm.org/D62330 llvm-svn: 361719	2019-05-26 11:34:30 +00:00
David Green	21542cd6f4	[ARM] Select a number of fp16 rounding functions This add patterns for fp16 round and ceil etc. Same as the float and double patterns. Differential Revision: https://reviews.llvm.org/D62326 llvm-svn: 361718	2019-05-26 11:13:00 +00:00
David Green	c9f4b7d201	[ARM] Promote various fp16 math intrinsics Promote a number of fp16 math intrinsics to float, so that the relevant float math routines can be used. Copysign is expanded so as to be handled in-place. Differential Revision: https://reviews.llvm.org/D62325 llvm-svn: 361717	2019-05-26 10:59:21 +00:00
Simon Pilgrim	58a8541dcc	[X86][AVX] combineBitcastvxi1 - peek through bitops to determine size of original vector We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well. There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well. llvm-svn: 361716	2019-05-26 10:54:23 +00:00
David Green	2881325b17	[ARM] Select fp16 fabs This adds a pattern for the fabs intrinsic, the same as float and double. Differential Revision: https://reviews.llvm.org/D62324 llvm-svn: 361715	2019-05-26 10:51:58 +00:00
David Green	aeade651f3	[ARM] Select fp16 fsqrt This adds a pattern for the sqrt intrinsic, the same as float and double. Differential Revision: https://reviews.llvm.org/D62322 llvm-svn: 361714	2019-05-26 10:42:24 +00:00
David Green	caf8a11b65	[ARM] Promote fp16 frem Promote fp16 frem operations on ARM to floats so they call fmodf. Differential Revision: https://reviews.llvm.org/D62321 llvm-svn: 361713	2019-05-26 10:30:22 +00:00
David Bolvansky	0290a77aa8	[SimplifyCFG] Added condition assumption for unreachable blocks Summary: PR41688 Reviewers: spatel, efriedma, craig.topper, hfinkel, reames Reviewed By: hfinkel Subscribers: javed.absar, dmgreen, fhahn, hfinkel, reames, nikic, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61409 llvm-svn: 361707	2019-05-25 22:34:27 +00:00
Simon Pilgrim	40fa52b174	[X86] lowerBuildVectorToBitOp - support build_vector(shift()) -> shift(build_vector(),C) Commonly occurs in sign-extension cases llvm-svn: 361706	2019-05-25 18:02:17 +00:00
Robert Widmann	b0fd12b689	[LLVM-C] Add Accessor for Mach-O Universal Binary Slices Summary: Allow for retrieving an object file corresponding to an architecture-specific slice in a Mach-O universal binary file. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60378 llvm-svn: 361705	2019-05-25 16:47:27 +00:00
Nikita Popov	d87eceda0e	[X86] Combine fminnum/fmaxnum with non-nan operand to fmin/fmax If we have a known non-nan operand, place it in the second operand of fmin/fmax that is returned if either operand is nan. Differential Revision: https://reviews.llvm.org/D62448 llvm-svn: 361704	2019-05-25 16:44:29 +00:00
Nikita Popov	6bb5041e94	[LVI][CVP] Add support for saturating add/sub Adds support for the uadd.sat family of intrinsics in LVI, based on ConstantRange methods from D60946. Differential Revision: https://reviews.llvm.org/D62447 llvm-svn: 361703	2019-05-25 16:44:14 +00:00
Nikita Popov	8b1fa07639	[CVP] Remove unnecessary checks for empty GNWR; NFC The guaranteed no-wrap region is never empty, it always contains at least zero, so these optimizations don't ever apply. To make this more obviously true, replace the conversative return in makeGNWR with an assertion. llvm-svn: 361698	2019-05-25 14:11:55 +00:00
Sanjay Patel	91131b6500	[SelectionDAG] soften assertion when legalizing narrow vector FP ops The test based on PR42010: https://bugs.llvm.org/show_bug.cgi?id=42010 ...may show an inaccuracy for PPC's target defs, but we should not be so aggressive with an assert here. There's no telling what out-of-tree targets look like. llvm-svn: 361696	2019-05-25 13:48:07 +00:00
Nikita Popov	024b18aca7	[LVI][CVP] Calculate with.overflow result range In LVI, calculate the range of extractvalue(op.with.overflow(%x, %y), 0) as the range of op(%x, %y). This is mainly useful in conjunction with D60650: If the result of the operation is extracted in a branch guarded against overflow, then the value of %x will be appropriately constrained and the result range of the operation will be calculated taking that into account. Differential Revision: https://reviews.llvm.org/D60656 llvm-svn: 361693	2019-05-25 09:53:45 +00:00
Nikita Popov	17367b0d89	[LVI] Extract helper for binary range calculations; NFC llvm-svn: 361692	2019-05-25 09:53:37 +00:00
Craig Topper	46e5052b8e	[X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly. INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags. This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg. One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input. Differential Revision: https://reviews.llvm.org/D61472 llvm-svn: 361691	2019-05-25 06:17:47 +00:00
Craig Topper	4b08fcdeb1	[X86] Add zero idioms to the haswell, broadwell, and skylake schedule models. Add 256-bit fp xor to sandybridge zero idioms This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate. Differential Revision: https://reviews.llvm.org/D62360 llvm-svn: 361690	2019-05-25 04:47:49 +00:00
Peter Collingbourne	3b93737446	Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence." Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688	2019-05-25 01:52:38 +00:00
David Blaikie	a17564c2f1	llvm-dwarfdump: Don't error on mixed units using/not using str_offsets This lead to errors when dumping binaries with v4 and v5 units linked together (but could've also errored on v5 units that did/didn't use str_offsets). Also improves error handling and messages around invalid str_offsets contributions. llvm-svn: 361683	2019-05-25 00:07:22 +00:00
Jessica Paquette	97d668d70f	[GlobalISel][AArch64] Make FP constraint checks consider possible use/def banks In a few places in getInstrMapping, we check if use/def instructions for the instruction we're mapping have floating point constraints. We can improve this check and reduce the number of copies in GISel-compiled code if we make a couple observations: - For a def instruction, it only matters if the def instruction must always output a value stored on a FPR - For a use instruction, it only matters if the use instruction must always only take in values stored in FPRs This adds two new functions: - onlyUsesFP - onlyDefinesFP Then we can use those when we're checking the uses/defs instead. Without this patch, the load, unmerge, store, and select in the added test would have unnecessary copies. Differential Revision: https://reviews.llvm.org/D62426 llvm-svn: 361679	2019-05-24 23:08:45 +00:00
Jessica Paquette	bede937b16	[GlobalISel][AArch64] NFC: Factor out HasFPConstraints into a proper function Factor it out into a function, and replace places where we had the same check with the new function. Differential Revision: https://reviews.llvm.org/D62421 llvm-svn: 361677	2019-05-24 22:12:21 +00:00
Jonas Devlieghere	0da8160df3	[dwarfdump] Add flag to limit the number of parents DIEs This adds `-parent-recurse-depth` which limits the number of parent DIEs being dumped. Differential revision: https://reviews.llvm.org/D62359 llvm-svn: 361671	2019-05-24 21:11:28 +00:00
Jason Liu	8e1d921bb3	Implement call lowering without parameters on AIX Summary:dd This patch implements call lowering for calls without parameters on AIX as initial support. Reviewers: sfertile, hubert.reinterpretcast, aheejin, efriedma Differential Revision: https://reviews.llvm.org/D61948 llvm-svn: 361669	2019-05-24 20:54:35 +00:00
Jessica Paquette	56503865ed	[GlobalISel][AArch64] Improve register bank mappings for G_SELECT The fcsel and csel instructions differ in only the register banks they work on. So, they're entirely interchangeable otherwise. With this in mind, this does two things: - Teach AArch64RegisterBankInfo to consider the inputs to G_SELECT as well as the outputs. - Teach it to choose the best register bank mapping based off the constraints of the inputs and outputs. The "best" in this case means the one that requires the smallest number of copies to properly emit a fcsel/csel. For example, if the inputs are all already going to be on FPRs, we should emit a fcsel, even if the output is a GPR. This costs one copy to produce the result, but saves us from copying the inputs into GPRs. Also update the regbank-select.mir to check that we end up with the right select instruction. Differential Revision: https://reviews.llvm.org/D62267 llvm-svn: 361665	2019-05-24 19:35:25 +00:00
Nick Desaulniers	33bc64202b	[AArch64] check for INLINEASM_BR along w/ INLINEASM Summary: It looks like since INLINEASM_BR was created off of INLINEASM, a few checks for INLINEASM needed to be updated to check for either case. pr/41999 Reviewers: t.p.northover, peter.smith Reviewed By: peter.smith Subscribers: craig.topper, javed.absar, kristof.beyls, hiraditya, llvm-commits, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62402 llvm-svn: 361661	2019-05-24 19:00:13 +00:00
Nick Desaulniers	9f7bd71cf5	[ARM] additionally check for ARM::INLINEASM_BR w/ ARM::INLINEASM Summary: We were observing failures for arm32 allyesconfigs of the Linux kernel with the asm goto Clang patch, where ldr's were being generated to offsets too far away to encode in imm12. It looks like since INLINEASM_BR was created off of INLINEASM, a few checks for INLINEASM needed to be updated to check for either case. pr/41999 Link: https://github.com/ClangBuiltLinux/linux/issues/490 Reviewers: peter.smith, kristof.beyls, ostannard, rengolin, t.p.northover Reviewed By: peter.smith Subscribers: jyu2, javed.absar, hiraditya, llvm-commits, nathanchance, craig.topper, kees, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62400 llvm-svn: 361659	2019-05-24 18:58:21 +00:00
Matt Arsenault	3d59e388ca	AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills If some lanes weren't active on entry to the function, this could clobber their VGPR values. llvm-svn: 361655	2019-05-24 18:18:51 +00:00
Matt Arsenault	0ff901fba0	AMDGPU: Boost inline threshold with addrspacecasted alloca arguments This was skipping GetUnderlyingObject for nonprivate addresses, but an alloca could also be found through an addrspacecast if it's flat. llvm-svn: 361649	2019-05-24 16:52:35 +00:00
Alexander Timofeev	dffedea014	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644	2019-05-24 15:32:18 +00:00
Stefan Pintilie	522307fa40	[PowerPC] Remove CRBits Copy Of Unset/set CBit For the situation, where we generate the following code: crxor 8, 8, 8 < Some instructions> .LBB0_1: < Some instructions> cror 1, 8, 8 cror (COPY of CRbit) depends on the result of the crxor instruction. CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on any previous instruction. This patch will optimize it to: < Some instructions> .LBB0_1: < Some instructions> cror 1, 1, 1 Patch By: Victor Huang (NeHuang) Differential Revision: https://reviews.llvm.org/D62044 llvm-svn: 361632	2019-05-24 12:05:37 +00:00
Cullen Rhodes	b3e58df80c	[AArch64][SVE2] Asm: support SVE2 String Processing Group Summary: Patch adds support for the SVE2 character match instructions MATCH and NMATCH. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62206 llvm-svn: 361627	2019-05-24 10:32:01 +00:00
Cullen Rhodes	adb1d74bf9	[AArch64][SVE2] Asm: support SVE2 Narrowing Group Summary: Patch adds support for the following instructions: SVE2 bitwise shift right narrow: * SQSHRUNB, SQSHRUNT, SQRSHRUNB, SQRSHRUNT, SHRNB, SHRNT, RSHRNB, RSHRNT, SQSHRNB, SQSHRNT, SQRSHRNB, SQRSHRNT, UQSHRNB, UQSHRNT, UQRSHRNB, UQRSHRNT SVE2 integer add/subtract narrow high part: * ADDHNB, ADDHNT, RADDHNB, RADDHNT, SUBHNB, SUBHNT, RSUBHNB, RSUBHNT SVE2 saturating extract narrow: * SQXTNB, SQXTNT, UQXTNB, UQXTNT, SQXTUNB, SQXTUNT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62205 llvm-svn: 361624	2019-05-24 10:22:30 +00:00
Cullen Rhodes	5f04f00282	[AArch64][SVE2] Asm: support SVE2 Accumulate Group Summary: Patch adds support for the following instructions: SVE2 bitwise shift and insert: * SRI, SLI SVE2 bitwise shift right and accumulate: * SSRA, USRA, SRSRA, URSRA SVE2 complex integer add: * CADD, SQCADD SVE2 integer absolute difference and accumulate: * SABA, UABA SVE2 integer absolute difference and accumulate long: * SABALB, SABALT, UABALB, UABALT SVE2 integer add/subtract long with carry: * ADCLB, ADCLT, SBCLB, SBCLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62204 llvm-svn: 361622	2019-05-24 10:10:34 +00:00
Simon Pilgrim	95b8d9bbf8	[SelectionDAG] computeKnownBits - support constant pool values from target This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets. computeKnownBits then uses this function to improve codegen, notably vector code after legalization. A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit. This required a couple of fixes: * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR). * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets. Differential Revision: https://reviews.llvm.org/D61887 llvm-svn: 361620	2019-05-24 10:03:11 +00:00
Cullen Rhodes	980f760515	[AArch64][SVE2] Asm: add PMULLB/PMULLT instructions Summary: This patch adds support for the polynomial multiplication instructions PMULLB/PMULLT. The 64-bit source and 128-bit destination element variants are enabled with crypto extensions (+sve2-aes), similar to the NEON PMULL2 instruction. All other variants are enabled with +sve2. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62145 llvm-svn: 361619	2019-05-24 09:56:23 +00:00
Cullen Rhodes	8bcea9daaa	[AArch64][SVE2] Asm: add integer add/sub long/wide instructions Summary: Patch adds support for the following instructions: SVE2 integer add/subtract long: * SADDLB, SADDLT, UADDLB, UADDLT, SSUBLB, SSUBLT, USUBLB, USUBLT, SABDLB, SABDLT, UABDLB, UABDLT SVE2 integer add/subtract wide: * SADDWB, SADDWT, UADDWB, UADDWT, SSUBWB, SSUBWT, USUBWB, USUBWT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62142 llvm-svn: 361615	2019-05-24 09:28:27 +00:00
Bjorn Pettersson	b4771425f5	Use the DataLayout::typeSizeEqualsStoreSize helper. NFC Just a minor refactoring to use the new helper method DataLayout::typeSizeEqualsStoreSize(). This is done when checking if getTypeSizeInBits is equal/non-equal to getTypeStoreSizeInBits. llvm-svn: 361613	2019-05-24 09:20:20 +00:00
Cullen Rhodes	968cb0e049	[AArch64][SVE2] Asm: add various bitwise shift instructions Summary: This patch adds support for the SVE2 saturating/rounding bitwise shift left (predicated) group of instructions: * SRSHL, URSHL, SRSHLR, URSHLR, SQSHL, UQSHL, SQRSHL, UQRSHL, SQSHLR, UQSHLR, SQRSHLR, UQRSHLR Immediate forms of the SQSHL and UQSHL instructions are also added to the existing SVE bitwise shift by immediate (predicated) group, as well as three new instructions SRSHR/URSHR/SQSHLU. The new instructions in this group are encoded similarly and are implemented using the same TableGen class with a minimal change (1 bit in encoding). The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62140 llvm-svn: 361612	2019-05-24 09:17:23 +00:00
Cullen Rhodes	6bca64fe5e	[AArch64][SVE2] Asm: add saturating add/sub instructions Summary: Patch adds support for the following instructions: * SQADD, UQADD, SUQADD, USQADD * SQSUB, UQSUB, SQSUBR, UQSUBR The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62130 llvm-svn: 361611	2019-05-24 09:06:37 +00:00
Neil Henning	119c31ad93	StructurizeCFG: Relax uniformity checks. This change relaxes the checks for hasOnlyUniformBranches such that our region is uniform if: 1. All conditional branches that are direct children are uniform. 2. And either: a. All sub-regions are uniform. b. There is one or less conditional branches among the direct children. Differential Revision: https://reviews.llvm.org/D62198 llvm-svn: 361610	2019-05-24 08:59:17 +00:00
Cullen Rhodes	d9bb7b69ab	[AArch64][SVE2] Asm: fix overlapping bit Summary: Bit 20 in sve2_int_arith_pred TableGen class was overlapping. The encodings are not affected as bit 20 is defined by the opc bits and this was overwriting the earlier error of setting bit 20 to 0. Raised by Momchil: https://reviews.llvm.org/D62130 Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62292 llvm-svn: 361609	2019-05-24 08:45:37 +00:00
Tim Northover	3b2157aeed	GlobalISel: support swifterror attribute on AArch64. swifterror marks an argument as a register pretending to be a pointer, so we need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the infrastructure can be reused from the DAG world. llvm-svn: 361608	2019-05-24 08:40:13 +00:00
Tim Northover	3d7a057b0d	CodeGen: factor out swifterror value tracking. llvm-svn: 361607	2019-05-24 08:39:43 +00:00
Simon Atanasyan	c1b482f2a5	[mips] Always check that `shift and add` optimization is efficient. The D45316 introduced the `shouldTransformMulToShiftsAddsSubs` function to check that breaking down constant multiplications into a series of shifts, adds, and subs is efficient. Unfortunately, this function does not check maximum number of steps on all paths of the algorithm. This patch fixes this bug. Fix for PR41929. Differential Revision: https://reviews.llvm.org/D62166 llvm-svn: 361606	2019-05-24 08:39:40 +00:00
Bjorn Pettersson	d63a2bb35f	[DSE] Bugfix to avoid PartialStoreMerging involving non byte-sized stores Summary: The DeadStoreElimination pass now skips doing PartialStoreMerging when stores overlap according to OW_PartialEarlierWithFullLater and at least one of the stores is having a store size that is different from the size of the type being stored. This solves problems seen in https://bugs.llvm.org/show_bug.cgi?id=41949 for which we in the past could end up with mis-compiles or assertions. The content and location of the padding bits is not formally described (or undefined) in the LangRef at the moment. So the solution is chosen based on that we cannot assume anything about the padding bits when having a store that clobbers more memory than indicated by the type of the value that is stored (such as storing an i6 using an 8-bit store instruction). Fixes: https://bugs.llvm.org/show_bug.cgi?id=41949 Reviewers: spatel, efriedma, fhahn Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62250 llvm-svn: 361605	2019-05-24 08:32:02 +00:00
Sjoerd Meijer	937af54666	[ARM] ARMExpandPseudoInsts: add debug messages This pass wasn't printing any messages at all, which I find really inconvenient while debugging/tracing things. It now dumps the before and after of expanded instructions. It doesn't do this yet for all instructions, but this is a good start I guess. Differential Revision: https://reviews.llvm.org/D62297 llvm-svn: 361604	2019-05-24 08:25:02 +00:00
QingShan Zhang	449bfdd1b0	[Power9] Add a specific heuristic to schedule the addi before the load When we are scheduling the load and addi, if all other heuristic didn't take effect, we will try to schedule the addi before the load, to hide the latency, and avoid the true dependency added by RA. And this only take effects for Power9. Differential Revision: https://reviews.llvm.org/D61930 llvm-svn: 361600	2019-05-24 05:30:09 +00:00
Yevgeny Rouban	c652b3455e	[NFC] SwitchInst: Introduce wrapper for prof branch_weights handling This patch introduces a wrapper class that re-implements several mutator methods of SwitchInst to handle changes of prof branch_weights metadata along with remove/add switch case methods. Subsequent patches will use this wrapper to implement prof branch_weights metadata handling for SwitchInst. Reviewers: davidx, eraman, reames, chandlerc Reviewed By: davidx Differential Revision: https://reviews.llvm.org/D62122 llvm-svn: 361596	2019-05-24 04:34:23 +00:00
David Blaikie	fc302c2b7f	dwarfdump: Deterministically... determine whether parsing a DWARF32 or DWARF64 str_offsets header Rather than trying one and then the other - use the kind of the CU to select which kind of header to parse. llvm-svn: 361589	2019-05-24 01:41:58 +00:00
Reid Kleckner	b7a78c7dff	[AArch64] Preserve X8 for thunks ending in variadic musttail calls Summary: On Windows, X8 may be used to pass in the address of an aggregate that is returned indirectly. Therefore, it should be forwarded to variadic musttail calls and preserved in thunks. Fixes PR41997 Reviewers: mgrang, efriedma Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62344 llvm-svn: 361585	2019-05-24 01:27:20 +00:00
Serge Pavlov	ed595e8627	[AArch64] Add nvcast patterns for v2f32 -> v1f64 Summary: Constant stores of f32 values can create such NvCast nodes. Reviewers: t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62285 llvm-svn: 361584	2019-05-24 01:20:34 +00:00
David Blaikie	79872a88a0	dwarfdump: Add a bit more DWARF64 support This test case was incorrect because it mixed DWARF32 and DWARF64 for a single unit (DWARF32 unit referencing a DWARF64 str_offsets section). So fix enough of the unit parsing for DWARF64 and make the test valid. (not sure if anyone needs DWARF64 support though - support in libDebugInfoDWARF has been added piecemeal and LLVM doesn't produce it at all) llvm-svn: 361582	2019-05-24 01:05:52 +00:00
Eli Friedman	052f87ae36	Revert r361460 It regresses https://bugs.llvm.org/show_bug.cgi?id=38309 (represented by the testcase test/Transforms/GlobalOpt/globalsra-multigep.ll). llvm-svn: 361581	2019-05-24 01:03:51 +00:00
Thomas Lively	55229f6b10	[WebAssembly] Expand more SIMD float ops Summary: These were previously causing ISel failures. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62354 llvm-svn: 361577	2019-05-24 00:15:04 +00:00
Sanjay Patel	8869a98e82	[InstSimplify] fold insertelement-of-extractelement This was partly handled in InstCombine (only the constant index case), so delete that and zap it more generally in InstSimplify. llvm-svn: 361576	2019-05-24 00:13:58 +00:00
Sanjay Patel	093c922205	[InstCombine] remove redundant fold for extractelement; NFC The out-of-bounds index pattern is handled by InstSimplify, so the extractelement should be eliminated next time it is visited. llvm-svn: 361570	2019-05-23 23:33:38 +00:00
Sanjay Patel	4d4df6f144	[InstCombine] remove redundant fold for insertelement; NFC The out-of-bounds index pattern is handled by InstSimplify. llvm-svn: 361569	2019-05-23 23:33:34 +00:00
Alina Sbirlea	d82ddfa7c3	[NewPassManager] Add tuning option: ForgetAllSCEVInLoopUnroll [NFC]. Summary: Mirror tuning option from old pass manager in new pass manager. Reviewers: chandlerc Subscribers: mehdi_amini, jlebar, zzheng, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61612 llvm-svn: 361560	2019-05-23 21:52:59 +00:00
Sanjay Patel	e60cb7d1be	[InstSimplify] insertelement V, undef, ? --> V This was part of InstCombine, but it's better placed in InstSimplify. InstCombine also had an unreachable but weaker fold for insertelement with undef index, so that is deleted. llvm-svn: 361559	2019-05-23 21:49:47 +00:00
Kit Barton	987fdfd9a7	Revert [LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, induction variable, and guard branch. This reverts r361517 (git commit `2049e4dd8f`) llvm-svn: 361553	2019-05-23 20:53:05 +00:00
Sanjay Patel	7d6c0bce50	[DAGCombiner] make folds of binops safe for opcodes that produce >1 value This is no-functional-change-intended currently because the definition of isBinOp() only includes opcodes that produce 1 value. But if we share that implementation with isCommutativeBinOp() as proposed in D62191, then we need to make sure that the callers bail out for opcodes that they are not prepared to handle correctly. llvm-svn: 361547	2019-05-23 20:17:25 +00:00
Matt Arsenault	5c714cbdd8	AMDGPU: Correct maximum possible private allocation size We were assuming a much larger possible per-wave visible stack allocation than is possible: `faa3ae5138/src/core/runtime/amd_gpu_agent.cpp (L70)` Based on this, we can assume the high 15 bits of a frame index or sret are 0. The frame index value is the per-lane offset, so the maximum frame index value is MAX_WAVE_SCRATCH / wavesize. Remove the corresponding subtarget feature and option that made this configurable. llvm-svn: 361541	2019-05-23 19:38:14 +00:00
Alina Sbirlea	e4b27869c6	[NewPassManager] Add tuning option: LoopUnrolling [NFC]. Summary: Mirror tuning option from old pass manager in new pass manager. Reviewers: chandlerc Subscribers: jlebar, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61618 llvm-svn: 361540	2019-05-23 19:35:40 +00:00
Alina Sbirlea	63729b0c49	[SLPVectorizer] Set flag to previous default. Summary: The refactoring in r360276 moved the `RunSLPVectorization` flag and added the default explicitly. The default should have been `false`, as before. The new pass manager used to have SLPVectorization on by default, now it's off in opt, and needs D61617 checked in to enable it in clang. Reviewers: chandlerc Subscribers: mehdi_amini, jlebar, eraman, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61955 llvm-svn: 361537	2019-05-23 19:07:41 +00:00
Sanjay Patel	3249be1e03	[InstCombine] be more careful when transforming a shuffle mask This is reduced from a fuzzer test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14890 Usually, demanded elements should be able to simplify shuffle mask elements that are pointing to undef elements of its source operands, but that doesn't happen in the test case. llvm-svn: 361533	2019-05-23 18:46:03 +00:00
Robert Lougher	170dfeb2ff	Resubmit r360436 "[X86] Avoid SFB - Fix inconsistent codegen with/without debug info" Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being trigerred, and in the latter results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 361527	2019-05-23 18:15:12 +00:00
Thomas Lively	e18b5c6237	[WebAssembly] Implement ReplaceNodeResults to fix a SIMD crash Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61037 llvm-svn: 361526	2019-05-23 18:09:26 +00:00
Matt Arsenault	0f3ba44b57	AMDGPU/GlobalISel: Legality for integer min/max llvm-svn: 361519	2019-05-23 17:58:48 +00:00
Kit Barton	2049e4dd8f	[LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, induction variable, and guard branch. Summary: This PR extends the loop object with more utilities to get loop bounds, step, induction variable, and guard branch. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information. Moreover, loop fusion (https://reviews.llvm.org/D55851) is planning to use getGuard() to extend the kind of loops it is able to fuse, e.g. rotated loop with non-constant upper bound, which would have a loop guard. /// Example: /// for (int i = lb; i < ub; i+=step) /// <loop body> /// --- pseudo LLVMIR --- /// beforeloop: /// guardcmp = (lb < ub) /// if (guardcmp) goto preheader; else goto afterloop /// preheader: /// loop: /// i1 = phi[{lb, preheader}, {i2, latch}] /// <loop body> /// i2 = i1 + step /// latch: /// cmp = (i2 < ub) /// if (cmp) goto loop /// exit: /// afterloop: /// /// getBounds /// getInitialIVValue --> lb /// getStepInst --> i2 = i1 + step /// getStepValue --> step /// getFinalIVValue --> ub /// getCanonicalPredicate --> '<' /// getDirection --> Increasing /// getGuard --> if (guardcmp) goto loop; else goto afterloop /// getInductionVariable --> i1 /// getAuxiliaryInductionVariable --> {i1} /// isCanonical --> false Committed on behalf of @Whitney (Whitney Tsang). Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara, fhahn Reviewed By: kbarton Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60565 llvm-svn: 361517	2019-05-23 17:56:35 +00:00
Thomas Lively	eafe8ef6f2	[WebAssembly] Add multivalue and tail-call target features Summary: These features will both be implemented soon, so I thought I would save time by adding the boilerplate for both of them at the same time. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62047 llvm-svn: 361516	2019-05-23 17:26:47 +00:00
Thomas Preud'homme	7b7683d7a6	[FileCheck] Remove llvm:: prefix Summary: Remove all llvm:: prefixes in FileCheck library header and implementation except for calls to make_unique and make_shared since both files already use the llvm namespace. Reviewers: jhenderson, jdenny, probinson, arichardson Subscribers: hiraditya, arichardson, probinson, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62323 llvm-svn: 361515	2019-05-23 17:19:36 +00:00
Saleem Abdulrasool	7bbefb13ee	Transforms: lower fadd and fsub atomicrmw instructions `fadd` and `fsub` have recently (r351850) been added as `atomicrmw` operations. This diff adds lowering cases for them to the LowerAtomic transform. Patch by Josh Berdine! llvm-svn: 361512	2019-05-23 17:03:43 +00:00
Andrea Di Biagio	27b3b5d952	[MCA] Add the ability to compute critical register dependency of an instruction. This patch adds the methods `getCriticalRegDep()` and `computeCriticalRegDep()` to class InstructionBase. The goal is to allow users to obtain information about the critical register dependency that most affects the latency of an instruction. These methods are currently unused. However, the long term plan is to use them in order to allow the computation of a critical-path as part of the bottleneck analysis. So, this is yet another step towards fixing PR37494. llvm-svn: 361509	2019-05-23 16:32:19 +00:00
Shoaib Meenai	87226a7202	[AsmPrinter] Treat a narrowing PtrToInt like Trunc When printing assembly for PtrToInt, AsmPrinter::lowerConstant incorrectly assumed that if PtrToInt was not converting to an int with exactly the same number of bits, it must be widening to a larger int. But this isn't necessarily true; PtrToInt can also shrink the size, which is useful when you want to produce a known 32-bit pointer on a 64-bit platform (on x86_64 ELF this yields a R_X86_64_32 relocation). The old behavior of falling through to the widening case for a narrowing PtrToInt yields bogus assembly code like this, which fails to assemble because the no-op bit and it accidentally creates is not a valid relocation: ``` .long a&-1 ``` The fix is to treat a narrowing PtrToInt exactly the same as it already treats Trunc: just emit the expression and let the assembler deal with truncating it in the appropriate way. Patch by Mat Hostetter <mjh@fb.com>. Differential Revision: https://reviews.llvm.org/D61325 llvm-svn: 361508	2019-05-23 16:29:09 +00:00
Lewis Revill	74927554e2	[RISCV] Support assembling TLS LA pseudo instructions This patch adds the pseudo instructions la.tls.ie and la.tls.gd, used in the initial-exec and global-dynamic TLS models respectively when addressing a global. The pseudo instructions are expanded in the assembly parser. llvm-svn: 361499	2019-05-23 14:46:27 +00:00
Petar Jovanovic	aa28b6d198	[LiveDebugValues] Rename 'DMI' into 'DebugInstr' (NFC) This will improve code readability. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D62295 llvm-svn: 361497	2019-05-23 13:49:06 +00:00
Andrea Di Biagio	dd0d9e01ee	[MCA] Introduce class LSUnitBase and let LSUnit derive from it. Class LSUnitBase provides a abstract interface for all the concrete LS units in llvm-mca. Methods exposed by the public abstract LSUnitBase interface are: - Status isAvailable(const InstRef&); - void dispatch(const InstRef &); - const InstRef &isReady(const InstRef &); LSUnitBase standardises the API, but not the data structures internally used by LS units. This allows for more flexibility. Previously, only method `isReady()` was declared virtual by class LSUnit. Also, derived classes had to inherit all the internal data members of LSUnit. No functional change intended. llvm-svn: 361496	2019-05-23 13:42:47 +00:00
Clement Courbet	43882b16a3	[MergeICmps] Make the pass compatible with the new pass manager. Reviewers: gchatelet, spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62287 llvm-svn: 361490	2019-05-23 12:35:26 +00:00
Andrea Di Biagio	28afd8dc71	[MCA] Make the bool conversion operator in class InstRef explicit. NFCI This patch makes the bool conversion operator in InstRef explicit. It also adds a operator< to hel comparing InstRef objects in sets. llvm-svn: 361482	2019-05-23 10:50:01 +00:00
Petar Jovanovic	ff47d83e78	[DwarfExpression] Refactor dwarf expression (NFC) Refactor location description kind in order to be easier for extensions (needed for D60866). In addition, cut off some bits from the other class fields. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D62002 llvm-svn: 361480	2019-05-23 10:37:13 +00:00
Sam Parker	617cdc5a6d	[ARM][CGP] Clear SafeWrap before each search The previous patch added a member set to store instructions that we could allow to wrap. But this wasn't cleared between searches meaning that they could get promoted, incorrectly, during the promotion of a separate valid chain. Differential Revision: https://reviews.llvm.org/D62254 llvm-svn: 361462	2019-05-23 07:46:39 +00:00
Christian Bruel	4a7da98bd9	[GlobalOpt] recognize dead struct fields and propagate values Summary: Allow struct fields SRA and dead stores. This works by considering fields accesses from getElementPtr to be considered as a possible pointer root that can be cleaned up. We check that the variable can be SRA by recursively checking the sub expressions with the new isSafeSubSROAGEP function. basically this allows the array in following C code to be optimized out struct Expr { int a[2]; int b; }; static struct Expr e; int foo (int i) { e.b = 2; e.a[i] = 1; return e.b; } Reviewers: greened, bkramer, nicholas, jmolloy Reviewed By: jmolloy Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61911 llvm-svn: 361460	2019-05-23 05:53:10 +00:00
Thomas Lively	1a3cbe720c	[WebAssembly] Implement __builtin_return_address for emscripten Summary: In this patch, `ISD::RETURNADDR` is lowered on the emscripten target to the new Emscripten runtime function `emscripten_return_address`, which implements the functionality. Patch by Guanzhong Chen Reviewers: tlively, aheejin Reviewed By: tlively Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62210 llvm-svn: 361454	2019-05-23 01:24:01 +00:00
Fangrui Song	86c9ca48c3	[X86] Support -fno-plt __tls_get_addr calls In general dynamic/local dynamic TLS models, with -fno-plt, * x86: emit `calll ___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT` Note, on x86, if we can get rid of %ebx as the PIC register, it may be better to use a register not preserved across function calls. x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT` Reorganize the code by separating 32-bit and 64-bit. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D62106 llvm-svn: 361453	2019-05-23 01:05:13 +00:00
Thomas Preud'homme	f3b9bb3d69	[FileCheck] Introduce substitution subclasses Summary: With now a clear distinction between string and numeric substitutions, this patch introduces separate classes to represent them with a parent class implementing the common interface. Diagnostics in printSubstitutions() are also adapted to not require knowing which substitution is being looked at since it does not hinder clarity and makes the implementation simpler. Reviewers: jhenderson, jdenny, probinson, arichardson Subscribers: llvm-commits, probinson, arichardson, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D62241 llvm-svn: 361446	2019-05-23 00:10:29 +00:00
Thomas Preud'homme	1a944d27b2	FileCheck: Improve FileCheck variable terminology Summary: Terminology introduced by [[#]] blocks is confusing and does not integrate well with existing terminology. First, variables referred by [[]] blocks are called "pattern variables" while the text a CHECK directive needs to match is called a "CHECK pattern". This is inconsistent with variables in [[#]] blocks since [[#]] blocks are also found in CHECK pattern yet those variables are called "numeric variable". Second, the replacing of both [[]] and [[#]] blocks by the value of the variable or expression they contain is represented by a FileCheckPatternSubstitution class. The naming refers to being a substitution in a CHECK pattern but could be wrongly understood as being a substitution of a pattern variable. Third and lastly, comments use "numeric expression" to refer both to the [[#]] blocks as well as to the numeric expressions these blocks contain which get evaluated at match time. This patch solves these confusions by - calling variables in [[]] and [[#]] blocks as string and numeric variables respectively; - referring to [[]] and [[#]] as substitution blocks, with the former being a string substitution block and the latter a numeric substitution block; - calling [[]] and [[#]] blocks to be replaced by the value of a variable or expression they contain a substitution (as opposed to definition when these blocks are used to defined a variable), with the former being a string substitution and the latter a numeric substitution; - renaming the FileCheckPatternSubstitution as a FileCheckSubstitution class with FileCheckStringSubstitution and FileCheckNumericSubstitution subclasses; - restricting the use of "numeric expression" to refer to the expression that is evaluated in a numeric substitution. While numeric substitution blocks only support numeric substitutions of numeric expressions at the moment there are plans to augment numeric substitution blocks to support numeric definitions as well as both a numeric definition and numeric substitution in the same numeric substitution block. Reviewers: jhenderson, jdenny, probinson, arichardson Subscribers: hiraditya, arichardson, probinson, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62146 llvm-svn: 361445	2019-05-23 00:10:14 +00:00
Matt Arsenault	b79a25b124	TableGen: Handle nontrivial foreach range bounds This allows using anything that isn't a literal integer as the bounds for a foreach. Some of the diagnostics aren't perfect, but nobody ever accused tablegen of having good errors. For example, the existing wording suggests a bitrange is valid, but as far as I can tell this has never worked. Fixes bug 41958. llvm-svn: 361434	2019-05-22 21:28:20 +00:00
Craig Topper	93f38e1f1a	[X86] Explcitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary. We effectively had a second set of isel patterns that tried to use a regular store instruction and an extract_subreg instruction. Or a masked move and an extract_subreg. These patterns were intended to override the matching of VEXTRACT instructions by taking advantage of the priority of the explicit immediate 0 for the index. This patch instaed just disables the immediate 0 matchin the VEXTRACT patterns. This each of the component pieces of the larger patterns will match by themselves. This found a bug of sorts were we didn't use 128-bit store for 512->128 extract on KNL. Its unclear what the right thing here should be. Using the vextract avoids constraining the register allocator to use xmm0-15. But it always results in a longer encoding if the register allocator ends up choosing xmm0-15 anyway. llvm-svn: 361431	2019-05-22 21:00:18 +00:00
Galina Kistanova	ed49f6d8e6	Reverted r361134 because of a failing test left unattended for a long time. http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/17792/steps/test-check-all/logs/stdio Failing Tests (1): LLVM :: CodeGen/AMDGPU/regbank-reassign.mir llvm-svn: 361430	2019-05-22 20:42:56 +00:00
Craig Topper	9816d55776	[X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening. We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond to the libm functions. For the libm functions we need to disable the precision exception so the llvm.floor/ceil functions should always map to encodings 0x9 and 0xA. We had a mix of isel patterns where some used 0x9 and 0xA and others used 0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA. Since we have no way in isel of knowing where the llvm.ceil/floor came from, we can't map X86 specific intrinsics with encodings 1 or 2 to it. We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like to see a use case and optimization advantage first. I've left the backend test cases to show the blend we now emit without the extra isel patterns. But I've removed the InstCombine tests completely. llvm-svn: 361425	2019-05-22 20:04:55 +00:00
Craig Topper	2f1895e03d	[X86] Add more icelake model numbers to getHostCPUName. Using model numbers found in Table 2-1 of the May 2019 version of the Intel Software Developer's Manual Volume 4. llvm-svn: 361422	2019-05-22 19:51:35 +00:00
Alexey Lapshin	53726588f6	[DebugInfo][AArch64] Recognise target specific instruction as mov instr This fix is for the problem from https://bugs.llvm.org/show_bug.cgi?id=38714. Specifically, Simple Register Coalescing creates following conversion : undef %0.sub_32:gpr64 = ORRWrs $wzr, %3:gpr32common, 0, debug-location !24; It copies 32-bit value from gpr32 into gpr64. But Live DEBUG_VALUE analysis is not able to create debug location record for that instruction. So the problem is in that debug info for argc variable is incorrect. The fix is to write custom isCopyInstrImpl() which would recognize the ORRWrs instr. llvm-svn: 361417	2019-05-22 18:48:58 +00:00
Hiroshi Yamauchi	dfeb797455	[PGO][CHR] Speed up following long use-def chains. Summary: Avoid visiting an instruction more than once by using a map. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62262 llvm-svn: 361416	2019-05-22 18:37:34 +00:00
Xing Xue	4246b75295	Disable EHFrameSupport in JITLink/RuntimeDyld on AIX Summary: EH Frames aren't supported on AIX with the system compiler, but the definition of HAVE_EHTABLE_SUPPORT misses this which causes linking problems on AIX. This patch updates the definition of HAVE_EHTABLE_SUPPORT in both JITLink and RuntimeDyld. Author: daltenty Reviewers: sfertile, xingxue, hubert.reinterpretcase Reviewed By: xingxue Subscribers: hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62203 llvm-svn: 361410	2019-05-22 17:41:27 +00:00
Matt Arsenault	418e23e33c	AMDGPU: Move disassembler support check to constructor Don't check for unsupported targets for every instruction. llvm-svn: 361406	2019-05-22 16:28:48 +00:00
Matt Arsenault	ca64ef2043	MC: Allow getMaxInstLength to depend on the subtarget Keep it optional in cases this is ever needed in some global context. Currently it's only used for getting an upper bound inline asm code size. For AMDGPU, gfx10 increases the maximum instruction size to 20-bytes. This avoids penalizing older subtargets when estimating code size, and making some annoying branch relaxation test adjustments. llvm-svn: 361405	2019-05-22 16:28:41 +00:00
Kees Cook	c2187c20a4	[TargetLowering] Extend bool args to inline-asm according to getBooleanType Summary: This extends Krzysztof Parzyszek's X86-specific solution (https://reviews.llvm.org/D60208) to the generic code pointed out by James Y Knight. Reviewers: kparzysz, craig.topper, nickdesaulniers Subscribers: efriedma, sdardis, nemanjai, javed.absar, eraman, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, srhines, void, nickdesaulniers, jyknight Tags: #llvm Differential Revision: https://reviews.llvm.org/D60224 llvm-svn: 361404	2019-05-22 16:16:15 +00:00
Kees Cook	a7a687e500	[TargetLowering] Add blank line (test commit) llvm-svn: 361403	2019-05-22 16:02:13 +00:00
Nico Weber	09fb2029e5	llvm-undname: Fix an assert-on-invalid, found by oss-fuzz If a template parameter refers to a pointer to member, but the mangling of that was a string literal instead of a real symbol, llvm-undname used to crash instead of rejecting the input. llvm-svn: 361402	2019-05-22 15:53:23 +00:00
Sanjay Patel	5a4f7cf2ff	[IR] allow fast-math-flags on select of FP values This is a minimal start to correcting a problem most directly discussed in PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 We have been hacking around a limitation for FP select patterns by using the fast-math-flags on the condition of the select rather than the select itself. This patch just allows FMF to appear with the 'select' opcode. No changes are needed to "FPMathOperator" because it already includes select-of-FP because that definition is based on the (return) value type. Once we have this ability, we can start correcting and adding IR transforms to use the FMF on a 'select' instruction. The instcombine and vectorizer test diffs only show that the IRBuilder change is behaving as expected by applying an FMF guard value to 'select'. For reference: rL241901 - allowed FMF with fcmp rL255555 - allowed FMF with FP calls Differential Revision: https://reviews.llvm.org/D61917 llvm-svn: 361401	2019-05-22 15:50:46 +00:00
Simon Pilgrim	3c05cad03e	LoopVectorizationCostModel::selectInterleaveCount - assert we have a non-zero loop cost. NFCI. The input LoopCost value can be zero, but if so it should be recalculated with the current VF. After that it should always be non-zero. llvm-svn: 361387	2019-05-22 14:18:17 +00:00
Dmitry Preobrazhensky	7773fc478d	[AMDGPU][MC] Corrected parsing of op_sel* and neg_* modifiers See bug 41361: https://bugs.llvm.org/show_bug.cgi?id=41361 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61012 llvm-svn: 361386	2019-05-22 13:59:01 +00:00
Simon Pilgrim	9b40dd6318	[Hexagon] assert getRegisterBitWidth returns non-zero value. NFCI. Fixes scan-build warning. llvm-svn: 361375	2019-05-22 12:25:46 +00:00
Simon Pilgrim	cfe6fe06ab	[VirtualFileSystem] Fix uninitialized variable warning. NFCI. llvm-svn: 361371	2019-05-22 11:20:52 +00:00
Sjoerd Meijer	aa4f1ffca4	[TargetMachine] error message unsupported code model When the tiny code model is requested for a target machine that does not support this, we get an error message (which is nice) but also this diagnostic and request to submit a bug report: fatal error: error in backend: Target does not support the tiny CodeModel [Inferior 2 (process 31509) exited with code 0106] clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation) (gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74) Target: arm-arm-none-eabi Thread model: posix clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. clang-9: note: diagnostic msg: ******************** PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT: Preprocessed source(s) and associated run script(s) are located at: clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh clang-9: note: diagnostic msg: But this is not a bug, this is a feature. :-) Not only is this not a bug, this is also pretty confusing. This patch causes just to print the fatal error and not the diagnostic: fatal error: error in backend: Target does not support the tiny CodeModel Differential Revision: https://reviews.llvm.org/D62236 llvm-svn: 361370	2019-05-22 10:40:26 +00:00
Martin Storsjo	de6038b265	[llvm-dlltool] Respect NONAME keyword This adds proper handling of the NONAME-keyword, which makes llvm-dlltool generate an import using the ordinal instead of the name. Patch by by Jannik Vogel, test added by Stefan Schmidt. Differential Revision: https://reviews.llvm.org/D62175 llvm-svn: 361367	2019-05-22 09:49:54 +00:00
Clement Courbet	f8f93ba90d	Re-land r361257 "[MergeICmps][NFC] Make BCEAtom move-only."" llvm-svn: 361366	2019-05-22 09:45:40 +00:00
Anton Afanasyev	df00c6a54f	[MIR] Add simple PRE pass to MachineCSE This is the second part of the commit fixing PR38917 (hoisting partitially redundant machine instruction). Most of PRE (partitial redundancy elimination) and CSE work is done on LLVM IR, but some of redundancy arises during DAG legalization. Machine CSE is not enough to deal with it. This simple PRE implementation works a little bit intricately: it passes before CSE, looking for partitial redundancy and transforming it to fully redundancy, anticipating that the next CSE step will eliminate this created redundancy. If CSE doesn't eliminate this, than created instruction will remain dead and eliminated later by Remove Dead Machine Instructions pass. The third part of the commit is supposed to refactor MachineCSE, to make it more clear and to merge MachinePRE with MachineCSE, so one need no rely on further Remove Dead pass to clear instrs not eliminated by CSE. First step: https://reviews.llvm.org/D54839 Fixes llvm.org/PR38917 llvm-svn: 361356	2019-05-22 07:41:34 +00:00
Fangrui Song	1c61471ab1	[PPC64] Parse -elfv1 -elfv2 when specified on target triple Summary: For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1]. This patch adds an alternative way to specify ELFv2 ABI on target triple [2]. The following results are expected: ELFv1 when using: -target powerpc64-unknown-freebsd12.0 -target powerpc64-unknown-freebsd12.0 -mabi=elfv1 -target powerpc64-unknown-freebsd12.0-elfv1 ELFv2 when using: -target powerpc64-unknown-freebsd12.0 -mabi=elfv2 -target powerpc64-unknown-freebsd12.0-elfv2 [1] https://wiki.freebsd.org/powerpc/llvm-elfv2 [2] https://clang.llvm.org/docs/CrossCompilation.html Patch by Alfredo Dal'Ava Júnior! Differential Revision: https://reviews.llvm.org/D61950 llvm-svn: 361355	2019-05-22 07:29:59 +00:00
Sjoerd Meijer	eec021658b	[AArch64] Subtarget crypto extension defaults The Armv8.2-A crypto extensions all defaulted to true, but should default to false, like all the other extensions. Differential Revision: https://reviews.llvm.org/D62180 llvm-svn: 361354	2019-05-22 07:10:27 +00:00
Nikita Popov	15df05152d	[X86] Don't compare i128 through vector if construction not cheap (PR41971) Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the combineVectorSizedSetCCEquality() transform more conservative by checking that the bitcast to the vector type will be cheap/free for both operands. I'm considering it cheap if it's a constant, a load or already a vector. I've dropped the explicit check for f128 because it should fall out naturally (in the cases where it'd be detrimental). Differential Revision: https://reviews.llvm.org/D62220 llvm-svn: 361352	2019-05-22 06:47:06 +00:00
Chen Zheng	b727b0483c	[PowerPC] use meaningful name for displacement form aligned with x-form - NFC llvm-svn: 361347	2019-05-22 03:17:39 +00:00
Chen Zheng	9970665f60	[PowerPC] [ISEL] select x-form instruction for unaligned offset Differential Revision: https://reviews.llvm.org/D62173 llvm-svn: 361346	2019-05-22 02:57:31 +00:00
Pengfei Wang	6a0d432e9e	[X86] [CET] Deal with return-twice function such as vfork, setjmp when CET-IBT enabled Return-twice functions will indirectly jump after the caller's position. So when CET-IBT is enable, we should make sure these is endbr* instructions follow these Return-twice function caller. Like GCC does. Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D61881 llvm-svn: 361342	2019-05-22 00:50:21 +00:00
Sanjay Patel	6a554188aa	[InstCombine] fold shuffles of insert_subvectors This should be a valid exception to the general rule of not creating new shuffle masks in IR... because we already do it. :) Also, DAG combining/legalization will undo this by widening the shuffle back out if needed. Explanation for how we already do this: SLP or vector source can create chains of insert/extract as shown in 1 of the examples from PR16739: https://godbolt.org/z/NlK7rA https://bugs.llvm.org/show_bug.cgi?id=16739 And we expect instcombine or DAGCombine to clean that up by creating relatively simple shuffles. Differential Revision: https://reviews.llvm.org/D62024 llvm-svn: 361338	2019-05-22 00:32:25 +00:00
Matt Arsenault	2cba91b8db	AMDGPU: Assume calls read exec llvm-svn: 361333	2019-05-21 23:23:16 +00:00
Matt Arsenault	dd1ffa00a5	AMDGPU: Assume call pseudos are convergent There should probably be nonconvergent versions, but my guess is it doesn't matter in practice. llvm-svn: 361331	2019-05-21 23:23:10 +00:00
Matt Arsenault	60ba03e210	AMDGPU: Fix not marking new gfx10 SGPRs as CSRs llvm-svn: 361330	2019-05-21 23:23:05 +00:00
Dan Gohman	a49496fb2a	[WebAssembly] Add the signature for the new llround builtin function r360889 added new llround builtin functions. This patch adds their signatures for the WebAssembly backend. It also adds wasm32 support to utils/update_llc_test_checks.py, since that's the script other targets are using for their testcases for this feature. Differential Revision: https://reviews.llvm.org/D62207 llvm-svn: 361327	2019-05-21 23:06:34 +00:00
Stanislav Mekhanoshin	44d17ca02e	Fix register coalescer failure to prune value Register coalescer fails for the test in the patch with the assertion in JoinVals::ConflictResolution `DefMI != nullptr'. It attempts to join live intervals for two adjacent instructions and erase the copy: %2:vreg_256 = COPY %1 %3:vreg_256 = COPY killed %1 The LI needs to be adjusted to kill subrange for the erased instruction and extend the subrange of the original def. That was done for the main interval only but not for the subrange. As a result subrange had a VNI pointing to the erased slot resulting in the above failure. Differential Revision: https://reviews.llvm.org/D62162 llvm-svn: 361293	2019-05-21 19:32:41 +00:00
Leonard Chan	0bada7ce6c	[Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic Add an intrinsic that takes 2 signed integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D55720 llvm-svn: 361289	2019-05-21 19:17:19 +00:00
Craig Topper	ed6df47bae	[X86] Remove an unneeded ZERO_EXTEND creation from LowerINTRINSIC_W_CHAIN. NFC We were trying to ZERO_EXTEND from an i8 X86ISD::SETCC to i8 again. llvm-svn: 361288	2019-05-21 19:03:45 +00:00
Sanjay Patel	10f6b39899	[SelectionDAG] fold insert subvector of undef into undef DAGCombiner simplifies this more liberally as: // If inserting an UNDEF, just return the original vector. if (N1.isUndef()) return N0; So there's no way to make this visible in output AFAIK, but doing this at node creation time should be slightly more efficient. llvm-svn: 361287	2019-05-21 18:53:53 +00:00
Sanjay Patel	51dc59d090	[SelectionDAG] remove redundant code; NFCI getNode() squashes concatenation of undefs via FoldCONCAT_VECTORS(): // Concat of UNDEFs is UNDEF. if (llvm::all_of(Ops, [](SDValue Op) { return Op.isUndef(); })) return DAG.getUNDEF(VT); llvm-svn: 361284	2019-05-21 18:28:22 +00:00
Clement Courbet	122c6e6f36	[MergeICmps] Make sorting strongly stable on the rhs. Summary: Because the sort order was not strongly stable on the RHS, whether the chain could merge would depend on the order of the blocks in the Phi. EXPENSIVE_CHECKS would shuffle the blocks before sorting, resulting in non-deterministic merging. Reviewers: gchatelet Subscribers: hiraditya, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D62193 llvm-svn: 361281	2019-05-21 17:58:42 +00:00
Simon Pilgrim	4b82e50315	[X86][SSE] computeKnownBitsForTargetNode - add X86ISD::ANDNP support Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692 llvm-svn: 361270	2019-05-21 15:20:24 +00:00
Sanjay Patel	78c3f58122	[DAGCombiner] prevent unsafe reassociation of FP ops There are no FP callers of DAGCombiner::reassociateOps() currently, but we can add a fast-math check to make sure this API is not being misused. This was noted as a potential risk (and that risk might increase) with: D62191 llvm-svn: 361268	2019-05-21 14:47:38 +00:00
Clement Courbet	8361a10493	Revert r361257 "[MergeICmps][NFC] Make BCEAtom move-only." Broke some bots. llvm-svn: 361263	2019-05-21 14:24:46 +00:00
Clement Courbet	8fa970c2d8	[MergeICmps][NFC] Make BCEAtom move-only. And handle for self-move. This is required so that llvm::sort can work with EXPENSIVE_CHECKS, as it will do a random shuffle of the input which can result in self-moves. llvm-svn: 361257	2019-05-21 13:34:12 +00:00
Florian Hahn	f9b28e53c7	[ScheduleDAGInstrs] Compute topological ordering on demand. In most cases, the topological ordering does not get changed in ScheduleDAGInstrs. We can compute the ordering on demand, similar to D60125. This drastically cuts down the number of times we need to compute the topological ordering, e.g. for SPEC2006, SPEC2k and MultiSource, we get the following stats for -O3 -flto on X86 (showing the top reductions, with small absolute values filtered). The smallest reduction is -50%. Slightly positive impact on compile-time (-0.1 % geomean speedup for test-suite + SPEC & co, with -O1 on X86) Tests: 243 Metric: pre-RA-sched.NumTopoInits Program base patch diff test-suite...ngs-C/fixoutput/fixoutput.test 115.00 3.00 -97.4% test-suite...ks/Prolangs-C/cdecl/cdecl.test 957.00 26.00 -97.3% test-suite...math/automotive-basicmath.test 107.00 3.00 -97.2% test-suite...rolangs-C++/deriv2/deriv2.test 144.00 6.00 -95.8% test-suite...lowfish/security-blowfish.test 410.00 18.00 -95.6% test-suite...frame_layout/frame_layout.test 441.00 23.00 -94.8% test-suite...rolangs-C++/employ/employ.test 159.00 11.00 -93.1% test-suite...s/Ptrdist/anagram/anagram.test 157.00 11.00 -93.0% test-suite...s-C/unix-smail/unix-smail.test 829.00 59.00 -92.9% test-suite...chmarks/Olden/power/power.test 154.00 11.00 -92.9% test-suite...T95/147.vortex/147.vortex.test 19876.00 1434.00 -92.8% test-suite...000/255.vortex/255.vortex.test 19881.00 1435.00 -92.8% test-suite...ce/Applications/Burg/burg.test 2203.00 168.00 -92.4% test-suite...urce/Applications/hbd/hbd.test 1067.00 85.00 -92.0% test-suite...ternal/HMMER/hmmcalibrate.test 3145.00 251.00 -92.0% test-suite.../Applications/spiff/spiff.test 1037.00 84.00 -91.9% test-suite...SPEC/CINT95/130.li/130.li.test 5913.00 487.00 -91.8% test-suite.../CINT95/134.perl/134.perl.test 12532.00 1041.00 -91.7% test-suite...ce/Benchmarks/Olden/bh/bh.test 220.00 19.00 -91.4% test-suite :: External/Nurbs/nurbs.test 2304.00 206.00 -91.1% test-suite...arks/VersaBench/dbms/dbms.test 773.00 75.00 -90.3% test-suite...ce/Applications/siod/siod.test 9043.00 878.00 -90.3% test-suite...pplications/treecc/treecc.test 4510.00 438.00 -90.3% test-suite...T2006/456.hmmer/456.hmmer.test 7093.00 697.00 -90.2% test-suite...s-C/Pathfinder/PathFinder.test 882.00 87.00 -90.1% test-suite.../CINT2000/176.gcc/176.gcc.test 64978.00 6721.00 -89.7% test-suite...cations/hexxagon/hexxagon.test 657.00 69.00 -89.5% test-suite...fice-ispell/office-ispell.test 2712.00 285.00 -89.5% test-suite.../CINT2006/403.gcc/403.gcc.test 139613.00 14992.00 -89.3% test-suite...lications/ClamAV/clamscan.test 25880.00 2785.00 -89.2% Reviewers: MatzeB, atrick, efriedma, niravd Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D60839 llvm-svn: 361253	2019-05-21 13:04:53 +00:00
Paul Robinson	9c56326934	[DebugInfo] Handle '# line "file"' correctly for asm source. This provides the correct file path for the original source, rather than the preprocessed source. Part of the fix for PR41839. Differential Revision: https://reviews.llvm.org/D62074 llvm-svn: 361248	2019-05-21 11:59:03 +00:00
Bob Haarman	032f87bbb3	Revert r360902 "Resubmit: [Salvage] Change salvage debug info ..." This reverts commit rr360902. It caused an assertion failure in lib/IR/DebugInfoMetadata.cpp: Assertion `(OffsetInBits + SizeInBits <= FragmentSizeInBits) && "new fragment outside of original fragment"' failed. PR41931. llvm-svn: 361246	2019-05-21 11:53:41 +00:00
Paul Robinson	116e8d4876	[DebugInfo] Handle -main-file-name correctly for asm source. This option provides only the base filename, not a full relative path. Part of the fix for PR41839. Differential Revision: https://reviews.llvm.org/D62071 llvm-svn: 361245	2019-05-21 11:52:27 +00:00
Clement Courbet	a95d95d392	[MergeICmps] Preserve the dominator tree. Summary: In preparation for D60318 . Reviewers: gchatelet, efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62068 llvm-svn: 361239	2019-05-21 11:02:23 +00:00
Fangrui Song	cd36a2857e	[PPC64] Update LocalEntry from assigned symbols On PowerPC64 ELFv2 ABI, functions may have 2 entry points: global and local. The local entry point location of a function is stored in the st_other field of the symbol, as an offset relative to the global entry point. In order to make symbol assignments (e.g. .equ/.set) work properly with this, PPCTargetELFStreamer already copies the local entry bits from the source symbol to the destination one, on emitAssignment(). The problem is that this copy is performed only at the assignment location, where the source symbol may not yet have processed the .localentry directive, that sets the local entry. This may cause the destination symbol to end up with wrong local entry information. Other symbol info is not affected by this because, in this case, the destination symbol value is actually a symbol reference. This change keeps track of these assignments, and update all needed st_other fields when finish() is called. Patch by Leandro Lupori! Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D56586 llvm-svn: 361237	2019-05-21 10:41:25 +00:00
Florian Hahn	4a8835c655	[AArch64] Skip mask checks for masks with an odd number of elements. Some checks in isShuffleMaskLegal expect an even number of elements, e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access invalid elements and crash. This patch adds checks to the impacted functions. Fixes PR41951 Reviewers: t.p.northover, dmgreen, samparker Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D60690 llvm-svn: 361235	2019-05-21 10:05:26 +00:00
Cullen Rhodes	7f47b75d18	[AArch64][SVE2] Asm: add integer unary instructions (predicated) Summary: Patch adds support for the following instructions: * URECPE, URSQRTE, SQABS, SQNEG The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62129 llvm-svn: 361230	2019-05-21 09:06:51 +00:00
Cullen Rhodes	e798e8d9d2	[AArch64][SVE2] Asm: add integer pairwise arithmetic instructions Summary: Patch adds support for the following instructions: ADDP, SMAXP, UMAXP, SMINP, UMINP The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62128 llvm-svn: 361229	2019-05-21 08:59:00 +00:00
Sam Parker	3141bbd52d	[ARM][CGP] Skip nuw in PrepareConstants PrepareConstants step converts add/sub with 'negative' immediates to sub/add with a 'positive' imm to make promotion more simple. nuw already states that the add shouldn't cause an unsigned wrap, so it shouldn't need any tweaking. Plus, we also don't allow a sub with a 'negative' immediate to be safe wrap, so this functionality has been removed. The PrepareConstants step now just handles the add instructions that we've determined would be safe if they wrap around zero. Differential Revision: https://reviews.llvm.org/D62057 llvm-svn: 361227	2019-05-21 07:56:47 +00:00
Dylan McKay	e967308da4	Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianess Summary: The endianess used in the calling convention does not always match the endianess of the target on all architectures, namely AVR. When an argument is too large to be legalised by the architecture and is split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian is queried to find the endianess that function arguments must be laid out in. This approach was recommended by Eli Friedman. Originally reported in https://github.com/avr-rust/rust/issues/129. Patch by Carl Peto. Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma Reviewed By: efriedma Subscribers: JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62003 llvm-svn: 361222	2019-05-21 06:38:02 +00:00
Chen Zheng	c4c407a0eb	[PowerPC] use more meaningful name - NFC llvm-svn: 361218	2019-05-21 03:54:42 +00:00
Lang Hames	f088e195cc	[ORC] Assert that JITDylibs have unique names. Patch by Praveen Velliengiri. Thanks Praveen! Differential Revision: https://reviews.llvm.org/D62139 llvm-svn: 361215	2019-05-21 03:23:08 +00:00
Nick Desaulniers	28e351af2a	[ORC] fix use-after-move. NFC Summary: scan-build flagged a potential use-after-move in debug builds. It's not safe that a moved from value contains anything but garbage. Manually DRY up these repeated expressions. Reviewers: lhames Reviewed By: lhames Subscribers: hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62112 llvm-svn: 361203	2019-05-20 22:17:43 +00:00
Matt Arsenault	6dd08e335f	AMDGPU: Force skip branches over calls Unfortunately the way SIInsertSkips works is backwards, and is required for correctness. r338235 added handling of some special cases where skipping is mandatory to avoid side effects if no lanes are active. It conservatively handled asm correctly, but the same logic needs to apply to calls. Usually the call sequence code is larger than the skip threshold, although the way the count is computed is really broken, so I'm not sure if anything was likely to really hit this. llvm-svn: 361202	2019-05-20 22:04:42 +00:00
Lang Hames	0dcf69eb82	[ORC] Remove some unreachable code. Fixes http://llvm.org/PR41662. llvm-svn: 361199	2019-05-20 21:30:33 +00:00
Cameron McInally	8bec58d5f7	[NFC][InstCombine] Add FIXME for one-use check on constant negation transforms. llvm-svn: 361197	2019-05-20 21:00:42 +00:00
Lang Hames	93d2bdda6b	[Support] Renamed member 'Size' to 'AllocatedSize' in MemoryBlock and OwningMemoryBlock. Rename member 'Size' to 'AllocatedSize' in order to provide a hint that the allocated size may be different than the requested size. Comments are added to clarify this point. Updated the InMemoryBuffer in FileOutputBuffer.cpp to track the requested buffer size. Patch by Machiel van Hooren. Thanks Machiel! https://reviews.llvm.org/D61599 llvm-svn: 361195	2019-05-20 20:53:05 +00:00
Martin Storsjo	4ed18e5ef5	[AArch64] Handle lowering lround on windows, where long is 32 bit Differential Revision: https://reviews.llvm.org/D62108 llvm-svn: 361192	2019-05-20 19:53:28 +00:00
Cameron McInally	2557ca296a	[InstCombine] Add visitFNeg(...) visitor for unary Fneg Also, break out a helper function, namely foldFNegIntoConstant(...), which performs transforms common between visitFNeg(...) and visitFSub(...). Differential Revision: https://reviews.llvm.org/D61693 llvm-svn: 361188	2019-05-20 19:10:30 +00:00
Sanjay Patel	63fa690617	[InstSimplify] update stale comment; NFC Missed this diff with rL361118. llvm-svn: 361180	2019-05-20 17:52:18 +00:00
Craig Topper	97d4f7c194	[SelectionDAGBuilder] Flush PendingExports before creating INLINEASM_BR node for asm goto. Since INLINEASM_BR is a terminator we need to flush the pending exports before emitting it. If we don't do this, a TokenFactor can be inserted between it and the BR instruction emitted to finish the callbr lowering. It looks like nodes are glued to the INLINEASM_BR so I had to make sure we emit the TokenFactor before that. Differential Revision: https://reviews.llvm.org/D59981 llvm-svn: 361177	2019-05-20 17:08:02 +00:00
Nick Desaulniers	bf940622c8	[DWARF] hoist nullptr checks. NFC Summary: This was flagged in https://www.viva64.com/en/b/0629/ under "Snippet No. 15" (see under #13). It looks like PVS studio flags nullptr checks where the ptr is used inbetween creation and checking against nullptr. Reviewers: JDevlieghere, probinson Reviewed By: JDevlieghere Subscribers: RKSimon, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62118 llvm-svn: 361176	2019-05-20 16:58:59 +00:00
Craig Topper	cac6b76a76	[X86] Add icelake-client and tremont model numbers to getHostCPUName. llvm-svn: 361174	2019-05-20 16:58:23 +00:00
Nick Desaulniers	639b29b1b5	[INLINER] allow inlining of blockaddresses if sole uses are callbrs Summary: It was supposed that Ref LazyCallGraph::Edge's were being inserted by inlining, but that doesn't seem to be the case. Instead, it seems that there was no test for a blockaddress Constant in an instruction that referenced the function that contained the instruction. Ex: ``` define void @f() { %1 = alloca i8, align 8 2: store i8 blockaddress(@f, %2), i8** %1, align 8 ret void } ``` When iterating blockaddresses, do not add the function they refer to back to the worklist if the blockaddress is referring to the contained function (as opposed to an external function). Because blockaddress has sligtly different semantics than GNU C's address of labels, there are 3 cases that can occur with blockaddress, where only 1 can happen in GNU C due to C's scoping rules: * blockaddress is within the function it refers to (possible in GNU C). * blockaddress is within a different function than the one it refers to (not possible in GNU C). * blockaddress is used in to declare a global (not possible in GNU C). The second case is tested in: ``` $ ./llvm/build/unittests/Analysis/AnalysisTests \ --gtest_filter=LazyCallGraphTest.HandleBlockAddress ``` This patch adjusts the iteration of blockaddresses in LazyCallGraph::visitReferences to not revisit the blockaddresses function in the first case. The Linux kernel contains code that's not semantically valid at -O0; specifically code passed to asm goto. It requires that asm goto be inline-able. This patch conservatively does not attempt to handle the more general case of inlining blockaddresses that have non-callbr users (pr/39560). https://bugs.llvm.org/show_bug.cgi?id=39560 https://bugs.llvm.org/show_bug.cgi?id=40722 https://github.com/ClangBuiltLinux/linux/issues/6 https://reviews.llvm.org/rL212077 Reviewers: jyknight, eli.friedman, chandlerc Reviewed By: chandlerc Subscribers: george.burgess.iv, nathanchance, mgorny, craig.topper, mengxu.gatech, void, mehdi_amini, E5ten, chandlerc, efriedma, eraman, hiraditya, haicheng, pirama, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D58260 llvm-svn: 361173	2019-05-20 16:48:09 +00:00
Bjorn Pettersson	eee0f2330d	[AMDGPU] Fix std::array initializers to avoid warnings with older tool chains. NFC A std::array is implemented as a template with an array inside a struct. Older versions of clang, like 3.6, require an extra set of curly braces around std::array initializations to avoid warnings. The C++ language was changed regarding this by CWG 1270. So more modern tool chains does not complaing even if leaving out one level of braces. llvm-svn: 361171	2019-05-20 16:41:08 +00:00
Craig Topper	af7a188453	[Intrinsics] Merge lround.i32 and lround.i64 into a single intrinsic with overloaded result type. Make result type for llvm.llround overloaded instead of fixing to i64 We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well. Differential Revision: https://reviews.llvm.org/D62026 llvm-svn: 361169	2019-05-20 16:27:09 +00:00
Craig Topper	203bfdd0f0	[DAGCombiner] Refactor code in visitShiftByConstant slightly to make it more readable. NFC This changes the isShift variable to include the constant operand check that was previously in the if statement. While there fix an 80 column violation and an unnecessary use of getNode. Also fix variable name capitalization. llvm-svn: 361168	2019-05-20 16:26:55 +00:00
Matt Arsenault	5239298b0d	R600: Fix unconditional return in loop llvm-svn: 361167	2019-05-20 16:22:11 +00:00
Nikita Popov	9060b6df97	[SDAG] Vector op legalization for overflow ops Fixes issue reported by aemerson on D57348. Vector op legalization support is added for uaddo, usubo, saddo and ssubo (umulo and smulo were already supported). As usual, by extracting TargetLowering methods and calling them from vector op legalization. Vector op legalization doesn't really deal with multiple result nodes, so I'm explicitly performing a recursive legalization call on the result value that is not being legalized. There are some existing test changes because expansion happens earlier, so we don't get a DAG combiner run in between anymore. Differential Revision: https://reviews.llvm.org/D61692 llvm-svn: 361166	2019-05-20 16:09:22 +00:00
Matt Arsenault	7c8ec18964	RegAlloc: Fix verifier error with undef identity copies The code did not match the example in the comment, and was checking the undef flag on the copy dest instead of source. The existing tests were only hitting the > 2 operands case. llvm-svn: 361156	2019-05-20 14:09:36 +00:00
Cullen Rhodes	523789fa6b	[AArch64][SVE2] Asm: add SADALP and UADALP instructions Summary: This patch adds support for the integer pairwise add and accumulate long instructions SADALP/UADALP. These instructions are predicated. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62001 llvm-svn: 361154	2019-05-20 13:50:15 +00:00
Cameron McInally	2d2a46db8e	[InstSimplify] Teach fsub -0.0, (fneg X) ==> X about unary fneg Differential Revision: https://reviews.llvm.org/D62077 llvm-svn: 361151	2019-05-20 13:13:35 +00:00
Orlando Cazalet-Hyams	ed67bf8d2f	Resubmit "[DebugInfo] Update loop metadata for inlined loops" This reverts commit `95805bc425`. I've squashed the test fix into this commit. [DebugInfo] Update loop metadata for inlined loops Currently, when a loop is cloned while inlining function (A) into function (B) the loop metadata is copied and then not modified at all. The loop metadata can encode the loop's start and end DILocations. Therefore, the new inlined loop in function (B) may have loop metadata which shows start and end locations residing in function (A). This patch ensures loop metadata is updated while inlining so that the start and end DILocations are given the "inlinedAt" operand. I've also added a regression test for this. This fix is required for D60831 because that patch uses loop metadata to determine the DILocation for the branches of new loop preheaders. Reviewers: aprantl, dblaikie, anemet Reviewed By: aprantl Subscribers: eraman, hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D61933 llvm-svn: 361149	2019-05-20 13:02:30 +00:00
Orlando Cazalet-Hyams	95805bc425	Revert "[DebugInfo] Update loop metadata for inlined loops" This reverts commit `6e8f1a80cd`. Reverting patch while investigating build bot failure. llvm-svn: 361143	2019-05-20 11:24:39 +00:00
Guillaume Chatelet	e386a01e84	[NFC] Refactor visitIntrinsicCall so it doesn't return a const char* Summary: API simplification Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61306 llvm-svn: 361140	2019-05-20 11:01:30 +00:00
Petar Jovanovic	e85bbf564d	[DebugInfoMetadata] Refactor DIExpression::prepend constants (NFC) Refactor DIExpression::With* into a flag enum in order to be less error-prone to use (as discussed on D60866). Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D61943 llvm-svn: 361137	2019-05-20 10:35:57 +00:00
Cullen Rhodes	96c5929926	[AArch64][SVE2] Asm: add int halving add/sub (predicated) instructions Summary: This patch adds support for the predicated integer halving add/sub instructions: * SHADD, UHADD, SRHADD, URHADD * SHSUB, UHSUB, SHSUBR, UHSUBR The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D62000 llvm-svn: 361136	2019-05-20 10:35:23 +00:00
Cullen Rhodes	0fc6347b35	[AArch64][SVE2] Asm: add saturating multiply-add interleaved long instructions Summary: Patch adds support for SQDMLALBT and SQDMLSLBT instructions. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61998 llvm-svn: 361135	2019-05-20 10:29:48 +00:00
Fangrui Song	68774edcd6	Use llvm::sort. NFC llvm-svn: 361134	2019-05-20 10:18:35 +00:00
Sander de Smalen	f83cccf917	Match types of accumulator and result for llvm.experimental.vector.reduce.fadd/fmul The scalar start/accumulator value of the fadd- and fmul reduction should match the result type of the reduction, as well as the vector element-type of the input vector. Although this was not explicitly specified in the LangRef, it was taken for granted in code implementing the reductions. The patch also fixes the LangRef by adding this constraint. Reviewed By: aemerson, nikic Differential Revision: https://reviews.llvm.org/D60260 llvm-svn: 361133	2019-05-20 09:54:06 +00:00
Orlando Cazalet-Hyams	6e8f1a80cd	[DebugInfo] Update loop metadata for inlined loops Summary: Currently, when a loop is cloned while inlining function (A) into function (B) the loop metadata is copied and then not modified at all. The loop metadata can encode the loop's start and end DILocations. Therefore, the new inlined loop in function (B) may have loop metadata which shows start and end locations residing in function (A). This patch ensures loop metadata is updated while inlining so that the start and end DILocations are given the "inlinedAt" operand. I've also added a regression test for this. This fix is required for D60831 because that patch uses loop metadata to determine the DILocation for the branches of new loop preheaders. Reviewers: aprantl, dblaikie, anemet Reviewed By: aprantl Subscribers: eraman, hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D61933 llvm-svn: 361132	2019-05-20 09:40:44 +00:00
Guillaume Chatelet	a760e69840	Revert "[NFC] Refactor visitIntrinsicCall so it doesn't return a const char*" This reverts commit 706d3cd6388cc3446aab282f3af879862b10cbed. llvm-svn: 361130	2019-05-20 09:00:12 +00:00
Guillaume Chatelet	fa8c152576	[NFC] Refactor visitIntrinsicCall so it doesn't return a const char* Summary: API simplification Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61306 llvm-svn: 361129	2019-05-20 08:52:10 +00:00
Carl Ritson	34e95ce259	[AMDGPU] gfx1010 Avoid SMEM WAR hazard for some s_waitcnt values Summary: Avoid introducing hazard mitigation when lgkmcnt is reduced to 0. Clarify code comments to explain assumptions made for this hazard mitigation. Expand and correct test cases to cover variants of s_waitcnt. Reviewers: nhaehnle, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62058 llvm-svn: 361124	2019-05-20 07:20:12 +00:00
Sanjay Patel	9ef99b4b11	[InstSimplify] fold fcmp (maxnum, X, C1), C2 This is the sibling transform for rL360899 (D61691): maxnum(X, GreaterC) == C --> false maxnum(X, GreaterC) <= C --> false maxnum(X, GreaterC) < C --> false maxnum(X, GreaterC) >= C --> true maxnum(X, GreaterC) > C --> true maxnum(X, GreaterC) != C --> true llvm-svn: 361118	2019-05-19 14:26:39 +00:00
Dinar Temirbulatov	2ff72f6654	[SLP] Refactoring of EdgeInfo and UserTreeIdx in buildTree_rec(). This is a follow-up refactoring patch after the introduction of usable TreeEntry pointers in D61706. The EdgeInfo struct can now use a TreeEntry pointer instead of an index in VectorizableTree. Committed on behalf of @vporpo (Vasileios Porpodas) Differential Revision: https://reviews.llvm.org/D61795 llvm-svn: 361110	2019-05-19 01:30:41 +00:00
Craig Topper	3164b50af7	[X86] Remove combineShift function. Just dispatch directly to the handler for each flavor from the main switch. NFC llvm-svn: 361108	2019-05-19 01:01:46 +00:00
Matt Arsenault	b04f3258dd	GVN: Handle addrspacecast llvm-svn: 361103	2019-05-18 14:36:06 +00:00
Simon Pilgrim	2b45a70fd6	MemCmpExpansion::getCompareLoadPairs - assert we find a comparison diff. NFCI. Fix scan-build uninitialized warning and assert the final diff isn't null. llvm-svn: 361095	2019-05-18 11:31:48 +00:00
Matt Arsenault	2f29220d6d	AMDGPU/GlobalISel: Implement s64->s64 [SU]ITOFP llvm-svn: 361082	2019-05-17 23:05:18 +00:00
Matt Arsenault	02b5ca8cd1	GlobalISel: Implement lower for S64->S32 [SU]ITOFP This is ported from the custom AMDGPU DAG implementation. I think this is a better default expansion than what the DAG currently uses, at least if the target has CTLZ. This implements the signed version in terms of the unsigned conversion, which is implemented with bit operations. SelectionDAG has several other implementations that should eventually be ported depending on what instructions are legal. llvm-svn: 361081	2019-05-17 23:05:13 +00:00
Sam Clegg	13717bd54b	[WebAssembly] Remove expected failure of builtin-location.C test This seems to have been fixed by https://reviews.llvm.org/D61956 Yay Differential Revision: https://reviews.llvm.org/D62075 llvm-svn: 361071	2019-05-17 19:55:17 +00:00
Matt Arsenault	f3cedf4823	GlobalISel: Define integer min/max instructions Doesn't attempt to emit them for anything yet, but some legalizations I want to port use them. llvm-svn: 361061	2019-05-17 18:36:31 +00:00
Sanjay Patel	926e47751b	[InstCombine] move bitcast after insertelement-with-bitcasted-operands llvm-svn: 361058	2019-05-17 18:06:12 +00:00
Simon Pilgrim	065431c82b	[X86][SSE] Fold movmsk(not(x)) -> not(movmsk) Helps to improve folding of comparisons with movmsk results. llvm-svn: 361056	2019-05-17 17:56:25 +00:00
Simon Pilgrim	2c2f8e74b9	[X86][SSE] Match all-of bool scalar reductions into a bitcast/movmsk + cmp. Same as what we do for vector reductions in combineHorizontalPredicateResult, use movmsk+cmp for scalar (and(extract(x,0),extract(x,1)) reduction patterns. llvm-svn: 361052	2019-05-17 17:25:55 +00:00
Cameron McInally	067e946859	[InstSimplify] Add unary fneg to `fsub 0.0, (fneg X) ==> X` transform Differential Revision: https://reviews.llvm.org/D62013 llvm-svn: 361047	2019-05-17 16:47:00 +00:00
Dmitry Preobrazhensky	198611b0ff	[AMDGPU][MC] Corrected parsing of NAME:VALUE modifiers See bug 41298: https://bugs.llvm.org/show_bug.cgi?id=41298 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61009 llvm-svn: 361045	2019-05-17 16:04:17 +00:00
Roman Lebedev	64c756b991	[DAGCombiner] visitShiftByConstant(): drop bogus signbit check Summary: That check claims that the transform is illegal otherwise. That isn't true: 1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter https://rise4fun.com/Alive/K4A 2. For `ISD::AND`, there is no restriction on constants: https://rise4fun.com/Alive/Wy3 3. For `ISD::OR`, there is no restriction on constants: https://rise4fun.com/Alive/GOH 3. For `ISD::XOR`, there is no restriction on constants: https://rise4fun.com/Alive/ml6 So, why is it there then? This changes the testcase that was touched by @spatel in rL347478, but i'm not sure that test tests anything particular? Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin Reviewed By: spatel Subscribers: javed.absar, llvm-commits, spatel Tags: #llvm Differential Revision: https://reviews.llvm.org/D61918 llvm-svn: 361044	2019-05-17 15:52:58 +00:00
Roman Lebedev	3275060fe8	[InstCombine] canShiftBinOpWithConstantRHS(): drop bogus signbit check Summary: In D61918 i was looking at dropping it in DAGCombiner `visitShiftByConstant()`, but as @craig.topper pointed out, it was copied from here. That check claims that the transform is illegal otherwise. That isn't true: 1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter https://rise4fun.com/Alive/K4A 2. For `ISD::AND`, there is no restriction on constants: https://rise4fun.com/Alive/Wy3 3. For `ISD::OR`, there is no restriction on constants: https://rise4fun.com/Alive/GOH 3. For `ISD::XOR`, there is no restriction on constants: https://rise4fun.com/Alive/ml6 So, why is it there then? As far as i can tell, it dates all the way back to original check-in rL7793. I think we should just drop it. Reviewers: spatel, craig.topper, efriedma, majnemer Reviewed By: spatel Subscribers: llvm-commits, craig.topper Tags: #llvm Differential Revision: https://reviews.llvm.org/D61938 llvm-svn: 361043	2019-05-17 15:52:49 +00:00
Dmitry Preobrazhensky	5ae3113969	[AMDGPU][MC] Enabled labels with s_call_b64 and s_cbranch_i_fork See https://bugs.llvm.org/show_bug.cgi?id=41888 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D62016 llvm-svn: 361040	2019-05-17 14:57:04 +00:00
Simon Pilgrim	279314e81b	[X86][AVX] Remove LowerCTTZ's AVX1 custom vector handling. We can now rely on generic expansion to handle this. llvm-svn: 361038	2019-05-17 14:37:19 +00:00
Simon Pilgrim	62c7032c18	[X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) fold. Prep work for the removal of the remaining x86 CTTZ vector lowering. llvm-svn: 361035	2019-05-17 14:04:56 +00:00
Dmitry Preobrazhensky	43fcc79837	[AMDGPU][MC] Enabled expressions for most operands which accept integer values See bug 40873: https://bugs.llvm.org/show_bug.cgi?id=40873 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60768 llvm-svn: 361031	2019-05-17 13:17:48 +00:00
Matt Arsenault	1a02d30c87	AMDGPU: Fix unused variable warnings in release builds llvm-svn: 361030	2019-05-17 12:59:27 +00:00
Matt Arsenault	a510b570c2	AMDGPU/GlobalISel: Legalize G_FCEIL llvm-svn: 361028	2019-05-17 12:20:05 +00:00
Matt Arsenault	6aebcd5499	AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNC llvm-svn: 361027	2019-05-17 12:20:01 +00:00
Matt Arsenault	6aafc5e19d	AMDGPU/GlobalISel: Legalize G_FRINT llvm-svn: 361026	2019-05-17 12:19:57 +00:00
Matt Arsenault	1448f5689e	AMDGPU/GlobalISel: Legalize G_FCOPYSIGN llvm-svn: 361025	2019-05-17 12:19:52 +00:00
Clement Courbet	90900fbc9f	[MergeICmps][NFC] Add more debug. llvm-svn: 361024	2019-05-17 12:07:51 +00:00
Matt Arsenault	568f193847	AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.load llvm-svn: 361023	2019-05-17 12:02:34 +00:00
Matt Arsenault	a3b5a386fa	AMDGPU/GlobalISel: Use subreg index instead of extra unmerge This saves instructions and extra steps, but I'm not sure about introducing subregister indexes at this point. llvm-svn: 361022	2019-05-17 12:02:31 +00:00
Matt Arsenault	b3dc73634c	AMDGPU/GlobalISel: Use waterfall loop for buffer_load This adds support for more complex waterfall loops that need to handle operands > 32-bits, and multiple operands. llvm-svn: 361021	2019-05-17 12:02:27 +00:00
Simon Pilgrim	a6d3bd486b	[X86] Pull out IsNOT helper. NFCI. Return the input value for the NOT pattern: (xor X, -1) -> X llvm-svn: 361012	2019-05-17 10:37:08 +00:00
Clement Courbet	632dfdda16	Re-land r360859: "[MergeICmps] Simplify the code." With a fix for PR41917: The predecessor list was changing under our feet. - for (BasicBlock Pred : predecessors(EntryBlock_)) { + while (!pred_empty(EntryBlock_)) { + BasicBlock const Pred = *pred_begin(EntryBlock_); llvm-svn: 361009	2019-05-17 09:43:45 +00:00
Rhys Perry	c4bc61bad7	[AMDGPU] detect WaW hazards when moving/merging load/store instructions Summary: In order to combine memory operations efficiently, the load/store optimizer might move some instructions around. It's usually safe to move instructions down past the merged instruction because the pass checks if memory operations can be re-ordered. Though, the current logic doesn't handle Write-after-Write hazards. This fixes a reflection issue with Monster Hunter World and DXVK. v2: - rebased on top of master - clean up the test case - handle WaW hazards correctly Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130 Original patch by Samuel Pitoiset. Reviewers: tpr, arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D61313 llvm-svn: 361008	2019-05-17 09:32:23 +00:00
Cullen Rhodes	7f605c3550	[AArch64][SVE2] Asm: add saturating multiply-add long instructions Summary: Patch adds support for indexed and unpredicated vectors forms of the following instructions: * SQDMLALB, SQDMLALT, SQDMLSLB, SQDMLSLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D61997 llvm-svn: 361005	2019-05-17 09:29:43 +00:00
Cullen Rhodes	334130a199	[AArch64][SVE2] Asm: add integer multiply-add long instructions Summary: Patch adds support for indexed and unpredicated vectors forms of the following instructions: * SMLALB, SMLALT, UMLALB, UMLALT, SMLSLB, SMLSLT, UMLSLB, UMLSLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61951 llvm-svn: 361003	2019-05-17 09:19:41 +00:00
Cullen Rhodes	0d47f00821	[AArch64][SVE2] Asm: add integer multiply long instructions Summary: Patch adds support for indexed and unpredicated vectors forms of the following instructions: * SMULLB, SMULLT, UMULLB, UMULLT, SQDMULLB, SQDMULLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61936 llvm-svn: 361002	2019-05-17 09:04:44 +00:00
Craig Topper	ae1597d360	[X86] Add FeatureFastScalarShiftMasks and FeatureFastVectorShiftMasks to the ignore list for inlining compatibility. These are tuning flags and won't cause any codegen issue if we inline a function with a different value. llvm-svn: 360992	2019-05-17 06:40:21 +00:00
Fangrui Song	ad7199f3e6	[PowerPC] Support .reloc , R_PPC{,64}_NONE, This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. llvm-svn: 360990	2019-05-17 06:04:11 +00:00
Fangrui Song	ec6dc3089e	[GlobalISel] Fix -Wsign-compare on 32-bit -DLLVM_ENABLE_ASSERTIONS=on builds llvm-svn: 360989	2019-05-17 05:53:39 +00:00
Fangrui Song	e18a6ad0b8	[MC][PowerPC] Clean up PPCAsmBackend Replace the member variable Target with Triple Use Triple instead of TheTarget.getName() to dispatch on 32-bit/64-bit. Delete redundant parameters llvm-svn: 360986	2019-05-17 05:44:26 +00:00
Ben Dunbobbin	1d16515fb4	[ELF] Implement Dependent Libraries Feature This patch implements a limited form of autolinking primarily designed to allow either the --dependent-library compiler option, or "comment lib" pragmas ( https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017) in C/C++ e.g. #pragma comment(lib, "foo"), to cause an ELF linker to automatically add the specified library to the link when processing the input file generated by the compiler. Currently this extension is unique to LLVM and LLD. However, care has been taken to design this feature so that it could be supported by other ELF linkers. The design goals were to provide: - A simple linking model for developers to reason about. - The ability to to override autolinking from the linker command line. - Source code compatibility, where possible, with "comment lib" pragmas in other environments (MSVC in particular). Dependent library support is implemented differently for ELF platforms than on the other platforms. Primarily this difference is that on ELF we pass the dependent library specifiers directly to the linker without manipulating them. This is in contrast to other platforms where they are mapped to a specific linker option by the compiler. This difference is a result of the greater variety of ELF linkers and the fact that ELF linkers tend to handle libraries in a more complicated fashion than on other platforms. This forces us to defer handling the specifiers to the linker. In order to achieve a level of source code compatibility with other platforms we have restricted this feature to work with libraries that meet the following "reasonable" requirements: 1. There are no competing defined symbols in a given set of libraries, or if they exist, the program owner doesn't care which is linked to their program. 2. There may be circular dependencies between libraries. The binary representation is a mergeable string section (SHF_MERGE, SHF_STRINGS), called .deplibs, with custom type SHT_LLVM_DEPENDENT_LIBRARIES (0x6fff4c04). The compiler forms this section by concatenating the arguments of the "comment lib" pragmas and --dependent-library options in the order they are encountered. Partial (-r, -Ur) links are handled by concatenating .deplibs sections with the normal mergeable string section rules. As an example, #pragma comment(lib, "foo") would result in: .section ".deplibs","MS",@llvm_dependent_libraries,1 .asciz "foo" For LTO, equivalent information to the contents of a the .deplibs section can be retrieved by the LLD for bitcode input files. LLD processes the dependent library specifiers in the following way: 1. Dependent libraries which are found from the specifiers in .deplibs sections of relocatable object files are added when the linker decides to include that file (which could itself be in a library) in the link. Dependent libraries behave as if they were appended to the command line after all other options. As a consequence the set of dependent libraries are searched last to resolve symbols. 2. It is an error if a file cannot be found for a given specifier. 3. Any command line options in effect at the end of the command line parsing apply to the dependent libraries, e.g. --whole-archive. 4. The linker tries to add a library or relocatable object file from each of the strings in a .deplibs section by; first, handling the string as if it was specified on the command line; second, by looking for the string in each of the library search paths in turn; third, by looking for a lib<string>.a or lib<string>.so (depending on the current mode of the linker) in each of the library search paths. 5. A new command line option --no-dependent-libraries tells LLD to ignore the dependent libraries. Rationale for the above points: 1. Adding the dependent libraries last makes the process simple to understand from a developers perspective. All linkers are able to implement this scheme. 2. Error-ing for libraries that are not found seems like better behavior than failing the link during symbol resolution. 3. It seems useful for the user to be able to apply command line options which will affect all of the dependent libraries. There is a potential problem of surprise for developers, who might not realize that these options would apply to these "invisible" input files; however, despite the potential for surprise, this is easy for developers to reason about and gives developers the control that they may require. 4. This algorithm takes into account all of the different ways that ELF linkers find input files. The different search methods are tried by the linker in most obvious to least obvious order. 5. I considered adding finer grained control over which dependent libraries were ignored (e.g. MSVC has /nodefaultlib:<library>); however, I concluded that this is not necessary: if finer control is required developers can fall back to using the command line directly. RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html. Differential Revision: https://reviews.llvm.org/D60274 llvm-svn: 360984	2019-05-17 03:44:15 +00:00
Fangrui Song	2463239777	[X86] Support .reloc , R_{386,X86_64}_NONE, This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. See R_MIPS_NONE (D13659), R_ARM_NONE (D61992), R_AARCH64_NONE (D61973) for similar changes. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D62014 llvm-svn: 360983	2019-05-17 03:25:39 +00:00
Fangrui Song	aa6102ad8e	[AArch64] Support .reloc , R_AARCH64_NONE, Summary: This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D61973 llvm-svn: 360981	2019-05-17 03:05:07 +00:00
Fangrui Song	43ca0e9eb8	[ARM] Support .reloc , R_ARM_NONE, R_ARM_NONE can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. Add a generic MCFixupKind FK_NONE as this kind of no-op relocation is ubiquitous on ELF and COFF, and probably available on many other binary formats. See D62014. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D61992 llvm-svn: 360980	2019-05-17 02:51:54 +00:00
Philip Reames	a74d654374	[LFTR] Strengthen assertions in genLoopLimit [NFCI] llvm-svn: 360978	2019-05-17 02:18:03 +00:00
Philip Reames	45e7690796	[IndVars] Don't reimplement Loop::isLoopInvariant [NFC] Using dominance vs a set membership check is indistinguishable from a compile time perspective, and the two queries return equivelent results. Simplify code by using the existing function. llvm-svn: 360976	2019-05-17 02:09:03 +00:00
Philip Reames	8e169cd266	[LFTR] Factor out a helper function for readability purpose [NFC] llvm-svn: 360972	2019-05-17 01:39:58 +00:00
Philip Reames	9283f1847c	Clarify comments on helpers used by LFTR [NFC] I'm slowly wrapping my head around this code, and am making comment improvements where I can. llvm-svn: 360968	2019-05-17 01:12:02 +00:00
Jonas Paulsson	9427961c89	[SystemZ] Bugfix in SystemZTargetLowering::combineIntDIVREM() Make sure to not unroll a vector division/remainder (with a constant splat divisor) after type legalization, since the scalar type may then be illegal. Review: Ulrich Weigand https://reviews.llvm.org/D62036 llvm-svn: 360965	2019-05-17 00:50:35 +00:00
Nico Weber	d764e7c660	Revert r360859: "Reland r360771 "[MergeICmps] Simplify the code."" It caused PR41917. llvm-svn: 360963	2019-05-17 00:43:53 +00:00
David L. Jones	4a5e01faa4	[X86][AsmParser] Add mnemonics missed in r360954. These are valid Jcc, but aren't based on the EFLAGS condition codes (Intel 64 and IA-32 Architetcures Software Developer's Manual Vol. 1, Appendix B). These are covered in clang/test, but not llvm/test. llvm-svn: 360960	2019-05-17 00:19:20 +00:00
Evgeniy Stepanov	7f281b2c06	HWASan exception support. Summary: Adds a call to __hwasan_handle_vfork(SP) at each landingpad entry. Reusing __hwasan_handle_vfork instead of introducing a new runtime call in order to be ABI-compatible with old runtime library. Reviewers: pcc Subscribers: kubamracek, hiraditya, #sanitizers, llvm-commits Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D61968 llvm-svn: 360959	2019-05-16 23:54:41 +00:00
David L. Jones	add7ed2281	[X86][AsmParser] Ignore "short" even harder in Intel syntax ASM. In Intel syntax, it's not uncommon to see a "short" modifier on Jcc conditional jumps, which indicates the offset should be a "short jump" (8-bit immediate offset from EIP, -128 to +127). This patch expands to all recognized Jcc condition codes, and removes the inline restriction. Clang already ignores "jmp short" in inline assembly. However, only "jmp" and a couple of Jcc are actually checked, and only inline (i.e., not when using the integrated assembler for asm sources). A quick search through asm-containing libraries at hand shows a pretty broad range of Jcc conditions spelled with "short." GAS ignores the "short" modifier, and instead uses an encoding based on the given immediate. MS inline seems to do the same, and I suspect MASM does, too. NASM will yield an error if presented with an out-of-range immediate value. Example of GCC 9.1 and MSVC v19.20, "jmp short" with offsets that do and do not fit within 8 bits: https://gcc.godbolt.org/z/aFZmjY Differential Revision: https://reviews.llvm.org/D61990 llvm-svn: 360954	2019-05-16 23:27:07 +00:00
David L. Jones	11305984d0	[X86][AsmParser] Rename "ConditionCode" variable to "ConditionPredicate". This better matches the verbiage in Intel documentation, and should help avoid confusion between these two different kinds of values, both of which are parsed from mnemonics. llvm-svn: 360953	2019-05-16 23:27:05 +00:00
Reid Kleckner	08c15df29f	[X86] Deduplicate symbol lowering logic, NFC Summary: This refactors four pieces of code that create SDNodes for references to symbols: - normal global address lowering (LEA, MOV, etc) - callee global address lowering (CALL) - external symbol address lowering (LEA, MOV, etc) - external symbol address lowering (CALL) Each of these pieces of code need to: - classify the reference - lower the symbol - emit a RIP wrapper if needed - emit a load if needed - add offsets if needed I think handling them all in one place will make the code easier to maintain in the future. Reviewers: craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61690 llvm-svn: 360952	2019-05-16 23:15:26 +00:00
Amy Huang	c2029068bc	Emit global variables as S_CONSTANT records for codeview debug info. Summary: This emits S_CONSTANT records for global variables. Currently this emits records for the global variables already being tracked in the LLVM IR metadata, which are just constant global variables; we'll also want S_CONSTANTs for static data members and enums. Related to https://bugs.llvm.org/show_bug.cgi?id=41615 Reviewers: rnk Subscribers: aprantl, hiraditya, llvm-commits, thakis Tags: #llvm Differential Revision: https://reviews.llvm.org/D61926 llvm-svn: 360948	2019-05-16 22:28:52 +00:00
Tim Renouf	e3cbdaf1b5	[CodeGen] Fixed de-optimization of legalize subvector extract The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU 3-dword memory instructions, caused a de-optimization problem for code with such a load that then bitcasts via vector of i8, because v12i8 is not an MVT so it legalizes the bitcast by widening it. This commit adds the ability to widen a bitcast using extract_subvector on the result, so the value does not need to go via memory. Differential Revision: https://reviews.llvm.org/D60457 Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64 llvm-svn: 360942	2019-05-16 21:49:06 +00:00
Lang Hames	c97b50e224	[ORC] Change handling for SymbolStringPtr tombstones and empty keys. SymbolStringPtr used to use nullptr as its empty value and (since it performed ref-count operations on any non-nullptr) a pointer to a special pool-entry instance as its tombstone. This commit changes the scheme to use two invalid pointer values as the empty and tombstone values, and broadens the ref-count guard to prevent ref-counting operations from being performed on these pointers. This should improve the performance of SymbolStringPtrs used in DenseMaps/DenseSets, as ref counting operations will no longer be performed on the tombstone. llvm-svn: 360925	2019-05-16 18:29:34 +00:00
Joerg Sonnenberger	ec6ee797ec	Fix typos in comment. llvm-svn: 360921	2019-05-16 18:01:57 +00:00
Craig Topper	f09b9d419f	[X86] Use 0x9 instead of 0x1 as the immediate in some masked floor pattern. Similarly change 0x2 to 0xA for ceil. This suppresses exceptions which is what we should be doing for ceil and floor. We already use the correct immediate in patterns without masking. llvm-svn: 360915	2019-05-16 16:53:50 +00:00
Don Hinton	8249a8889d	[CommandLine] Don't allow duplicate categories. Summary: This is a fix to D61574, r360179, that allowed duplicate OptionCategory's. This change adds a check to make sure a category can only be added once even if the user passes it twice. Reviewed By: MaskRay Tags: #llvm Differential Revision: https://reviews.llvm.org/D61972 llvm-svn: 360913	2019-05-16 16:25:13 +00:00
Pavel Labath	2d29e16c30	Minidump: Add support for the MemoryList stream Summary: the stream format is exactly the same as for ThreadList and ModuleList streams, only the entry types are slightly different, so the changes in this patch are just straight-forward applications of established patterns. Reviewers: amccarth, jhenderson, clayborg Subscribers: markmentovai, lldb-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61885 llvm-svn: 360908	2019-05-16 15:17:30 +00:00
Matt Arsenault	99e6f4d11a	AMDGPU: Introduce TokenFactor for ABI register copies in call sequence The call was missing chain dependencies on the pre-call copies. I don't think this was causing any real issues however. llvm-svn: 360906	2019-05-16 15:10:27 +00:00
Matt Arsenault	df24c92c0f	AMDGPU: Assume xnack is enabled by default This is the conservatively correct default. It is always safe to assume xnack is enabled, but not the converse. Introduce a feature to blacklist targets where xnack can never be meaningfully enabled. I'm not sure the targets this is applied to is 100% correct. llvm-svn: 360903	2019-05-16 14:48:34 +00:00
Stephen Tozer	6f59b4b6d9	Resubmit: [Salvage] Change salvage debug info implementation to use DW_OP_LLVM_convert where needed Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=40645 Previously, LLVM had no functional way of performing casts inside of a DIExpression(), which made salvaging cast instructions other than Noop casts impossible. With the recent addition of DW_OP_LLVM_convert this salvaging is now possible, and so can be used to fix the attached bug as well as any cases where SExt instruction results are lost in the debugging metadata. This patch introduces this fix by expanding the salvage debug info method to cover these cases using the new operator. Differential revision: https://reviews.llvm.org/D61184 llvm-svn: 360902	2019-05-16 14:41:01 +00:00
Sanjay Patel	152f81fae8	[InstSimplify] fold fcmp (minnum, X, C1), C2 minnum(X, LesserC) == C --> false minnum(X, LesserC) >= C --> false minnum(X, LesserC) > C --> false minnum(X, LesserC) != C --> true minnum(X, LesserC) <= C --> true minnum(X, LesserC) < C --> true maxnum siblings will follow if there are no problems here. We should be able to perform some other combines when the constants are equal or greater-than too, but that would go in instcombine. We might also generalize this by creating an FP ConstantRange (similar to what we do for integers). Differential Revision: https://reviews.llvm.org/D61691 llvm-svn: 360899	2019-05-16 14:03:10 +00:00
Xing Xue	2dee094a08	Fixes for builds that require strict X/Open and POSIX compatiblity Summary: - Use alternative to MAP_ANONYMOUS for allocating mapped memory if it isn't available - Use strtok_r instead of strsep as part of getting program path - Don't try to find the width of a terminal using "struct winsize" and TIOCGWINSZ on POSIX builds. These aren't defined under POSIX (even though some platforms make them available when they shouldn't), so just check if we are doing a X/Open or POSIX compliant build first. Author: daltenty Reviewers: hubert.reinterpretcast, xingxue, andusy Reviewed By: hubert.reinterpretcast Subscribers: MaskRay, jsji, hiraditya, kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61326 llvm-svn: 360898	2019-05-16 14:02:13 +00:00
Adhemerval Zanella	2d28db6b9f	[AArch64] Handle ISD::LROUND and ISD::LLROUND This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas instruction. It currently only handles the scalar version. llvm-svn: 360894	2019-05-16 13:30:18 +00:00
Fangrui Song	e183340c29	Recommit [Object] Change object::SectionRef::getContents() to return Expected<StringRef> r360876 didn't fix 2 call sites in clang. Expected<ArrayRef<uint8_t>> may be better but use Expected<StringRef> for now. Follow-up of D61781. llvm-svn: 360892	2019-05-16 13:24:04 +00:00
Adhemerval Zanella	73643b5041	[CodeGen] Add lround/llround builtins This patch add the ISD::LROUND and ISD::LLROUND along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lround/llround generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. llvm-svn: 360889	2019-05-16 13:15:27 +00:00
Matt Arsenault	828b685ebe	RegAllocFast: Improve hinting heuristic Trace through multiple COPYs when looking for a physreg source. Add hinting for vregs that will be copied into physregs (we only hinted for vregs getting copied to a physreg previously). Give hinted a register a bonus when deciding which value to spill. This is part of my rewrite regallocfast series. In fact this one doesn't even have an effect unless you also flip the allocation to happen from back to front of a basic block. Nonetheless it helps to split this up to ease review of D52010 Patch by Matthias Braun llvm-svn: 360887	2019-05-16 12:50:39 +00:00
Matt Arsenault	27ac8408f6	GlobalISel: Add DstOp version of buildIntrinsic llvm-svn: 360879	2019-05-16 12:22:56 +00:00
Hans Wennborg	4da9ff9fcf	Revert r360876 "[Object] Change object::SectionRef::getContents() to return Expected<StringRef>" It broke the Clang build, see llvm-commits thread. > Expected<ArrayRef<uint8_t>> may be better but use Expected<StringRef> for now. > > Follow-up of D61781. llvm-svn: 360878	2019-05-16 12:08:34 +00:00
Matt Arsenault	a8f88c388f	AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xor Bool values should use the scc/vcc regbank since r350611. llvm-svn: 360877	2019-05-16 12:06:41 +00:00
Fangrui Song	a076ec54be	[Object] Change object::SectionRef::getContents() to return Expected<StringRef> Expected<ArrayRef<uint8_t>> may be better but use Expected<StringRef> for now. Follow-up of D61781. llvm-svn: 360876	2019-05-16 11:33:48 +00:00
Cullen Rhodes	472c6ef8b0	[AArch64][SVE2] Asm: implement CMLA/SQRDCMLAH instructions Summary: This patch adds support for the indexed and unpredicated vectors forms of the CMLA and SQRDCMLAH instructions. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D61906 llvm-svn: 360871	2019-05-16 09:42:22 +00:00
Cullen Rhodes	07eba98dd7	[AArch64][SVE2] Asm: implement CDOT instruction Summary: The complex DOT instructions perform a dot-product on quadtuplets from two source vectors and the resuling wide real or wide imaginary is accumulated into the destination register. The instructions come in two forms: Vector form, e.g. cdot z0.s, z1.b, z2.b, #90 - complex dot product on four 8-bit quad-tuplets, accumulating results in 32-bit elements. The complex numbers in the second source vector are rotated by 90 degrees. cdot z0.d, z1.h, z2.h, #180 - complex dot product on four 16-bit quad-tuplets, accumulating results in 64-bit elements. The complex numbers in the second source vector are rotated by 180 degrees. Indexed form, e.g. cdot z0.s, z1.b, z2.b[3], #0 - complex dot product on four 8-bit quad-tuplets, with specified quadtuplet from second source vector, accumulating results in 32-bit elements. cdot z0.d, z1.h, z2.h[1], #0 - complex dot product on four 16-bit quad-tuplets, with specified quadtuplet from second source vector, accumulating results in 64-bit elements. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer, rovka Differential Revision: https://reviews.llvm.org/D61903 llvm-svn: 360870	2019-05-16 09:33:44 +00:00
Cullen Rhodes	064f6ab556	[AArch64][SVE2] Asm: add unpredicated integer multiply instructions Summary: Add support for the following instructions: * MUL (indexed and unpredicated vectors forms) * SQDMULH (indexed and unpredicated vectors forms) * SQRDMULH (indexed and unpredicated vectors forms) * SMULH (unpredicated, predicated form added in SVE) * UMULH (unpredicated, predicated form added in SVE) * PMUL (unpredicated) The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer, rovka Differential Revision: https://reviews.llvm.org/D61902 llvm-svn: 360867	2019-05-16 09:07:26 +00:00
Fangrui Song	3e92df3e39	Add Triple::isPPC64() llvm-svn: 360864	2019-05-16 08:31:22 +00:00
Clement Courbet	c4fdd717ef	Reland r360771 "[MergeICmps] Simplify the code." This revision does not seem to be the culprit. llvm-svn: 360859	2019-05-16 06:18:02 +00:00

... 13 14 15 16 17 ...

124284 Commits