llvm-project

Commit Graph

Author	SHA1	Message	Date
Alex Bradbury	61aa940074	[RISCV] Introduce codegen patterns for RV64M-only instructions As discussed on llvm-dev <http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>, we have to be careful when trying to select the *w RV64M instructions. i32 is not a legal type for RV64 in the RISC-V backend, so operations have been promoted by the time they reach instruction selection. Information about whether the operation was originally a 32-bit operations has been lost, and it's easy to write incorrect patterns. Similarly to the variable 32-bit shifts, a DAG combine on ANY_EXTEND will produce a SIGN_EXTEND if this is likely to result in sdiv/udiv/urem being selected (and so save instructions to sext/zext the input operands). Differential Revision: https://reviews.llvm.org/D53230 llvm-svn: 350993	2019-01-12 07:43:06 +00:00
Alex Bradbury	d05eae7a7b	[RISCV] Add patterns for RV64I SLLW/SRLW/SRAW instructions This restores support for selecting the SLLW/SRLW/SRAW instructions, which was removed in rL348067 as the previous patterns made some unsafe assumptions. Also see the related llvm-dev discussion <http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html> Ultimately I didn't introduce a custom SelectionDAG node, but instead added a DAG combine that inserts an AssertZext i5 on the shift amount for an i32 variable-length shift and also added an ANY_EXTEND DAG-combine which will instead produce a SIGN_EXTEND for an i32 variable-length shift, increasing the opportunity to safely select SLLW/SRLW/SRAW. There are obviously different ways of addressing this (a number discussed in the llvm-dev thread), so I'd welcome further feedback and comments. Note that there are now some cases in test/CodeGen/RISCV/rv64i-exhaustive-w-insts.ll where sraw/srlw/sllw is selected even though sra/srl/sll could be used without any extra instructions. Given both are semantically equivalent, there doesn't seem a good reason to prefer one vs the other. Given that would require more logic to still select sra/srl/sll in those cases, I've left it preferring the *w variants. Differential Revision: https://reviews.llvm.org/D56264 llvm-svn: 350992	2019-01-12 07:32:31 +00:00
Craig Topper	bf61525e8c	[X86] When lowering v1i1/v2i1/v4i1/v8i1 load/store with avx512f, but not avx512dq, use v16i1 as the intermediate mask type instead of v8i1. We still use i8 for the load/store type. So we need to convert to/from i16 to around the mask type. By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned. llvm-svn: 350989	2019-01-12 02:22:10 +00:00
Craig Topper	abe6ef8d09	[X86] Add ISD nodes for masked truncate so we can properly represent when the output has more elements than the input due to needing to be 128 bits. We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask. This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that. This fixes some of PR34877 llvm-svn: 350985	2019-01-12 00:55:27 +00:00
Nikita Popov	9f6e9cf71b	[ConstantFolding] Fold undef for integer intrinsics This fixes https://bugs.llvm.org/show_bug.cgi?id=40110. This implements handling of undef operands for integer intrinsics in ConstantFolding, in particular for the bitcounting intrinsics (ctpop, cttz, ctlz), the with.overflow intrinsics, the saturating math intrinsics and the funnel shift intrinsics. The undef behavior follows what InstSimplify does for the general cas e of non-constant operands. For the bitcount intrinsics (where InstSimplify doesn't do undef handling -- there cannot be a combination of an undef + non-constant operand) I'm using a 0 result if the intrinsic is defined for zero and undef otherwise. Differential Revision: https://reviews.llvm.org/D55950 llvm-svn: 350971	2019-01-11 21:18:00 +00:00
Alexey Bataev	8da9a7538e	[SLP]Moved NVPTX test under NVPTX directory, NFC. llvm-svn: 350969	2019-01-11 20:42:48 +00:00
Alexey Bataev	ce2c8b3360	[SLP]Update test checks for the SPL vectorizer, NFC. llvm-svn: 350967	2019-01-11 20:21:14 +00:00
Nirav Dave	6b7f5aac72	[X86] Fix incomplete handling of register-assigned variables in parsing. Teach x86 assembly operand parsing to distinguish between assembler variable assigned to named registers and those assigned to immediate values. Reviewers: rnk, nickdesaulniers, void Subscribers: hiraditya, jyknight, llvm-commits Differential Revision: https://reviews.llvm.org/D56287 llvm-svn: 350966	2019-01-11 20:17:36 +00:00
Alex Bradbury	eea0b07028	[RISCV][NFC] Add CHECK lines for atomic operations on RV64I As or RV32I, we include these for completeness. Committing now to make it easier to review the RV64A patch. llvm-svn: 350962	2019-01-11 19:46:48 +00:00
Evandro Menezes	946fe976fd	[llvm-mca] Update tests for Exynos (NFC) Update test cases for Exynos M4. llvm-svn: 350961	2019-01-11 19:36:27 +00:00
Evandro Menezes	0674762112	[AArch64] Create feature set for Exynos M4 Complete the feature set for Exynos M4 and update test cases. llvm-svn: 350953	2019-01-11 18:54:25 +00:00
Pirama Arumuga Nainar	cc07dabdaa	[Legalizer] Use correct ValueType of SELECT_CC node during Float promotion Summary: When legalizing the result of a SELECT_CC node by promoting the floating-point type, use the promoted-to type rather than the original type. Fix PR40273. Reviewers: efriedma, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56566 llvm-svn: 350951	2019-01-11 18:46:02 +00:00
Teresa Johnson	290a839891	[LTO] Record whether LTOUnit splitting is enabled in index Summary: Records in the module summary index whether the bitcode was compiled with the option necessary to enable splitting the LTO unit (e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit). The information is passed down to the ModuleSummaryIndex builder via a new module flag "EnableSplitLTOUnit", which is propagated onto a flag on the summary index. This is then used during the LTO link to check whether all linked summaries were built with the same value of this flag. If not, an error is issued when we detect a situation requiring whole program visibility of the class hierarchy. This is the case when both of the following conditions are met: 1) We are performing LowerTypeTests or Whole Program Devirtualization. 2) There are type tests or type checked loads in the code. Note I have also changed the ThinLTOBitcodeWriter to also gate the module splitting on the value of this flag. Reviewers: pcc Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits Differential Revision: https://reviews.llvm.org/D53890 llvm-svn: 350948	2019-01-11 18:31:57 +00:00
Jordan Rupprecht	298ea3f577	[llvm-objcopy][NFC] Consistenly use two dashes for flags in tests. Summary: As pointed out in D53667, our use of hyphens in flags can be inconsistent, mixing `-` with `--`. This change makes all long style flags use `--`. Automatically changed via: ``` find test/tools/llvm-objcopy/ELF -type f \| xargs sed -i 's/ -$[a-zA-Z]\{3\}$/ --\1/g' ``` Two false positives were manually fixed/reverted. Reviewers: jhenderson, espindola, alexshap Reviewed By: jhenderson Subscribers: emaste, javed.absar, arichardson, fedor.sergeev, jakehehrlich, llvm-commits Differential Revision: https://reviews.llvm.org/D56513 llvm-svn: 350944	2019-01-11 18:06:31 +00:00
Vedant Kumar	ee10ef737e	[MergeFunc] Erase unused duplicate functions if they are discardable MergeFunc only deletes unused duplicate functions if they have local linkage, but it should be safe to relax this to any "discardable if unused" linkage type. Differential Revision: https://reviews.llvm.org/D56574 llvm-svn: 350939	2019-01-11 17:56:35 +00:00
Ehsan Amiri	f452f116d2	[Jump Threading] Unfold a select insn that feeds a switch via a phi node Currently when a select has a constant value in one branch and the select feeds a conditional branch (via a compare/ phi and compare) we unfold the select statement. This results in threading the conditional branch later on. Similar opportunity exists when a select (with a constant in one branch) feeds a switch (via a phi node). The patch unfolds select under this condition. A testcase is provided. llvm-svn: 350931	2019-01-11 15:52:57 +00:00
Sanjay Patel	40cd4b77e9	[x86] allow insert/extract when matching horizontal ops Previously, we limited this transform to cases where the extraction into the build vector happens from vectors of the same type as the build vector, but that's not required. There's a slight potential regression seen in the AVX512 result for phadd -- we're using the 256-bit flavor of the instruction now even though the 128-bit subset is sufficient. The same problem could already be seen in the AVX2 result. Follow-up patches will attempt to narrow that back down. llvm-svn: 350928	2019-01-11 14:27:59 +00:00
Martin Storsjo	fb909207c6	[llvm-objcopy] [COFF] Implmement --strip-unneeded and -x/--discard-all for symbols Differential Revision: https://reviews.llvm.org/D56480 llvm-svn: 350927	2019-01-11 14:13:04 +00:00
Martin Storsjo	d1cc64fe12	[llvm-objcopy] [COFF] Fix writing object files without symbols/string table Previously, this was broken - by setting PointerToSymbolTable to zero but still actually writing the string table length, the object file header was corrupted. Differential Revision: https://reviews.llvm.org/D56584 llvm-svn: 350926	2019-01-11 13:47:37 +00:00
Dmitry Venikov	37c1e2e7a9	[llvm-symbolizer] Add -exe, -e as aliases to -obj Summary: Provides -exe, -e as aliases to -obj. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40071 Reviewers: ruiu, rnk, fjricci, jhenderson Reviewed By: jhenderson Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56580 llvm-svn: 350925	2019-01-11 11:51:52 +00:00
Craig Topper	b97885cc2e	[X86] Change vXi1 extract_vector_elt lowering to be legal if the index is 0. Add DAG combine to turn scalar_to_vector+extract_vector_elt into extract_subvector. We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions. Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector. This fixes some of the regressions from D350800. llvm-svn: 350918	2019-01-11 05:44:56 +00:00
Francis Visoiu Mistrih	f57a247df7	[llvm-objdump][MachO] Disable some invalid input tests It causes some (but not all) bots to fail. I'll look into it tomorrow morning. Remove the tests for now to make the bots green. llvm-svn: 350908	2019-01-10 23:46:31 +00:00
Heejin Ahn	e73c7a1ab2	[WebAssembly] Fix stack pointer store check in RegStackify Summary: We now use __stack_pointer global and global.get/global.set instruction. This fixes the checking routine for stack_pointer writes accordingly. This also fixes the existing __stack_pointer test in reg-stackify.ll: That test used to pass not because of __stack_pointer clashes but because the function `stackpointer_callee` was not marked as `readnone`, so it was assumed to possibly write to memory arbitraily, and `global.set` instruction was marked as `mayStore` in the .td definition, so they were identified as intervening writes. After we added `readnone` to its attribute, this test fails without this patch. Reviewers: dschuff, sunfish Subscribers: jgravelle-google, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D56094 llvm-svn: 350906	2019-01-10 23:12:07 +00:00
Anton Korobeynikov	0681d6bc90	[MSP430] Minor fixes/improvements for assembler/disassembler * Teach AsmParser to recognize @rn in distination operand as 0(rn). * Do not allow Disassembler decoding instructions that have size more than a number of input bytes. * Fix UB in MSP430MCCodeEmitter. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D56547 llvm-svn: 350903	2019-01-10 22:59:50 +00:00
Anton Korobeynikov	29ffb6d558	[MSP430] Add missing instruction forms * Add missing mm, [r\|m]n, [r\|m]p instruction forms. * Fix bit16mc instruction. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D56546 llvm-svn: 350902	2019-01-10 22:54:53 +00:00
Thomas Lively	64a39a1c4e	[WebAssembly] Add unimplemented-simd128 subtarget feature Summary: This is a third attempt, but this time we have vetted it on Windows first. The previous errors were due to an uninitialized class member. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D56560 llvm-svn: 350901	2019-01-10 22:32:11 +00:00
Martin Storsjo	44aefe0bfb	[llvm-objcopy] [COFF] Fix a test matching pathnames for Windows. NFC. llvm-svn: 350899	2019-01-10 22:05:21 +00:00
Martin Storsjo	10b7296484	[llvm-objcopy] [COFF] Add support for removing symbols Differential Revision: https://reviews.llvm.org/D55881 llvm-svn: 350893	2019-01-10 21:28:24 +00:00
Alina Sbirlea	cae12edaaa	Use MemorySSA in LICM to do sinking and hoisting. Summary: Step 2 in using MemorySSA in LICM: Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag. Promotion is disabled. Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which relied on promotion, in order to test all sinking tests. Reviewers: sanjoy, davide, gberry, george.burgess.iv Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D40375 llvm-svn: 350879	2019-01-10 19:29:04 +00:00
Craig Topper	844f989608	[X86] Call SimplifyDemandedBits on conditions of X86ISD::SHRUNKBLEND This extends to combineVSelectToShrunkBlend to be able to resimplify SHRUNKBLENDS that have already been created. This should help some of the regressions from D56387 Differential Revision: https://reviews.llvm.org/D56421 llvm-svn: 350875	2019-01-10 19:05:34 +00:00
Francis Visoiu Mistrih	1bd054ead2	[llvm-objdump][MachO] Fix test to work on Windows This fails in http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/3208/steps/stage%201%20check/logs/stdio. llvm-svn: 350871	2019-01-10 18:32:30 +00:00
Francis Visoiu Mistrih	b8819dc1e3	[llvm-objdump][MachO] Fix error reporting after r350848 and r350849 llvm-svn: 350851	2019-01-10 17:36:54 +00:00
Dan Liew	4d8c8fe62e	[FileCheck] Don't propagate `FILECHECK_DUMP_INPUT_ON_FAILURE` and `FILECHECK_OPTS` into environment for FileCheck tests. Summary: This fixes the following FileCheck tests: * FileCheck/dump-input-enable.txt * FileCheck/match-full-lines.txt when `FILECHECK_DUMP_INPUT_ON_FAILURE` is set in the environment. By default llvm-lit propagates `FILECHECK_DUMP_INPUT_ON_FAILURE` and `FILECHECK_OPTS` from llvm-lit's environment into the test environment. Unfortunately this can break FileCheck's tests because they expect that these environment variables not to be set. rdar://problem/47176262 Reviewers: jdenny, probinson, george.karpenkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56541 llvm-svn: 350850	2019-01-10 17:24:06 +00:00
Francis Visoiu Mistrih	1b427d4dba	[llvm-objdump][MachO] Use the -dsym file name when reporting errors Instead of using the binary filename. llvm-svn: 350849	2019-01-10 17:16:42 +00:00
Francis Visoiu Mistrih	9f4f01182e	[llvm-objdump][MachO] Correctly handle the llvm::Error when -dsym has errors In an assert build, the Error gets destroyed and we get "Program aborted due to an unhandled Error:". In release, we get an empty message. llvm-svn: 350848	2019-01-10 17:16:37 +00:00
George Rimar	8e0a70be24	[llvm-objdump] - Do not include reserved undefined symbol in -t output. This is https://bugs.llvm.org/show_bug.cgi?id=26892, GNU objdump hides the special symbol entry: SYMBOL TABLE: 000000000000a7e0 l F .text 00000000000003f9 bi_copymodules while llvm-objdump does not: SYMBOL TABLE: 0000000000000000 UND 00000000 000000000000a7e0 l F .text 000003f9 bi_copymodules Patch makes the behavior of the llvm-objdump to be consistent with the GNU objdump. Differential revision: https://reviews.llvm.org/D56076 llvm-svn: 350840	2019-01-10 16:24:10 +00:00
Neil Henning	e85d45a699	[AMDGPU] Fix dwordx3/southern-islands failures. This commit fixes the dwordx3/southern-islands failures that were found in bugzilla https://bugs.llvm.org/show_bug.cgi?id=40129, by not generating the dwordx3 variants of load/store instructions that were added to the ISA after southern islands. Differential Revision: https://reviews.llvm.org/D56434 llvm-svn: 350838	2019-01-10 16:21:08 +00:00
Dmitry Venikov	60d71e4684	[llvm-symbolizer] Add -p as alias to -pretty-print Summary: Provides -p as a short alias for -pretty-print. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40076 Reviewers: samsonov, khemant, ruiu, rnk, fjricci, jhenderson Reviewed By: jhenderson Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56542 llvm-svn: 350832	2019-01-10 15:33:35 +00:00
Alex Bradbury	6f302b8a69	[RISCV][MC] Add support for evaluating constant symbols as immediates This further improves compatibility with GNU as, allowing input such as the following to be assembled: .equ CONST, 0x123456 li a0, CONST addi a0, a0, %lo(CONST) .equ CONST, 1 slli a0, a0, CONST Note that we don't have perfect compatibility with gas, as it will avoid emitting a relocation in this case: addi a0, a0, %lo(CONST2) .equ CONST2, 0x123456 Thanks to Shiva Chen for suggesting a better way to approach this during review. Differential Revision: https://reviews.llvm.org/D52298 llvm-svn: 350831	2019-01-10 15:33:17 +00:00
Sanjay Patel	87ae1460f7	[x86] fix remaining miscompile bug in horizontal binop matching (PR40243) When we use the partial-matching function on a 128-bit chunk, we must account for the possibility that we've matched undef halves of the original source vectors, so the outputs may need to be reset. This should allow closing PR40243: https://bugs.llvm.org/show_bug.cgi?id=40243 llvm-svn: 350830	2019-01-10 15:27:23 +00:00
Sanjay Patel	ed5cfc6792	[x86] fix horizontal binop matching for 256-bit vectors (PR40243) This is a partial fix for: https://bugs.llvm.org/show_bug.cgi?id=40243 ...as seen in the integer test, we still need to correct the result when using the existing (old) horizontal op matching function because it does not model the way x86 256-bit horizontal ops return results (each 128-bit half is its own horizontal-op). A potential follow-up change for that is discussed in the bug report - see also D56490. This generally duplicates a lot of the existing matching code, but we can't just remove that without introducing regressions, so the existing code is renamed and used less often. Follow-ups may try to reduce that overlap. Differential Revision: https://reviews.llvm.org/D56450 llvm-svn: 350826	2019-01-10 15:04:52 +00:00
Bryan Chan	7ce5775e62	[AArch64] Fix operation actions for FP16 vector intrinsics Summary: This patch changes the legalization action for some half-precision floating- point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops are not supported in hardware for half-precision vectors, but promotion is not always possible (for v8f16 operands). Changing the action to Expand fixes an assertion failure in the legalizer when the frontend produces such ops. In addition, a quick microbenchmark shows that, in the v4f16 case, expanding introduces fewer spills and is therefore slightly faster than promoting. Reviewers: t.p.northover, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56296 llvm-svn: 350825	2019-01-10 15:02:37 +00:00
George Rimar	70d197d466	[llvm-objdump] - Implement -z/--disassemble-zeroes. This is https://bugs.llvm.org/show_bug.cgi?id=37151, GNU objdump spec says that "Normally the disassembly output will skip blocks of zeroes.", but currently, llvm-objdump prints them. The patch implements the -z/--disassemble-zeroes option and switches the default to always skip blocks of zeroes. Differential revision: https://reviews.llvm.org/D56083 llvm-svn: 350823	2019-01-10 14:55:26 +00:00
Simon Pilgrim	8c221b3f76	[X86] Add SSE41 vector abs tests llvm-svn: 350822	2019-01-10 14:26:15 +00:00
James Henderson	3a6a5a3370	[llvm-symbolizer] Add support for specifying addresses on command-line See https://bugs.llvm.org/show_bug.cgi?id=40070. GNU addr2line accepts input addresses both on the command-line and via stdin. llvm-symbolizer previously only supported the latter. This change adds support for the former. As with addr2line, the new behaviour is to only look for addresses on stdin if no positional arguments were provided to llvm-symbolizer. Reviewed by: ruiu Differential Revision: https://reviews.llvm.org/D56272 llvm-svn: 350821	2019-01-10 14:10:02 +00:00
Bjorn Pettersson	15313a3705	Fix RUN line in test/Transforms/LoopDeletion/crashbc.ll llvm-svn: 350807	2019-01-10 09:58:23 +00:00
Sam Parker	7208221452	[ARM] Size reduce teq to eors Add t2TEQrr to the map of instructions with can be reduced down into a T1 instruction. This is a special case because TEQ just sets the CPSR and doesn't write to a GPR, which is not the case for EOR. So, we need to ensure that the EOR can write to the first operand. Differential Revision: https://reviews.llvm.org/D56255 llvm-svn: 350801	2019-01-10 08:36:33 +00:00
Craig Topper	5d20eb240f	[X86] Disable DomainReassignment pass when AVX512BW is disabled to avoid injecting VK32/VK64 references into the MachineIR Summary: This pass replaces GR8/GR16/GR32/GR64 with their equivalent sized mask register classes. But VK32/VK64 aren't legal without AVX512BW. Apparently this mostly appears to work if the register coalescer is able to remove the VK32/VK64 register class reference. Or if we don't ever spill it. But there's no guarantee of that. Another Intel employee managed to trigger a crash due to this with ISPC. Unfortunately, I've lost the test case he sent me at the time. I'm trying to get him to reproduce it for me. I'd like to get this in before 8.0 branches since its a little scary. The regressions here are unfortunate, but I think we can make some improvements to DAG combine, load folding, etc. to fix them. Just not sure if we can get that done for 8.0. Fixes PR39741 Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56460 llvm-svn: 350800	2019-01-10 07:43:54 +00:00
Zi Xuan Wu	64c956eea8	Recommit "[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel" This re-commit r350685. Differential Revision: https://reviews.llvm.org/D55686 llvm-svn: 350799	2019-01-10 06:20:14 +00:00
Mandeep Singh Grang	859cb2e35d	[AArch64] Emit the correct MCExpr relocations specifiers like VK_ABS_G0, etc Summary: D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc. This patch adds the necessary enums and MCExpr needed for lowering these. Reviewers: rnk, mstorsjo, efriedma Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56037 llvm-svn: 350798	2019-01-10 04:59:44 +00:00
Thomas Lively	fdd4999b86	Revert "[WebAssembly] Add simd128-unimplemented subtarget feature" This reverts rL350791. llvm-svn: 350795	2019-01-10 04:09:25 +00:00
Thomas Lively	eb6f9abd41	[WebAssembly] Add simd128-unimplemented subtarget feature This is a second attempt at r350778, which was reverted in r350789. The only change is that the unimplemented-simd128 feature has been renamed simd128-unimplemented, since naming it unimplemented-simd128 somehow made the simd128 feature flag enable the unimplemented-simd128 feature on Windows. llvm-svn: 350791	2019-01-10 02:55:52 +00:00
Thomas Lively	fdca5fab60	Revert "[WebAssembly] Add unimplemented-simd128 subtarget feature" This reverts L350778. llvm-svn: 350789	2019-01-10 01:37:44 +00:00
Thomas Lively	2eeade1814	[WebAssembly] Add unimplemented-simd128 subtarget feature Summary: This replaces the old ad-hoc -wasm-enable-unimplemented-simd flag. Also makes the new unimplemented-simd128 feature imply the simd128 feature. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton Differential Revision: https://reviews.llvm.org/D56501 llvm-svn: 350778	2019-01-09 23:59:37 +00:00
Eli Friedman	d4e7a0d83c	[SimplifyLibCalls] Fix memchr expansion for constant strings. The C standard says "The memchr function locates the first occurrence of c (converted to an unsigned char)[...]". The expansion was missing the conversion to unsigned char. Fixes https://bugs.llvm.org/show_bug.cgi?id=39041 . Differential Revision: https://reviews.llvm.org/D55947 llvm-svn: 350775	2019-01-09 23:39:26 +00:00
Florian Hahn	7c122c1255	[AArch64] Add test for constant shrinking with multiple users (NFC). Test to avoid regression fixed by rL350684. llvm-svn: 350762	2019-01-09 21:04:36 +00:00
Francis Visoiu Mistrih	ac6454a7f6	[CodeGen] Ignore return sext/zext attributes of unused results for tail calls If the caller's return type does not have a zeroext attribute but the callee does a tail call zeroext, we won't consider the tail call during CodeGenPrepare because the attributes don't match. However, if the result of the tail call has no uses, it makes sense to drop the sext/zext attributes. Differential Revision: https://reviews.llvm.org/D56486 llvm-svn: 350753	2019-01-09 19:46:15 +00:00
Easwaran Raman	ed279752f0	[Inliner] Assert that the computed inline threshold is non-negative. Reviewers: chandlerc Subscribers: haicheng, llvm-commits Differential Revision: https://reviews.llvm.org/D56409 llvm-svn: 350751	2019-01-09 19:26:17 +00:00
Thomas Lively	edb54b22d3	[WebAssembly] Standardize order of SIMD bitselect arguments Summary: For some reason the backend assumed that the condition mask would be the first argument to the LLVM intrinsic, but everywhere else the condition mask is the third argument. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56412 llvm-svn: 350746	2019-01-09 18:13:11 +00:00
Sanjay Patel	15f2a4d1b9	[x86] use 'nounwind' to remove test noise; NFC llvm-svn: 350745	2019-01-09 17:29:18 +00:00
Aleksandar Beserminji	8abf680424	[mips][micrompis] Emit 16bit NOPs by default Emit 16bit NOPs by default. Use 32bit NOPs in delay slots where necessary. Differential https://reviews.llvm.org/D55323 llvm-svn: 350733	2019-01-09 15:58:02 +00:00
Alexey Bataev	59e916c214	[DEBUGINFO][NVPTX]Make tests more strict, NFC. NVPTX format requires that no labels/label arithmetics is used in the debug info sections. To avoid possible problems with the adding/modifying the debug info functionality, made these tests more strict. llvm-svn: 350731	2019-01-09 15:41:44 +00:00
Valery Pykhtin	b7a459547d	Revert "[AMDGPU] Fix DPP combiner" This reverts commit e3e2923a39cbec3b3bc3a7d3f0e9a77a4115080e, svn revision rL350721 llvm-svn: 350730	2019-01-09 15:21:53 +00:00
Kristof Beyls	c650ff77eb	Initial AArch64 SLH implementation. This is an initial implementation for Speculative Load Hardening for AArch64. It builds on top of the recently introduced AArch64SpeculationHardening pass. This doesn't implement (yet) some of the optimizations implemented for the X86SpeculativeLoadHardening pass. I thought introducing the optimizations incrementally in follow-up patches should make this easier to review. Differential Revision: https://reviews.llvm.org/D55929 llvm-svn: 350729	2019-01-09 15:13:34 +00:00
George Rimar	3ba0f3c0fb	[llvm-objdump] - Print symbol addressed when dumping disassembly output (-d) When GNU objdump dumps the input with -d it prints the symbol addresses, for example: 0000000000000031 <foo>: 31: 00 00 add %al,(%rax) ... llvm-objdump currently does not do that. Patch changes the behavior to match the GNU objdump. That is useful for implementing -z/--disassemble-zeroes (D56083), it allows omitting first zero bytes and keep the information about the symbol address in the output. Differential revision: https://reviews.llvm.org/D56123 llvm-svn: 350726	2019-01-09 14:43:33 +00:00
Nico Weber	b61910bbe8	Fix typo in comment llvm-svn: 350725	2019-01-09 14:20:20 +00:00
Simon Pilgrim	fcabfc8666	[X86][SSE] Cleanup shuffle combining test check prefixes Share prefixes whenever possible, use X86 instead of X32. llvm-svn: 350722	2019-01-09 13:46:14 +00:00
Valery Pykhtin	1e0b5c719b	[AMDGPU] Fix DPP combiner Fixed issue with identity values and other cases, f32/f16 identity values to be added later. fma/mac instructions is disabled for now. Test is fully reworked, added comments. Other fixes: 1. dpp move with uses and old reg initializer should be in the same BB. 2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Othervise the old register value is checked for identity. 3. Added add, subrev, and, or instructions to the old folding function. 4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user. Differential revision: https://reviews.llvm.org/D55444 llvm-svn: 350721	2019-01-09 13:43:32 +00:00
Florian Hahn	9697d2a764	Revert r350647: "[NewPM] Port tsan" This patch breaks thread sanitizer on some macOS builders, e.g. http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/52725/ llvm-svn: 350719	2019-01-09 13:32:16 +00:00
Simon Pilgrim	5a7132ff0f	[X86] Enable combining shuffles to PACKSS/PACKUS for 256/512-bit vectors llvm-svn: 350716	2019-01-09 13:23:28 +00:00
Anton Korobeynikov	c18e90369d	[MSP430] Optimize 'shl x, 8[+ N] -> swpb(zext(x)) [<< N]' for i16 Perform additional simplification to reduce shift amount. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D56016 llvm-svn: 350712	2019-01-09 13:03:01 +00:00
Anton Korobeynikov	9222ed4485	[MSP430] Fix crash while lowering llvm.stacksave/stackrestore Perform the usual expansion of stacksave / restore intrinsics. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54890 llvm-svn: 350710	2019-01-09 12:52:15 +00:00
Simon Pilgrim	17eace47cb	[X86] Add extra test coverage for combining shuffles to PACKSS/PACKUS llvm-svn: 350707	2019-01-09 12:34:10 +00:00
Diogo N. Sampaio	1eb31c8e94	[AArch64] Move feature predctrl to predres Follow up patch of rL350385, for adding predres command line option. This patch renames the feature as to keep it aligned with the option passed by/to clang Differential Revision: https://reviews.llvm.org/D56484 llvm-svn: 350702	2019-01-09 11:24:15 +00:00
David Stenberg	33b192d72b	[DebugInfo] Omit location list entries with empty ranges Summary: This fixes PR39710. In that case we emitted a location list looking like this: .Ldebug_loc0: .quad .Lfunc_begin0-.Lfunc_begin0 .quad .Lfunc_begin0-.Lfunc_begin0 .short 1 # Loc expr size .byte 85 # DW_OP_reg5 .quad .Lfunc_begin0-.Lfunc_begin0 .quad .Lfunc_end0-.Lfunc_begin0 .short 1 # Loc expr size .byte 85 # super-register DW_OP_reg5 .quad 0 .quad 0 As seen, the first entry's beginning and ending addresses evalute to 0, which meant that the entry inadvertently became an "end of list" entry, resulting in the location list ending sooner than expected. To fix this, omit all entries with empty ranges. Location list entries with empty ranges do not have any effect, as specified by DWARF, so we might as well drop them: "A location list entry (but not a base address selection or end of list entry) whose beginning and ending addresses are equal has no effect because the size of the range covered by such an entry is zero." Reviewers: davide, aprantl, dblaikie Reviewed By: aprantl Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D55919 llvm-svn: 350698	2019-01-09 09:58:59 +00:00
Matt Arsenault	3dddb163dd	GlobalISel: Implement fewerElements for implicit_def llvm-svn: 350697	2019-01-09 07:51:52 +00:00
Matt Arsenault	befee402ff	GlobalISel: Implement widenScalar for implicit_def llvm-svn: 350695	2019-01-09 07:34:14 +00:00
Zi Xuan Wu	f2a75eef41	Revert "[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel" This reverts commit r350685. See compile assert in compiler-rt. llvm-svn: 350693	2019-01-09 06:12:24 +00:00
Zi Xuan Wu	9479f6d72e	[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel Bad machine code: Illegal virtual register for instruction function: TestULE basic block: %bb.0 entry (0x1000a39b158) instruction: %2:crrc = FCMPUD %1:vsfrc, %3:f8rc operand 1: %1:vsfrc Fix assert about missing match between fcmp instruction and register class. We should use vsx related cmp instruction xvcmpudp instead of fcmpu when vsx is opened. add -verifymachineinstrs option into related test cases to enable the verify pass. Differential Revision: https://reviews.llvm.org/D55686 llvm-svn: 350685	2019-01-09 02:31:10 +00:00
Matt Arsenault	0ad1b71fe3	RegisterCoalescer: Assume CR_Replace for SubRangeJoin Currently it's possible for following check on V.WriteLanes (which is not really meaningful during SubRangeJoin) to pass for one half of the pair, and then fall through to to one of the impossible or unresolved states. This then fails as inconsistent on the other half. During the main range join, the check between V.WriteLanes and OtherV.ValidLanes must have passed, meaning this should be a CR_Replace. Fixes most of the testcases in bugs 39542 and 39602 llvm-svn: 350678	2019-01-08 23:22:18 +00:00
Matt Arsenault	2c807410fd	RegisterCoalescer: Defer clearing implicit_def lanes We can't go back and recover the lanes if it turns out the implicit_def really can't be erased. Assume all lanes are valid if an unresolved conflict is encountered. There aren't any tests where this seems to matter either way, but this seems like a safer option. Fixes bug 39602 llvm-svn: 350676	2019-01-08 23:10:47 +00:00
Sanjay Patel	bad2406409	[InstCombine] remove stale comments; NFC These changed with rL350672. llvm-svn: 350674	2019-01-08 22:52:08 +00:00
Rong Xu	52aa224aff	[llvm-profdata] add value-cutoff functionality in show command This patch improves llvm-profdata show command: (1) add -value-cutoff=<N> option: Show only those functions whose max count values are greater or equal to N. (2) add -list-below-cutoff option: Only output names of functions whose max count value are below the cutoff. (3) formats value-profile counts and prints out percentage. Differential Revision: https://reviews.llvm.org/D56342 llvm-svn: 350673	2019-01-08 22:41:48 +00:00
Sanjay Patel	d023dd60e9	[InstCombine] canonicalize another raw IR rotate pattern to funnel shift This is matching the equivalent of the DAG expansion, so it should never end up with worse perf than the original code even if the target doesn't have a rotate instruction. llvm-svn: 350672	2019-01-08 22:39:55 +00:00
Rong Xu	7162e16e6b	[PGO] Revert r350579 to fix commit message. Will re-commit it using the correct commit message. llvm-svn: 350670	2019-01-08 22:37:12 +00:00
Heejin Ahn	321d522038	[WebAssembly] Rename StoreResults to MemIntrinsicResults Summary: StoreResults pass does not optimize store instructions anymore because store instructions don't return results values anymore. Now this pass is used solely for memory intrinsics, so update the pass name accordingly and fix outdated pass descriptions as well. This patch does not change any meaningful behavior, but not marked as NFC because it changes a comment check line in a test case. Reviewers: dschuff Subscribers: mgorny, sbc100, jgravelle-google, sunfiish, llvm-commits Differential Revision: https://reviews.llvm.org/D56093 llvm-svn: 350669	2019-01-08 22:35:18 +00:00
Evandro Menezes	9b7b5b1dcc	[llvm-mca] Update the Exynos test cases (NFC) Add more entropy to the test cases. llvm-svn: 350662	2019-01-08 22:29:56 +00:00
Zachary Turner	2fe4900525	[llvm-undname] Add support for demangling msvc's noexcept types. Starting in C++17, MSVC introduced a new mangling for function parameters that are themselves noexcept functions. This patch makes llvm-undname properly demangle them. Patch by Zachary Henkel Differential Revision: https://reviews.llvm.org/D55769 llvm-svn: 350656	2019-01-08 21:05:51 +00:00
Philip Pfaffe	82f995db75	[NewPM] Port tsan A straightforward port of tsan to the new PM, following the same path as D55647. Differential Revision: https://reviews.llvm.org/D56433 llvm-svn: 350647	2019-01-08 19:21:57 +00:00
Sanjay Patel	6a18c531ae	[x86] add tests for PR40243; NFC llvm-svn: 350646	2019-01-08 19:15:21 +00:00
Paul Robinson	7402fd9a35	Rename DIFlagFixedEnum to DIFlagEnumClass. NFC llvm-svn: 350641	2019-01-08 17:52:29 +00:00
Anna Thomas	2dfa412efe	[UnrollRuntime] Fix domTree failures in multiexit unrolling Summary: This fixes the IDom for exit blocks and all blocks reachable from the exit blocks, when runtime unrolling under multiexit/exiting case. We initially had a restrictive check that the IDom is only updated when it is the header of the loop. However, we also need to update the IDom to the correct one when the IDom is any block within the original loop. See added test cases (which fail dom tree verification without the patch). Reviewers: reames, mzolotukhin, mkazantsev, hfinkel Reviewed by: brzycki, kuhar Subscribers: zzheng, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D56284 llvm-svn: 350640	2019-01-08 17:16:25 +00:00
Yonghong Song	0d99031de0	[BPF] Fix .BTF.ext reloc type assigment issue Commit f1db33c5c1a9 ("[BPF] Disable relocation for .BTF.ext section") assigned relocation type R_BPF_NONE if the fixup type is FK_Data_4 and the symbol is temporary. The reason is we use FK_Data_4 as a fixup type for insn offsets in .BTF.ext section. Just checking whether the symbol is temporary is not enough. For example, .debug_info may reference some strings whose fixup is FK_Data_4 with a temporary symbol as well. To truely reflect the case for .BTF.ext section, this patch further checks that the section associateed with the symbol must be SHF_ALLOC and SHF_EXECINSTR, i.e., in the text section. This fixed the above-mentioned problem. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 350637	2019-01-08 16:36:06 +00:00
Petr Pavlu	bf4fdecc51	[GlobalISel] Fix choice of instruction selector for AArch64 at -O0 with -global-isel=0 Commit rL347861 introduced an unintentional change in the behaviour when compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly disabling GlobalISel resulted in using FastISel but an updated condition in the commit changed it to using SelectionDAG. The patch fixes this condition and slightly better organizes the code that chooses the instruction selector. Fixes PR40131. Differential Revision: https://reviews.llvm.org/D56266 llvm-svn: 350626	2019-01-08 14:19:06 +00:00
Philip Pfaffe	efb5ad1c58	[DA][NewPM] Add a printerpass and port the testsuite The new-pm version of DA is untested. Testing requires a printer, so add that and use it in the existing DA tests. Differential Revision: https://reviews.llvm.org/D56386 llvm-svn: 350624	2019-01-08 14:06:58 +00:00
Francis Visoiu Mistrih	7a6d7672c1	[X86][Darwin] Emit compact-unwind for register-sized stack adjustments For stack frames on the size of a register in x86, a code size optimization emits "push rax/eax" instead of "sub" for stack allocation. For example: foo: .cfi_startproc BB#0: pushq %rax Ltmp0: .cfi_def_cfa_offset 16 ... .cfi_endproc However, we are falling back to DWARF in this case because we cannot encode %rax as a saved register. This requirement is wrong, since we don't care about the contents of %rax, it is the equivalent of a sub. In order to specify that we care about the contents of %rax, we would need a .cfi_offset %rax, <offset>. It's also overzealous in the case where there are pushes for callee saved registers followed by a "push rax/eax" instead of "sub", in which case we should also be able to encode the callee saved regs and everything else using compact unwind. Patch authored by Bruno Cardoso Lopes. Differential Revision: https://reviews.llvm.org/D13793 llvm-svn: 350623	2019-01-08 13:53:15 +00:00
Tim Northover	964eea7ad2	AArch64: avoid splitting vector truncating stores. We have code to split vector splats (of zero and non-zero) for performance reasons, but it ignores the fact that a store might be truncating. Actually, truncating stores are formed for vNi8 and vNi16 types. Since the truncation is from a legal type, the size of the store is always <= 64-bits and so they don't actually benefit from being split up anyway, so this patch just disables that transformation. llvm-svn: 350620	2019-01-08 13:30:27 +00:00
James Henderson	6135b0f88c	[llvm-readobj] Don't print '@' at end of unversioned dynsym names This fixes https://bugs.llvm.org/show_bug.cgi?id=40097. The problem was caused by a regression in r188022. See also r350614. Reviewed by: rupprecht, mstorsjo, Higuoxing, jakehehrlich Differential Revision: https://reviews.llvm.org/D56319 llvm-svn: 350615	2019-01-08 10:58:05 +00:00
Sam Parker	53000a74a5	[ARM] Add missing patterns for DSP muls Using a PatLeaf for sext_16_node allowed matching smulbb and smlabb instructions once the operands had been sign extended. But we also need to use sext_inreg operands along with sext_16_node to catch a few more cases that enable use to remove the unnecessary sxth. Differential Revision: https://reviews.llvm.org/D55992 llvm-svn: 350613	2019-01-08 10:12:36 +00:00
Matt Arsenault	c765240060	AMDGPU/GlobalISel: Introduce vcc reg bank I'm not entirely sure this is the correct thing to do with the global isel philosophy, but I think this is necessary to handle how differently SGPRs are used normally vs. from a condition. For example, it makes sense to allow a copy from a VGPR to an SGPR, but it makes no sense to allow a copy from VGPRs to SGPRs used as select mask. This avoids regbankselecting strange code with a truncate feeding directly into a condition field. Now a copy is forced from sgpr(s1) to vcc, which is more sensible to handle. Some of these issues could probably avoided with making enough operations resulting in i1 illegal. I think we can't avoid this register bank for legality. For example, an i1 and where one source is from a truncate, and one source is a compare needs some kind of copy inserted to make sure both are in condition registers. llvm-svn: 350611	2019-01-08 06:30:53 +00:00
Thomas Lively	6a87ddac9a	[WebAssembly] Massive instruction renaming Summary: An automated renaming of all the instructions listed at https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329 as well as some similarly-named identifiers. Reviewers: aheejin, dschuff, aardappel Subscribers: sbc100, jgravelle-google, eraman, sunfish, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D56338 llvm-svn: 350609	2019-01-08 06:25:55 +00:00
Mandeep Singh Grang	f286bee9fe	[MC] [AArch64] Support resolving signed fixups for :abs_g0_s: etc. Summary: This patch is a follow-up to D55896. Reviewers: efriedma, mstorsjo Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56029 llvm-svn: 350606	2019-01-08 04:48:00 +00:00
Matt Arsenault	a1515d2d33	AMDGPU/GlobalISel: Legalize concat_vectors llvm-svn: 350598	2019-01-08 01:30:02 +00:00
Matt Arsenault	adc40baa29	RegBankSelect: Fix copy insertion point for terminators If a copy was needed to handle the condition of brcond, it was being inserted before the defining instruction. Add tests for iterator edge cases. I find the existing code here suspect for the case where it's looking for terminators that modify the register. It's going to insert a copy in the middle of the terminators, which isn't allowed (it might be necessary to have a COPY_terminator if anybody actually needs this). Also legalize brcond for AMDGPU. llvm-svn: 350595	2019-01-08 01:22:47 +00:00
Matt Arsenault	ae6f1e07fc	AMDGPU/GlobalISel: Disallow VGPR->SCC copies This fixes using scalar adds when only the carry in is a VGPR using greedy regbankselect. llvm-svn: 350593	2019-01-08 01:13:20 +00:00
Matt Arsenault	68c668a5f3	AMDGPU/GlobalISel: RegBankSelect for carry-in I'm not sure we should be allowing the truncate to s1 for the inputs. It may be necessary to create a new VCC reg bank. llvm-svn: 350592	2019-01-08 01:09:09 +00:00
Matt Arsenault	2cc15b67b7	AMDGPU/GlobalISel: RegBankSelect for add/sub with carry out llvm-svn: 350589	2019-01-08 01:03:58 +00:00
Matt Arsenault	299302fbe7	AMDGPU/GlobalISel: InstrMapping for G_UNMERGE_VALUES llvm-svn: 350588	2019-01-08 00:46:19 +00:00
Chen Zheng	33a61d719c	fix comment typo - NFC llvm-svn: 350587	2019-01-08 00:40:01 +00:00
Wei Mi	2645fd0ece	[RegisterCoalescer] dst register's live interval needs to be updated when merging a src register in ToBeUpdated set. This is to fix PR40061 related with https://reviews.llvm.org/rL339035. In https://reviews.llvm.org/rL339035, live interval of source pseudo register in rematerialized copy may be saved in ToBeUpdated set and its update may be postponed. In PR40061, %t2 = %t1 is rematerialized and %t1 is added into toBeUpdated set to postpone its live interval update. After the rematerialization, the live interval of %t1 is larger than necessary. Then %t1 is merged into %t3 and %t1 gets removed. After the merge, %t3 contains live interval larger than necessary. Because %t3 is not in toBeUpdated set, its live interval is not updated after register coalescing and it will break some assumption in regalloc. The patch requires the live interval of destination register in a merge to be updated if the source register is in ToBeUpdated. Differential revision: https://reviews.llvm.org/D55867 llvm-svn: 350586	2019-01-08 00:26:11 +00:00
Jonas Devlieghere	91b43adb69	[dsymutil] Upstream unobfuscation logic. The unobufscation support for BCSymbolMaps was the last piece of code that hasn't been upstreamed yet. This patch contains a reworked version of the existing code and relevant tests. Differential revision: https://reviews.llvm.org/D56346 llvm-svn: 350580	2019-01-07 23:27:25 +00:00
Rong Xu	6f366c3a04	[PGO] Use SourceFileName rather module name in PGOFuncName In LTO or Thin-lto mode (though linker plugin), the module names are of temp file names which are different for different compilations. Using SourceFileName avoids the issue. This should not change any functionality for current PGO as all the current callers of getPGOFuncName() is before LTO. llvm-svn: 350579	2019-01-07 23:25:56 +00:00
Davide Italiano	bf1fdb852f	[Verifier] Reject invalid type for DILocalVariable. Reviewers: aprantl Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56414 llvm-svn: 350578	2019-01-07 23:09:09 +00:00
Michael Ferguson	e39b614d1d	[ValueTracking] Adjust comment in test Adjusts a comment in this test to verify commit access. llvm-svn: 350569	2019-01-07 21:02:22 +00:00
Craig Topper	486313b5f7	Recommit r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics." The MSVC limit we hit on AutoUpgrade.cpp has been worked around for now. llvm-svn: 350567	2019-01-07 21:00:32 +00:00
Martin Storsjo	93a7137c0a	[ObjectYAML] [COFF] Support multiple symbols with the same name Differential Revision: https://reviews.llvm.org/D56294 llvm-svn: 350566	2019-01-07 20:55:33 +00:00
Craig Topper	fad1589f39	Revert r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics." The AutoUpgrade.cpp if/else cascade hit an MSVC limit again. llvm-svn: 350562	2019-01-07 19:39:05 +00:00
Craig Topper	826f44b550	[TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24. As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions. Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node. This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input. Differential Revision: https://reviews.llvm.org/D56087 llvm-svn: 350560	2019-01-07 19:30:43 +00:00
Craig Topper	9c4f7e9147	[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics. Differential Revision: https://reviews.llvm.org/D56377 llvm-svn: 350554	2019-01-07 19:10:12 +00:00
Diogo N. Sampaio	f192cdb5c9	[ARM] ComputeKnownBits to handle extract vectors This patch adds the sign/zero extension done by vgetlane to ARM computeKnownBitsForTargetNode. Differential revision: https://reviews.llvm.org/D56098 llvm-svn: 350553	2019-01-07 19:01:47 +00:00
Simon Pilgrim	32f77f2b52	[X86] Add OR(AND(X,C),AND(Y,~C)) bit select tests Based off work for D55935 llvm-svn: 350548	2019-01-07 18:07:56 +00:00
Armando Montanez	488545ef15	[elfabi] Add option to manually specify file read format Although llvm-elfabi will attempt to read input files without needing the format to be manually specified, doing so has the potential to introduce extraneous errors that can hinder debugging (since multiple readers may fail in attempts to read the file). This change allows the input file format to be manually specified to force elfabi to use a single reader. This makes it easier to test and debug errors specific to a given reader. llvm-svn: 350545	2019-01-07 17:33:10 +00:00
Jordan Rupprecht	70038e01c8	[llvm-objcopy] Handle -O <format> flag. Summary: The -O flag is currently being mostly ignored; it's only checked whether or not the output format is "binary". This adds support for a few formats (e.g. elf64-x86-64), so that when specified, the output can change between 32/64 bit and sizes/alignments are updated accordingly. This fixes PR39135 Reviewers: jakehehrlich, jhenderson, alexshap, espindola Reviewed By: jhenderson Subscribers: emaste, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D53667 llvm-svn: 350541	2019-01-07 16:59:12 +00:00
Sanjay Patel	47f92d3270	[x86] add more tests for LowerToHorizontalOp(); NFC These tests show missed optimizations and a miscompile similar to PR40243 - https://bugs.llvm.org/show_bug.cgi?id=40243 llvm-svn: 350533	2019-01-07 16:10:14 +00:00
Rhys Perry	f77e2e8406	AMDGPU: test for uniformity of branch instruction, not its condition Summary: If a divergent branch instruction is marked as divergent by propagation rule 2 in DivergencePropagator::exploreSyncDependency() and its condition is uniform, that branch would incorrectly be assumed to be uniform. Reviewers: arsenm, tstellar Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D56331 llvm-svn: 350532	2019-01-07 15:52:28 +00:00
James Henderson	9e014b6c3d	[llvm-nm] Add --portability as alias for --format=posix GNU nm supports this alias, so supporting it in llvm-nm makes it easier to transition between the two. Fixes https://bugs.llvm.org/show_bug.cgi?id=40002 Reviewed by: mstorsjo, rupprecht Differential Revision: https://reviews.llvm.org/D56312 llvm-svn: 350522	2019-01-07 14:12:51 +00:00
Matt Arsenault	369acb8470	AMDGPU: Remove VS/SV mappings from select These would violate the constant bus restriction llvm-svn: 350517	2019-01-07 13:21:36 +00:00
Simon Pilgrim	6aac0ec21f	Regenerate test. Prep work towards enabling SimplifyDemandedBits vector support for TRUNCATE as discussed on D56118. llvm-svn: 350514	2019-01-07 12:21:13 +00:00
Simon Pilgrim	09bf22862a	Regenerate test. Prep work towards enabling SimplifyDemandedBits vector support for TRUNCATE as discussed on D56118. llvm-svn: 350513	2019-01-07 12:20:35 +00:00
Craig Topper	1ac0839098	[X86] Update VBMI2 vshld/vshrd tests to use an immediate that doesn't require a modulo. Planning to replace these with funnel shift intrinsics which would mask out the extra bits. This will help minimize test diffs. llvm-svn: 350504	2019-01-07 05:58:53 +00:00
Craig Topper	6ffeeb705f	[X86] Add support for matching vector funnel shift to AVX512VBMI2 instructions. Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56361 llvm-svn: 350498	2019-01-06 18:10:18 +00:00
Craig Topper	d0ba531a0c	[X86] Use two pmovmskbs in combineBitcastvxi1 for (i64 (bitcast (v64i1 (truncate (v64i8)))) on KNL. llvm-svn: 350481	2019-01-05 22:42:58 +00:00
Craig Topper	46f8b4a11e	[X86] Allow combinevxi1Bitcast to use pmovmskb on avx512 targets if the input is a truncate from v16i8/v32i8. This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both. llvm-svn: 350480	2019-01-05 21:40:07 +00:00
Stanislav Mekhanoshin	35a3a3bd11	Added single use check to ShrinkDemandedConstant Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink source node to demanded bits only even if there are other uses. Differential Revision: https://reviews.llvm.org/D56289 llvm-svn: 350475	2019-01-05 19:20:00 +00:00
Craig Topper	27406e1f9e	[X86] Regenerate test to merge 32-bit and 64-bit check lines. NFC llvm-svn: 350474	2019-01-05 19:19:37 +00:00
Craig Topper	3f48dbf72e	[X86] Allow LowerTRUNCATE to use PACKUS/PACKSS for v16i16->v16i8 truncate when -mprefer-vector-width-256 is in effect and BWI is not available. llvm-svn: 350473	2019-01-05 18:48:11 +00:00
Nikita Popov	25a02c12f1	[InstCombine] Improve cttz/ctlz + icmp tests; NFC Change part of the tests to use vectors (I'm using scalar for ugt and vector for ult), add multiuse variations, rename %lz to %tz for the cttz tests. llvm-svn: 350471	2019-01-05 17:36:05 +00:00
Nikita Popov	b46680407d	[InstCombine] Add cttz/ctlz + icmp ugt/ult tests; NFC llvm-svn: 350468	2019-01-05 15:51:59 +00:00
Nikita Popov	65038515ee	[InstCombine] Relax cttz/ctlz with select on zero The cttz/ctlz intrinsics have a parameter specifying whether the result is undefined for zero. cttz(x, false) can be relaxed to cttz(x, true) if x is known non-zero, and in fact such an optimization is already performed. However, this currently doesn't work if x is non-zero as a result of a select rather than an explicit branch. This patch adds handling for this case, thus allowing x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y. Differential Revision: https://reviews.llvm.org/D55786 llvm-svn: 350463	2019-01-05 09:48:16 +00:00
Nikita Popov	7bd4900ba0	[InstCombine] Add vector tests for select + ctlz/cttz; NFC llvm-svn: 350462	2019-01-05 09:48:05 +00:00
Evgeniy Stepanov	0184c53cbd	Revert "Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)"" This reapplies commit r348983. llvm-svn: 350448	2019-01-05 00:44:58 +00:00
Vyacheslav Zakharin	0a6f86c54b	Update the pr_datasz of .note.gnu.property section. Patch by Xiang Zhang. Differential Revision: https://reviews.llvm.org/D56080 llvm-svn: 350436	2019-01-04 21:25:01 +00:00
Nikita Popov	6658fce4fc	[BDCE] Remove dead uses of arguments In addition to finding dead uses of instructions, also find dead uses of function arguments, and replace them with zero as well. I'm changing the way the known bits are computed here to remove the coupling between the transfer function and the algorithm. It previously relied on the first op being visited first and computing known bits -- unless the first op is not an instruction, in which case they're computed on the second op. I could have adjusted this to check for "instruction or argument", but I think it's better to avoid the repeated calculation with an explicit flag. Differential Revision: https://reviews.llvm.org/D56247 llvm-svn: 350435	2019-01-04 21:21:43 +00:00
Craig Topper	cfeb1cf9af	[X86] Add INSERT_SUBVECTOR to ComputeNumSignBits This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits. My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common. Differential Revision: https://reviews.llvm.org/D56283 llvm-svn: 350432	2019-01-04 20:50:59 +00:00
Sanjay Patel	6a5656703e	[x86] add tests for potential horizontal vector ops; NFC These are modified versions of the FP tests from rL349923. llvm-svn: 350430	2019-01-04 20:14:53 +00:00
Peter Collingbourne	87f477b5e4	hwasan: Implement lazy thread initialization for the interceptor ABI. The problem is similar to D55986 but for threads: a process with the interceptor hwasan library loaded might have some threads started by instrumented libraries and some by uninstrumented libraries, and we need to be able to run instrumented code on the latter. The solution is to perform per-thread initialization lazily. If a function needs to access shadow memory or add itself to the per-thread ring buffer its prologue checks to see whether the value in the sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter and reloads from the TLS slot. The runtime does the same thing if it needs to access this data structure. This change means that the code generator needs to know whether we are targeting the interceptor runtime, since we don't want to pay the cost of lazy initialization when targeting a platform with native hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform} has been introduced for selecting the runtime ABI to target. The default ABI is set to interceptor since it's assumed that it will be more common that users will be compiling application code than platform code. Because we can no longer assume that the TLS slot is initialized, the pthread_create interceptor is no longer necessary, so it has been removed. Ideally, lazy initialization should only cost one instruction in the hot path, but at present the call may cause us to spill arguments to the stack, which means more instructions in the hot path (or theoretically in the cold path if the spills are moved with shrink wrapping). With an appropriately chosen calling convention for the per-thread initialization function (TODO) the hot path should always need just one instruction and the cold path should need two instructions with no spilling required. Differential Revision: https://reviews.llvm.org/D56038 llvm-svn: 350429	2019-01-04 19:27:04 +00:00
Teresa Johnson	853b962416	[ThinLTO] Handle chains of aliases At -O0, globalopt is not run during the compile step, and we can have a chain of an alias having an immediate aliasee of another alias. The summaries are constructed assuming aliases in a canonical form (flattened chains), and as a result only the base object but no intermediate aliases were preserved. Fix by adding a pass that canonicalize aliases, which ensures each alias is a direct alias of the base object. Reviewers: pcc, davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54507 llvm-svn: 350423	2019-01-04 19:04:54 +00:00
Sanjay Patel	6153565511	[x86] lower extracted fadd/fsub to horizontal vector math; 2nd try The 1st try for this was at rL350369, but it caused IR-level diffs because our cost models differentiate custom vs. legal/promote lowering. So that was reverted at rL350373. The cost models were fixed independently at rL350403, so this is effectively the same patch as last time. Original commit message: This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch. Differential Revision: https://reviews.llvm.org/D56011 llvm-svn: 350421	2019-01-04 17:48:13 +00:00
Vedant Kumar	a1778df474	[CodeExtractor] Do not extract unsafe lifetime markers Lifetime markers which reference inputs to the extraction region are not safe to extract. Example ('rhs' will be extracted): ``` entry: +------------+ \| x = alloca \| \| y = alloca \| +------------+ / \ lhs: rhs: +-------------------+ +-------------------+ \| lifetime_start(x) \| \| lifetime_start(x) \| \| use(x) \| \| lifetime_start(y) \| \| lifetime_end(x) \| \| use(x, y) \| \| lifetime_start(y) \| \| lifetime_end(y) \| \| use(y) \| \| lifetime_end(x) \| \| lifetime_end(y) \| +-------------------+ +-------------------+ ``` Prior to extraction, the stack coloring pass sees that the slots for 'x' and 'y' are in-use at the same time. After extraction, the coloring pass infers that 'x' and 'y' are not in-use concurrently, because markers from 'rhs' are no longer available to help decide otherwise. This leads to a miscompile, because the stack slots actually are in-use concurrently in the extracted function. Fix this by moving lifetime start/end markers for memory regions defined in the calling function around the call to the extracted function. Fixes llvm.org/PR39671 (rdar://45939472). Differential Revision: https://reviews.llvm.org/D55967 llvm-svn: 350420	2019-01-04 17:43:22 +00:00
Sanjay Patel	722466e1f1	[InstCombine] reduce raw IR narrowing rotate patterns to funnel shift Similar to rL350199 - there are no known analysis/codegen holes for funnel shift intrinsics now, so we can canonicalize the 6+ regular instructions to funnel shift to improve vectorization, inlining, unrolling, etc. llvm-svn: 350419	2019-01-04 17:38:12 +00:00
Nico Weber	c9141fc99f	[gn build] Commit change that should have been in r350410. llvm-svn: 350416	2019-01-04 17:26:05 +00:00
John Brawn	39ac159c24	[LICM] Adjust how moving the re-hoist point works In some cases the order that we hoist instructions in means that when rehoisting (which uses the same order as hoisting) we can rehoist to a block A, then a block B, then block A again. This currently causes an assertion failure as it expects that when changing the hoist point it only ever moves to a block that dominates the hoist point being moved from. Fix this by moving the re-hoist point when it doesn't dominate the dominator of hoisted instruction, or in other words when it wouldn't dominate the uses of the instruction being rehoisted. Differential Revision: https://reviews.llvm.org/D55266 llvm-svn: 350408	2019-01-04 17:12:09 +00:00
Simon Pilgrim	c2054144ee	[CostModel][X86] Fix SSE1 FADD/FSUB costs Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4 Add the other costs so that we're not relying on the default "is legal/custom" cost logic. llvm-svn: 350403	2019-01-04 16:55:57 +00:00
Ranjeet Singh	107dd2565c	Revert patches 348835 and 348571 because they're causing code size performance regressions. llvm-svn: 350402	2019-01-04 16:39:10 +00:00
Simon Pilgrim	71d61567c0	[CostModel][X86] Add SSE1 fp cost tests llvm-svn: 350401	2019-01-04 16:37:01 +00:00
Simon Pilgrim	9f4dea8c06	[X86] Add VPSLLI/VPSRLI ((X >>u C1) << C2) SimplifyDemandedBits combine Repeat of the generic SimplifyDemandedBits shift combine llvm-svn: 350399	2019-01-04 15:43:43 +00:00
Simon Pilgrim	7ee2285625	[X86] Split immediate shifts tests. NFCI. A future patch will combine logical shifts more aggressively. llvm-svn: 350396	2019-01-04 14:56:10 +00:00
Florian Hahn	7902405c42	[ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset GetPointerBaseWithConstantOffset include this code, where ByteOffset and GEPOffset are both of type llvm::APInt : ByteOffset += GEPOffset.getSExtValue(); The problem with this line is that getSExtValue() returns an int64_t, but the += matches an overload for uint64_t. The problem is that the resulting APInt is no longer considered to be signed. That in turn causes assertion failures later on if the relevant pointer type is > 64 bits in width and the GEPOffset was negative. Changing it to ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth()); resolves the issue and explicitly performs the sign-extending or truncation. Additionally, instead of asserting later if the result is > 64 bits, it breaks out of the loop in that case. See also https://reviews.llvm.org/D24729 https://reviews.llvm.org/D24772 This commit must be merged after D38662 in order for the test to pass. Patch by Michael Ferguson <mpfergu@gmail.com>. Reviewers: reames, sanjoy, hfinkel Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D38501 llvm-svn: 350395	2019-01-04 14:53:22 +00:00
Craig Topper	6265a15f2e	[X86] Add post-isel peephole to fold KAND+KORTEST into KTEST if only the zero flag is used. Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register. Differential Revision: https://reviews.llvm.org/D56246 llvm-svn: 350374	2019-01-04 00:10:58 +00:00
Sanjay Patel	26ce9c38a7	revert r350369: [x86] lower extracted fadd/fsub to horizontal vector math There are non-codegen tests that need to be updated with this code change. llvm-svn: 350373	2019-01-04 00:02:02 +00:00
Sanjay Patel	ef4afca2ad	[x86] lower extracted fadd/fsub to horizontal vector math This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch. Differential Revision: https://reviews.llvm.org/D56011 llvm-svn: 350369	2019-01-03 23:16:19 +00:00
Heejin Ahn	777d01c756	[WebAssembly] Optimize Irreducible Control Flow Summary: Irreducible control flow is not that rare, e.g. it happens in malloc and 3 other places in the libc portions linked in to a hello world program. This patch improves how we handle that code: it emits a br_table to dispatch to only the minimal necessary number of blocks. This reduces the size of malloc by 33%, and makes it comparable in size to asm2wasm's malloc output. Added some tests, and verified this passes the emscripten-wasm tests run on the waterfall (binaryen2, wasmobj2, other). Reviewers: aheejin, sunfish Subscribers: mgrang, jgravelle-google, sbc100, dschuff, llvm-commits Differential Revision: https://reviews.llvm.org/D55467 Patch by Alon Zakai (kripken) llvm-svn: 350367	2019-01-03 23:10:11 +00:00
Wouter van Oortmerssen	820c6263d9	[WebAssembly] Fixed disassembler not knowing about new brlist operand Summary: The previously introduced new operand type for br_table didn't have a disassembler implementation, causing an assert. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56227 llvm-svn: 350366	2019-01-03 23:01:30 +00:00
Wouter van Oortmerssen	9843295608	[WebAssembly] Made InstPrinter more robust Summary: Instead of asserting on certain kinds of malformed instructions, it now still print, but instead adds an annotation indicating the problem, and/or indicates invalid_type etc. We're using the InstPrinter from many contexts that can't always guarantee values are within range (e.g. the disassembler), where having output is more valueable than asserting. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56223 llvm-svn: 350365	2019-01-03 22:59:59 +00:00
Sanjay Patel	b8687c2168	[x86] add 512-bit vector tests for horizontal ops; NFC llvm-svn: 350364	2019-01-03 22:55:18 +00:00
Sanjay Patel	ac23c46883	[x86] add AVX512 runs for horizontal ops; NFC llvm-svn: 350362	2019-01-03 22:42:32 +00:00
Craig Topper	58c61dce1d	[X86] Add test case for D56283. This tests a case where we need to be able to compute sign bits for two insert_subvectors that is a liveout of a basic block. The result is then used as a boolean vector in another basic block. llvm-svn: 350359	2019-01-03 22:31:07 +00:00
Sanjay Patel	6b8a9dbfc4	[x86] remove dead CHECK lines from test file; NFC llvm-svn: 350358	2019-01-03 22:30:36 +00:00
Sanjay Patel	fd58d623ff	[x86] split tests for FP and integer horizontal math These are similar patterns, but when you throw AVX512 onto the pile, the number of variations explodes. For FP, we really don't care about AVX1 vs. AVX2 for FP ops. There may be some superficial shuffle diffs, but that's not what we're testing for here, so I removed those RUNs. Separating by type also lets us specify 'sse3' for the FP file vs. 'ssse3' for the integer file...because x86. llvm-svn: 350357	2019-01-03 22:26:51 +00:00
Sanjay Patel	8db27b31ac	[x86] add common FileCheck prefix to reduce assert duplication; NFC llvm-svn: 350356	2019-01-03 22:11:14 +00:00
Sanjay Patel	9633d76a40	[DAGCombiner][x86] scalarize binop followed by extractelement As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354	2019-01-03 21:31:16 +00:00
Nirav Dave	667838f034	[AVR] Update integration/blink.ll as we now generate sbi/cbi instructions. Silence long standing test failure. llvm-svn: 350353	2019-01-03 21:25:39 +00:00
Alexander Timofeev	993e2798fd	[AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression. Detailed description: SIFoldOperands::foldInstOperand iterates over the operand uses calling the function that changes def-use iteratorson the way. As a result loop exits immediately when def-use iterator is changed. Hence, the operand is folded to the very first use instruction only. This makes VGPR live along the whole basic block and increases register pressure significantly. The performance drop observed in SHOC DeviceMemory test is caused by this bug. Proposed fix: collect uses to separate container for further processing in another loop. Testing: make check-llvm SHOC performance test. Reviewers: rampitec, ronlieb Differential Revision: https://reviews.llvm.org/D56161 llvm-svn: 350350	2019-01-03 19:55:32 +00:00
Nico Weber	6f06ce641e	Remove unused %host_cc lit pattern It was added in r257236 but then the one use was removed in r309517. Since no test should call %host_cc, remove the pattern. Differential Revision: https://reviews.llvm.org/D56200 llvm-svn: 350348	2019-01-03 19:31:53 +00:00
Armando Montanez	31f0f659a8	[elfabi] Introduce tool for ELF TextAPI Follow up for D53051 This patch introduces the tool associated with the ELF implementation of TextAPI (previously llvm-tapi, renamed for better distinction). This tool will house a number of features related to enalysis and manipulation of shared object's exposed interfaces. The first major feature for this tool is support for producing binary stubs that are useful for compile-time linking of shared objects. This patch introduces beginnings of support for reading binary ELF objects to work towards that goal. Added: - elfabi tool. - support for reading architecture from a binary ELF file into an ELFStub. - Support for writing .tbe files. Differential Revision: https://reviews.llvm.org/D55352 llvm-svn: 350341	2019-01-03 18:32:36 +00:00
Sanjay Patel	4e71ff234e	[x86] add tests for buildvector with extracted element; NFC llvm-svn: 350338	2019-01-03 17:55:32 +00:00
Jordan Rupprecht	1f82176f7d	[llvm-objcopy][ELF] Implement a mutable section visitor that updates size-related fields (Size, EntrySize, Align) before layout. Summary: Fix EntrySize, Size, and Align before doing layout calculation. As a side cleanup, this removes a dependence on sizeof(Elf_Sym) within BinaryReader, so we can untemplatize that. This unblocks a cleaner implementation of handling the -O<format> flag. See D53667 for a previous attempt. Actual implementation of the -O<format> flag will come in an upcoming commit, this is largely a NFC (although not _totally_ one, because alignment on binary input was actually wrong before). Reviewers: jakehehrlich, jhenderson, alexshap, espindola Reviewed By: jhenderson Subscribers: emaste, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D56211 llvm-svn: 350336	2019-01-03 17:45:30 +00:00
Simon Pilgrim	f0c533b7db	[CostModel][X86] Add truncate cost tests to cover all legal destination types We were only testing costs for legal source vector element counts llvm-svn: 350323	2019-01-03 14:49:39 +00:00
Alex Bradbury	2ba76be882	[RISCV][MC] Accept %lo and %pcrel_lo on operands to li This matches GNU assembler behaviour. llvm-svn: 350321	2019-01-03 14:41:41 +00:00
Serge Guelton	873cba17b2	Python compat - iteritems() vs. items() Always use `items()` and introduce extra `list(...)` call when needed. Differential Revision: https://reviews.llvm.org/D56257 llvm-svn: 350312	2019-01-03 14:12:23 +00:00
Serge Guelton	beb6fee542	Python compat - portable way of raising exceptions Differential Revision: https://reviews.llvm.org/D56256 llvm-svn: 350311	2019-01-03 14:12:13 +00:00
Serge Guelton	7d0174c558	[NFC] Remove unused Python import Differential Revision: https://reviews.llvm.org/D56254 llvm-svn: 350310	2019-01-03 14:12:07 +00:00
Serge Guelton	07ccb4b81d	Pythran compat - range vs. xrange Use range instead of xrange whenever possible. The extra list creation in Python2 is generally not a performance bottleneck. Differential Revision: https://reviews.llvm.org/D56253 llvm-svn: 350309	2019-01-03 14:11:58 +00:00
Serge Guelton	4a27478a5b	Python compat - print statement Make sure all print statements are compatible with Python 2 and Python3 using the `from __future__ import print_function` statement. Differential Revision: https://reviews.llvm.org/D56249 llvm-svn: 350307	2019-01-03 14:11:33 +00:00
Philip Pfaffe	b39a97c8f6	[NewPM] Port Msan Summary: Keeping msan a function pass requires replacing the module level initialization: That means, don't define a ctor function which calls __msan_init, instead just declare the init function at the first access, and add that to the global ctors list. Changes: - Pull the actual sanitizer and the wrapper pass apart. - Add a newpm msan pass. The function pass inserts calls to runtime library functions, for which it inserts declarations as necessary. - Update tests. Caveats: - There is one test that I dropped, because it specifically tested the definition of the ctor. Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji Differential Revision: https://reviews.llvm.org/D55647 llvm-svn: 350305	2019-01-03 13:42:44 +00:00
Diogo N. Sampaio	25ae9a84c3	[NFC] Fix missing testfile change of rL350299 This file was missing on the patch llvm-svn: 350302	2019-01-03 12:48:06 +00:00
Simon Pilgrim	44d6b25d2c	[X86] Cleanup saturated add/sub tests Use X86/X64 check prefixes Use nounwind to reduce cfi noise llvm-svn: 350301	2019-01-03 12:31:13 +00:00
Simon Pilgrim	c2aadfaaad	[SLPVectorizer] Flag ADD/SUB SSAT/USAT intrinsics trivially vectorizable (PR40123) Enables SLP vectorization for the SSE2 PADDS/PADDUS/PSUBS/PSUBUS style intrinsics llvm-svn: 350300	2019-01-03 12:18:23 +00:00
Diogo N. Sampaio	8786a946d8	[ARM] Add command-line option for SB SB (Speculative Barrier) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SB, as it was previously only possible to enable by selecting -march=armv8.5-a. This patch also renames FeatureSpecRestrict to FeatureSB. Reviewed By: olista01, LukeCheeseman Differential Revision: https://reviews.llvm.org/D55990 llvm-svn: 350299	2019-01-03 12:09:12 +00:00
Simon Pilgrim	c5e22b29e4	[SLPVectorizer][X86] Add ADD/SUB SSAT/USAT tests (PR40123) llvm-svn: 350297	2019-01-03 12:02:14 +00:00
Simon Pilgrim	d824f99a6c	[X86] Add ADD/SUB SSAT/USAT vector costs (PR40123) Costs for real SSE2 instructions llvm-svn: 350295	2019-01-03 11:38:42 +00:00
Simon Pilgrim	55ea89305d	[X86] Add ADD/SUB SSAT/USAT cost tests (PR40123) llvm-svn: 350293	2019-01-03 11:29:24 +00:00
Piotr Sobczak	3abef8f9ea	[AMDGPU] Change section name with metadata access Summary: The commit rL348922 introduced a means to set Metadata section kind for a global variable, if its explicit section name was prefixed with ".AMDGPU.metadata.". This patch changes that prefix to ".AMDGPU.comment.", as "metadata" in the section name might lead to ambiguity with metadata used by AMD PAL runtime. Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668 Reviewers: kzhuravl Reviewed By: kzhuravl Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D56197 llvm-svn: 350292	2019-01-03 11:22:58 +00:00
Markus Lavin	72b9deb21f	[CodeGen] Skip over dbg-instr in twoaddr pass A DBG_VALUE between a two-address instruction and a following COPY would prevent rescheduleMIBelowKill optimization inside TwoAddressInstructionPass. Differential Revision: https://reviews.llvm.org/D55987 llvm-svn: 350289	2019-01-03 08:36:06 +00:00
Martin Storsjo	74e7d26090	[llvm-readobj] [COFF] Print the symbol index for relocations There can be multiple local symbols with the same name (for e.g. comdat sections), and thus the symbol name itself isn't enough to disambiguate symbols. Differential Revision: https://reviews.llvm.org/D56140 llvm-svn: 350288	2019-01-03 08:08:23 +00:00
Craig Topper	5ef47ad82e	[X86] Add test cases for opportunities to use KTEST when check if the result of ANDing two mask registers is zero. The test cases are constructed to avoid folding the AND into a masked compare operation. Currently we emit a KAND and a KORTEST for these cases. llvm-svn: 350287	2019-01-03 07:12:54 +00:00
QingShan Zhang	f24ec7bdd0	[Power9] Enable the Out-of-Order scheduling model for P9 hw When switched to the MI scheduler for P9, the hardware is modeled as out of order. However, inside the MI Scheduler algorithm, we still use the in-order scheduling model as the MicroOpBufferSize isn't set. The MI scheduler take it as the hw cannot buffer the op. So, only when all the available instructions issued, the pending instruction could be scheduled. That is not true for our P9 hw in fact. This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is picked from the P9 hw spec, and the perf test indicate that, its value won't hurt the cpu2017. With this patch, there are 3 specs improved over 3% and 1 spec deg over 3%. The detail is as follows: x264_r: +6.95% cactuBSSN_r: +6.94% lbm_r: +4.11% xz_r: -3.85% And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved. Reviewer: Nemanjai Differential Revision: https://reviews.llvm.org/D55810 llvm-svn: 350285	2019-01-03 05:04:18 +00:00
Pete Cooper	697281df42	Teach ObjCARC optimizer about equivalent PHIs when eliminating autoreleaseRV/retainRV pairs OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have equivalent arguments (either identical or equivalent PHIs). It then assumes that ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead. Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue. This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence. rdar://problem/47005143 Reviewed By: ahatanak Differential Revision: https://reviews.llvm.org/D56235 llvm-svn: 350284	2019-01-03 01:38:08 +00:00
Daniel Sanders	157c43f823	[tblgen][disasm] Emit record names again when decoder conflicts occur. And add a test for it. llvm-svn: 350277	2019-01-03 00:14:33 +00:00
Teresa Johnson	0aa09c62cb	[gold] emit assembly listing from gold plugin on LTO stage Summary: Sometimes it's useful to emit assembly after LTO stage to modify it manually. Emitting precodegen bitcode file (via save-temps plugin option) and then feeding it to llc doesn't always give the same binary as original. This patch is simpler alternative to https://reviews.llvm.org/D24020. Patch by Denis Bakhvalov. Reviewers: mehdi_amini, tejohnson Reviewed By: tejohnson Subscribers: MaskRay, inglorion, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D56114 llvm-svn: 350276	2019-01-02 23:48:00 +00:00
Craig Topper	df5304d8de	[X86] Add load folding support to the custom isel we do for X86ISD::UMUL/SMUL. The peephole pass isn't always able to fold the load because it can't commute the implicit usage of AL/AX/EAX/RAX. llvm-svn: 350272	2019-01-02 23:24:08 +00:00
Craig Topper	ce46bfa848	[X86] Add test cases to show that we fail to fold loads into i8 smulo and i8/i16/i32/i64 umulo lowering without the assistance of the peephole pass. NFC llvm-svn: 350271	2019-01-02 23:24:03 +00:00
Wouter van Oortmerssen	ad72f68501	[WebAssembly] made assembler parse block_type Summary: This was previously ignored and an incorrect value generated. Also fixed Disassembler's handling of block_type. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56092 llvm-svn: 350270	2019-01-02 23:23:51 +00:00
Xin Tong	33e3b4b9b3	[ThinLTO] Scan all variants of vague symbol for reachability. Summary: Alias can make one (but not all) live, we still need to scan all others if this symbol is reachable from somewhere else. Reviewers: tejohnson, grimar Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D56117 llvm-svn: 350269	2019-01-02 23:18:20 +00:00
Nikita Popov	41f5710328	[BDCE] Fix typo in test; NFC shl by 32 is undefined. This was intended to be a shl by 31 as part of a rotate sequence. llvm-svn: 350265	2019-01-02 22:34:32 +00:00
Pete Cooper	8d58048024	Fix assert in ObjCARC optimizer when deleting retainBlock of null or undef. The caller to EraseInstruction had this conditional: // ARC calls with null are no-ops. Delete them. if (IsNullOrUndef(Arg)) but the assert inside EraseInstruction only allowed ConstantPointerNull and not undef or bitcasts. This adds support for both of these cases. rdar://problem/47003805 llvm-svn: 350261	2019-01-02 21:00:02 +00:00
Thomas Lively	88590e99f2	[WebAssembly][NFC] Elaborate on simd-noopt test comment llvm-svn: 350260	2019-01-02 20:43:08 +00:00
Nikita Popov	cc6ef7f153	[BDCE] Remove instructions without demanded bits If an instruction has no demanded bits, remove it directly during BDCE, instead of leaving it for something else to clean up. Differential Revision: https://reviews.llvm.org/D56185 llvm-svn: 350257	2019-01-02 20:02:14 +00:00
Craig Topper	9d4860ec4e	[X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag. This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used. I had to avoid selecting DEC is the result isn't used. This will become a SUB immediate which will turned into a CMP later by optimizeCompareInstr. This lead to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag. This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes. Differential Revision: https://reviews.llvm.org/D55975 llvm-svn: 350245	2019-01-02 19:01:05 +00:00
Craig Topper	8dd7bd2cd7	[DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists. Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced. Improves the test case from PR38217. There may be additional opportunities after this. Differential Revision: https://reviews.llvm.org/D56145 llvm-svn: 350239	2019-01-02 18:19:07 +00:00
Craig Topper	44bcc824d3	[X86] Adding full coverage of MC encoding for the XOP and LWP ISAs. Adding MC regressions tests to cover the XOP isa set. This patch is part of a larger task to cover MC encoding of all X86 isa sets started in revision: https://reviews.llvm.org/D39952 Differential Revision: https://reviews.llvm.org/D41392 llvm-svn: 350237	2019-01-02 18:09:41 +00:00
Craig Topper	3109f3a4ab	[LegalizeIntegerTypes] When promoting the result of an extract_vector_elt also promote the input type if necessary By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined. By creating the extract with the correct scalar type when we create legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend. This fixes the regression on X86 in D56156. Differential Revision: https://reviews.llvm.org/D56176 llvm-svn: 350236	2019-01-02 17:58:30 +00:00
Craig Topper	c562fae02b	[DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them. If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead. The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this. Differential Revision: https://reviews.llvm.org/D56156 llvm-svn: 350235	2019-01-02 17:58:27 +00:00
Wei Mi	ecc89b76cb	[PowerPC] Remove SeenUse check when optimizing conditional branch in PPCPreEmitPeephole pass. PPCPreEmitPeephole will convert a BC to B when the conditional branch is based on a constant CR by CRSET or CRUNSET. This is added in https://reviews.llvm.org/rL343100. When the conditional branch is known to be always taken, all branches will be removed and a new unconditional branch will be inserted. However, when SeenUse is false the original patch will not remove the branches, but still insert the new unconditional branch, update the successors and create inconsistent IR. Compiling the synthetic testcase included can show the problem we run into. The patch simply removes the SeenUse condition when adding branches into InstrsToErase set. Differential Revision: https://reviews.llvm.org/D56041 llvm-svn: 350223	2019-01-02 17:07:23 +00:00
Simon Pilgrim	d8125726d5	[X86] Support SHLD/SHRD masked shift-counts (PR34641) Peek through shift modulo masks while matching double shift patterns. I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now. Differential Revision: https://reviews.llvm.org/D56199 llvm-svn: 350222	2019-01-02 17:05:37 +00:00
Sanjay Patel	eafd481aad	[x86] add more tests for potential horizontal ops; NFC As discussed in D56011 - add runs for AVX512 and tests with extra uses. llvm-svn: 350221	2019-01-02 16:36:04 +00:00
Hal Finkel	4f2381440d	[BasicAA] Support arbitrary pointer sizes (and fix an overflow bug) Motivated by the discussion in D38499, this patch updates BasicAA to support arbitrary pointer sizes by switching most remaining non-APInt calculations to use APInt. The size of these APInts is set to the maximum pointer size (maximum over all address spaces described by the data layout string). Most of this translation is straightforward, but this patch contains a fix for a bug that revealed itself during this translation process. In order for test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit pointers, the intermediate calculations must be performed using 64-bit integers. This is because, as noted in the patch, when GetLinearExpression decomposes an expression into C1V+C2, and we then multiply this by Scale, and distribute, to get (C1Scale)V + C2Scale, it can be the case that, even through C1V+C2 does not overflow for relevant values of V, (C2Scale) can overflow. If this happens, later logic will draw invalid conclusions from the (base) offset value. Thus, when initially applying the APInt conversion, because the maximum pointer size in this test is 32 bits, it started failing. Suspicious, I created a 64-bit version of this test (included here), and that failed (miscompiled) on trunk for a similar reason (the multiplication can overflow). After fixing this overflow bug, the first test case (at least) in Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was relying on having 64-bit intermediate values to have BasicAA return an accurate result. In order to fix this problem, and because I believe that it is not uncommon to use i64 indexing expressions in 32-bit code (especially portable code using int64_t), it seems reasonable to always use at least 64-bit integers. In this way, we won't regress our analysis capabilities (and there's a command-line option added, so experimenting with this should be easy). As pointed out by Eli during the review, there are other potential overflow conditions that this patch does not address. Fixing those is left to follow-up work. Patch by me with contributions from Michael Ferguson (mferguson@cray.com). Differential Revision: https://reviews.llvm.org/D38662 llvm-svn: 350220	2019-01-02 16:28:09 +00:00
Piotr Sobczak	378131bae0	[AMDGPU] Handle OR as operand of raw load/store Summary: Use isBaseWithConstantOffset() which handles OR as an operand to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store. Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a Reviewers: nhaehnle, mareko, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55999 llvm-svn: 350208	2019-01-02 09:47:41 +00:00
Craig Topper	8969720787	[X86] Add i8/i16 smulo/umulo test cases where the overflow indication is used by a mask. llvm-svn: 350204	2019-01-02 05:46:02 +00:00
Craig Topper	6f2feb8293	[X86] Remove KNL specific check prefix from xmulo.ll test. NFC This was added at a time when i1 was a legal type with avx512f and there was a bug. i1 is no longer considered a legal type with avx512f so there should be no codegen difference. llvm-svn: 350203	2019-01-02 05:46:00 +00:00
Sanjay Patel	654e6aabb9	[InstCombine] canonicalize raw IR rotate patterns to funnel shift The final piece of IR-level analysis to allow this was committed with: rL350188 Using the intrinsics should improve transforms based on cost models like vectorization and inlining. The backend should be prepared too, so we can now canonicalize more sequences of shift/logic to the intrinsics and know that the end result should be equal or better to the original code even if the target does not have an actual rotate instruction. llvm-svn: 350199	2019-01-01 21:51:39 +00:00
Craig Topper	00b390a000	[X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code. This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions. This is inspired by the helper functions used by ARM and AArch64 for the same purpose. The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO. llvm-svn: 350198	2019-01-01 19:34:11 +00:00
Craig Topper	a728214203	[X86] Remove KNL specific check prefix from xaluo.ll test. NFC This was added at a time when i1 was a legal type with avx512f and there was a bug. i1 is no longer considered a legal type with avx512f so there should be no codegen difference. llvm-svn: 350195	2019-01-01 18:44:44 +00:00
Craig Topper	9478492a80	[X86] Add test cases to show where LowerSELECT doesn't select SADDO/SSUBO to INC/DEC, but LowerXALUOOp does. Leading to duplicate code. When SADDO/SSUBO is used as a part of a condition, the X86 backend has to lower the instruction twice. One for the flags use and then once for the data use. These two selections should be kept in sync so they end up with one node providing the data and the flags. This doesn't seem to be happening for INC/DEC. llvm-svn: 350194	2019-01-01 18:44:42 +00:00
Nikita Popov	c5a023b624	[BDCE] Regenerate test checks; NFC llvm-svn: 350190	2019-01-01 12:27:23 +00:00
Nikita Popov	d4bf57be6b	[BDCE] Remove -instsimplify from BDCE test; NFC To make it more obvious which part of the transformation is carried out by BDCE. Also drop the CHECK-IO lines which only run -instsimplify as they don't really seem meaningful if the main check doesn't run -instsimplify either. llvm-svn: 350189	2019-01-01 10:17:35 +00:00
Nikita Popov	bc9986e9ad	Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions" This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771. BDCE currently detects instructions that don't have any demanded bits and replaces their uses with zero. However, if an instruction has multiple uses, then some of the uses may be dead (have no demanded bits) even though the instruction itself is still live. This patch extends DemandedBits/BDCE to detect such uses and replace them with zero. While this will not immediately render any instructions dead, it may lead to simplifications (in the motivating case, by converting a rotate into a simple shift), break dependencies, etc. The implementation tries to strike a balance between analysis power and complexity/memory usage. Originally I wanted to track demanded bits on a per-use level, but ultimately we're only really interested in whether a use is entirely dead or not. I'm using an extra set to track which uses are dead. However, as initially all uses are dead, I'm not storing uses those user is also dead. This case is checked separately instead. The previous attempt to land this lead to miscompiles, because cases where uses were initially dead but were later found to be live during further analysis were not always correctly removed from the DeadUses set. This is fixed now and the added test case demanstrates such an instance. Differential Revision: https://reviews.llvm.org/D55563 llvm-svn: 350188	2019-01-01 10:05:26 +00:00
Ayonam Ray	e00606a1b2	Reversing the commit in revision 350186. Revision causes regression in 4 tests. llvm-svn: 350187	2019-01-01 07:28:55 +00:00
Ayonam Ray	c471bb2e67	Omit range checks from jump tables when lowering switches with unreachable default During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table, for the switch value being outside the jump table range and a conditional branch is inserted to jump to the default block. In case the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Review Reference: D52002 llvm-svn: 350186	2019-01-01 06:37:50 +00:00
Chen Zheng	4952e668f8	[InstCombine] canonicalize MUL with NEG operand -X * Y --> -(X * Y) X * -Y --> -(X * Y) Differential Revision: https://reviews.llvm.org/D55961 llvm-svn: 350185	2019-01-01 01:09:20 +00:00
Simon Pilgrim	8b503c795e	[X86] Add PR34641 masked shld/shrd test cases llvm-svn: 350181	2018-12-31 19:46:18 +00:00
Craig Topper	c25f1f8f17	[X86] Add additional RUN lines to prepare for D56156. NFC llvm-svn: 350180	2018-12-31 19:09:32 +00:00
Craig Topper	ed3ffae4a4	[SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG support to computeKnownBits. Differential Revision: https://reviews.llvm.org/D56168 llvm-svn: 350179	2018-12-31 19:09:30 +00:00
Craig Topper	bb0873cf46	[X86] Add X86ISD::VSRAI to computeKnownBitsForTargetNode. Differential Revision: https://reviews.llvm.org/D56169 llvm-svn: 350178	2018-12-31 19:09:27 +00:00
Michal Gorny	7343e24f78	[test] Fix propagating HOME envvar to unittests Propagate HOME environment variable to unittests. This is necessary to fix test failures resulting from pw_home pointing to a non-existing directory while being overriden with HOME. Apparently Gentoo users hit this sometimes when they override build directory for Portage. Original bug report: https://bugs.gentoo.org/674088 Differential Revision: https://reviews.llvm.org/D56162 llvm-svn: 350175	2018-12-31 13:48:12 +00:00
Martin Storsjo	74d93f9b24	[AArch64] Accept "sve" as arch feature in assembler Differential Revision: https://reviews.llvm.org/D56128 llvm-svn: 350174	2018-12-31 10:22:04 +00:00
Alexander Potapenko	cea4f83371	[MSan] Handle llvm.is.constant intrinsic MSan used to report false positives in the case the argument of llvm.is.constant intrinsic was uninitialized. In fact checking this argument is unnecessary, as the intrinsic is only used at compile time, and its value doesn't depend on the value of the argument. llvm-svn: 350173	2018-12-31 09:42:23 +00:00
Martin Storsjo	2018777836	[AArch64] Implement the .arch_extension directive Differential Revision: https://reviews.llvm.org/D56131 llvm-svn: 350169	2018-12-30 21:06:32 +00:00
Kang Zhang	9d78c60bf4	[PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code Summary: For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo. Then the verifier runs and it seems like we have a use of an undefined register (the register will be reserved later, but the verifier doesn't know that). So this patch call setUsesTOCBasePtr before emit the code for the pseudo, so verifier can know X2 is a reserved register. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D56148 llvm-svn: 350165	2018-12-30 15:13:51 +00:00
Kang Zhang	4aa6453767	[PowerPC] Fix ADDE, SUBE do not know how to promote operator Summary: This patch is created to fix the Bugzilla bug 39815: https://bugs.llvm.org/show_bug.cgi?id=39815 This patch is to support promotion integer result for the instruction ADDE, SUBE. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D56119 llvm-svn: 350161	2018-12-30 07:48:09 +00:00
Craig Topper	a32e353afa	[X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1. This seems to be getting in the way more than its helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better. Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it. llvm-svn: 350159	2018-12-30 03:05:07 +00:00
Craig Topper	f237ce159e	[X86] Add custom type legalization for SIGN_EXTEND_VECTOR_INREG from 16i16/v32i8 to v4i64 when v4i64 needs splitting. This allows us to sign extend to v4i32 first. And then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh. We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization. llvm-svn: 350158	2018-12-30 02:30:34 +00:00
Nemanja Ivanovic	0f7715afe1	[PowerPC] Complete the custom legalization of vector int to fp conversion A recent patch has added custom legalization of vector conversions of v2i16 -> v2f64. This just rounds it out for other types where the input vector has an illegal (narrower) type than the result vector. Specifically, this will handle the following conversions: v2i8 -> v2f64 v4i8 -> v4f32 v4i16 -> v4f32 Differential revision: https://reviews.llvm.org/D54663 llvm-svn: 350155	2018-12-29 13:40:48 +00:00
Chen Zheng	763c8973bf	[InstCombine] [NFC] update testcases for canonicalize MUL with NEG operand llvm-svn: 350154	2018-12-29 12:18:15 +00:00
Nemanja Ivanovic	3c7ac649ec	[PowerPC] Fix CR Bit spill pseudo expansion The current CRBIT spill pseudo-op expansion creates a KILL instruction that kills the CRBIT and defines the enclosing CR field. However, this paints a false picture to the register allocator that all bits in the CR field are killed so copies of other bits out of the field become dead and removable. This changes the expansion to preserve the KILL flag on the CRBIT as an implicit use and to treat the CR field as an undef input. Thanks to Hal Finkel for the review and Uli Weigand for implementation input. Differential revision: https://reviews.llvm.org/D55996 llvm-svn: 350153	2018-12-29 11:43:54 +00:00
Simon Atanasyan	a6424e7c4e	[mips] Show an error on attempt to use 64-bit PC-relative relocation The following code requests 64-bit PC-relative relocations unsupported by MIPS ABI. Now it triggers an assertion. It's better to show an error message. ``` foo: .quad bar - foo ``` llvm-svn: 350152	2018-12-29 10:10:02 +00:00
Simon Atanasyan	b243d8d42a	[mips] Show a regular error message on attempt to use one byte relocation llvm-svn: 350151	2018-12-29 10:09:55 +00:00
Craig Topper	7bb1d50455	[X86] Add test case from PR38217. NFC llvm-svn: 350150	2018-12-29 07:14:30 +00:00
Craig Topper	0a6cec6f9f	[X86] Don't mark SEXTLOAD v4i8->v4i64 and v8i8->v8i64 as custom under vector widening legalization. This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better. llvm-svn: 350141	2018-12-29 01:17:11 +00:00
Anna Thomas	bae11e7999	[UnrollRuntime] NFC: Updated exiting tests and added more tests Added more tests for multiple exiting blocks to the LatchExit. Today these cases are not supported. Patch to follow soon. llvm-svn: 350135	2018-12-28 19:21:50 +00:00
Craig Topper	f814d28eb3	[X86] Directly emit X86ISD::PMULUDQ from the ReplaceNodeResults handling of v2i8/v2i16/v2i32 multiply. Previously we emitted a multiply and some masking that was supposed to matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly. Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs. I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization llvm-svn: 350134	2018-12-28 19:19:39 +00:00
Anna Thomas	98743fa77a	[UnrollRuntime] NFC: Add comment and verify LCSSA Added -verify-loop-lcssa to test cases. Updated comments in ConnectProlog. llvm-svn: 350131	2018-12-28 18:52:16 +00:00
Diogo N. Sampaio	9123f82cc4	[AArch64] Add command-line option for SB SB (Speculative Barrier) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SB, as it was previously only possible to enable by selecting -march=armv8.5-a. This patch also moves to FeatureSB the old FeatureSpecRestrict. Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman Differential Revision: https://reviews.llvm.org/D55921 llvm-svn: 350126	2018-12-28 17:14:58 +00:00
Max Kazantsev	8f70c9bf49	[NFC] Add failing test on LCSSA form preservation of LoopSimplifyCFG llvm-svn: 350119	2018-12-28 10:43:37 +00:00
Hiroshi Inoue	1ea98f040e	[PowerPC] handle ISD:TRUNCATE in BitPermutationSelector This is the last one in a series of patches to support better code generation for bitfield insert. BitPermutationSelector already support ISD::ZERO_EXTEND but not TRUNCATE. This patch adds support for ISD:TRUNCATE in BitPermutationSelector. For example of this test case, struct s64b { int a:4; int b:16; int c:24; }; void bitfieldinsert64b(struct s64b *p, unsigned char v) { p->b = v; } the selection DAG loos like: t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64 t18: i32 = and t14, Constant:i32<-1048561> t4: i64,ch = CopyFromReg t0, Register:i64 %1 t22: i64 = AssertZext t4, ValueType:ch:i8 t23: i32 = truncate t22 t16: i32 = shl nuw nsw t23, Constant:i32<4> t19: i32 = or t18, t16 t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64 By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18. So the generated sequences with and without this patch are without this patch rlwinm 5, 5, 0, 28, 11 # corresponding to t18 rlwimi 5, 4, 4, 20, 27 with this patch rlwimi 5, 4, 4, 12, 27 Differential Revision: https://reviews.llvm.org/D49076 llvm-svn: 350118	2018-12-28 08:00:39 +00:00
Max Kazantsev	530ff8f3cc	Temporarily disable term folding in LoopSimplifyCFG, add tests llvm-svn: 350117	2018-12-28 06:22:39 +00:00
QingShan Zhang	f2d9df61c7	[PowerPC] Remove the implicit use of the register if it is replaced by Imm If we are changing the MI operand from Reg to Imm, we need also handle its implicit use if have. Differential Revision: https://reviews.llvm.org/D56078 llvm-svn: 350115	2018-12-28 03:38:09 +00:00
Zi Xuan Wu	a02a3feecf	[PowerPC] Fix assert from machine verify pass that atomic pseudo expanding causes mismatched register class For atomic value operand which less than 4 bytes need to be masked. And the related operation to calculate the newvalue can be done in 32 bit gprc. So just use gprc for mask and value calculation. Differential Revision: https://reviews.llvm.org/D56077 llvm-svn: 350113	2018-12-28 02:12:55 +00:00
Chen Zheng	5ede950df9	[PowerPC] fix register class after converting X-FORM instruction to D-FORM instruction Differential Revision: https://reviews.llvm.org/D55806 llvm-svn: 350111	2018-12-28 01:02:35 +00:00
Craig Topper	787ad92bf6	[X86] Remove check that avoids creating PMULDQ with illegal types. Rely on SplitOpsAndApply to legalize it. Create PMULDQ/PMULUDQ as long as the number of elements is a power of 2. This seems to give some improvements in our ability to use SimplifyDemandedBits. llvm-svn: 350084	2018-12-27 03:37:04 +00:00
Wouter van Oortmerssen	f227621036	[WebAssembly] Added basic support for if/else/end_if in MC layer. Summary: These instructions are currently unused in our backend, but for completeness it is good to support them, so they can be used with the assembler in hand-written code. Tests are very basic, signature support missing much like other blocks. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55973 llvm-svn: 350079	2018-12-26 22:55:26 +00:00
Wouter van Oortmerssen	29c6ce5879	[WebAssembly] Make assembler check for proper nesting of control flow. Summary: It does so using a simple nesting stack, and gives clear errors upon violation. This is unique to wasm, since most CPUs do not have any nested constructs. Had to add an end of file check to the general assembler for this. Note: if/else/end instructions are not currently supported in our tablegen defs, so these tests will be enabled in a follow-up. They already pass the nesting check. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55797 llvm-svn: 350078	2018-12-26 22:46:18 +00:00
Craig Topper	c9a6000755	[LoopIdiomRecognize] Add CTTZ support Summary: Existing LIR recognizes CTLZ where shifting input variable right until it is zero. (Shift-Until-Zero idiom) This commit: 1. Augments Shift-Until-Zero idiom to recognize CTTZ where input variable is shifted left. 2. Prepare for BitScan idiom recognition. Patch by Yuanfang Chen (tabloid.adroit) Reviewers: craig.topper, evstupac Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55876 llvm-svn: 350074	2018-12-26 21:59:48 +00:00
Reid Kleckner	c168c6f86f	[codeview] Check if this 'this' type of a method is a pointer Fixes crash reported after r347354 for frontends that don't always emit 'this' pointers for methods. Now we will silently produce debug info that makes functions like this look like static methods, which seems reasonable. llvm-svn: 350073	2018-12-26 21:52:17 +00:00
Justin Lebar	49fac56ea3	[NVPTX] Allow libcalls that are defined in the current module. The patch adds a possibility to make library calls on NVPTX. An important thing about library functions - they must be defined within the current module. This basically should guarantee that we produce a valid PTX assembly (without calls to not defined functions). The one who wants to use the libcalls is probably will have to link against compiler-rt or any other implementation. Currently, it's completely impossible to make library calls because of error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can lower ExternalSymbol to TargetExternalSymbol and verify if the function definition is available. Also, there was an issue with a DAG during legalisation. When we expand instruction into libcall, the inner call-chain isn't being "integrated" into outer chain. Since the last "data-flow" (call retval load) node is located in call-chain earlier than CALLSEQ_END node, the latter becomes a leaf and therefore a dead node (and is being removed quite fast). Proposed here solution relies on another data-flow pseudo nodes (ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and instruction selection phases - we remove the pseudo instructions before register scheduling phase. Patch by Denys Zariaiev! Differential Revision: https://reviews.llvm.org/D34708 llvm-svn: 350069	2018-12-26 19:12:31 +00:00
Simon Pilgrim	a8ff77bb34	[AMDGPU] Regenerate i64 shift tests. To show codegen diff due to a future SimplifyDemandedBits patch. llvm-svn: 350065	2018-12-26 12:09:10 +00:00
Petar Avramovic	09dff33349	[MIPS GlobalISel] Select G_SELECT Add widen scalar for type index 1 (i1 condition) for G_SELECT. Select G_SELECT for pointer, s32(integer) and smaller low level types on MIPS32. Differential Revision: https://reviews.llvm.org/D56001 llvm-svn: 350063	2018-12-25 14:42:30 +00:00
Kang Zhang	d501a1e596	[PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue Summary: This patch is to fix the bug imported by rL341634. In above submit , the the return type of ISD::ADDE is 14224: SDVTList VTs = DAG.getVTList(MVT::i64, MVT::i64), but in fact, the second return type of ISD::ADDE should be MVT::Glue not MVT::i64. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D55977 llvm-svn: 350061	2018-12-25 03:29:51 +00:00
Craig Topper	0229da8f07	[X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ. This is an alternative to what I attempted in D56057. GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users. I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits. Based on a patch that Simon Pilgrim sent me in email. Fixes PR40142. llvm-svn: 350059	2018-12-24 19:40:20 +00:00
Craig Topper	6356ad940b	[X86] Add test cases for PR40142. NFC llvm-svn: 350058	2018-12-24 19:40:17 +00:00
Eugene Leviant	4dc3a3f746	[HWASAN] Instrument memorty intrinsics by default Differential revision: https://reviews.llvm.org/D55926 llvm-svn: 350055	2018-12-24 16:02:48 +00:00
Max Kazantsev	0b455c2b71	Revert rL350048 and rL350050 These patches have broken almost all buildbots on test DebugInfo/X86/addr_comments.ll. Reverting to green. llvm-svn: 350052	2018-12-24 10:30:04 +00:00
Max Kazantsev	edabb9ae56	[LoopSimplifyCFG] Delete dead exiting edges This patch teaches LoopSimplifyCFG to remove dead exiting edges from loops. Differential Revision: https://reviews.llvm.org/D54025 Reviewed By: fedor.sergeev llvm-svn: 350049	2018-12-24 07:41:33 +00:00
David Blaikie	e20bf9ab91	DebugInfo: Use assembly label arithmetic for address pool size for easier reading/editing llvm-svn: 350048	2018-12-24 07:35:10 +00:00
David Blaikie	d671eb7e7c	DebugInfo: Add assembly comments for debug_addr contribution header fields llvm-svn: 350047	2018-12-24 07:09:50 +00:00
David Blaikie	b917c3a41a	llvm-dwarfdump: Skip address index info (and dump only the address, if found) when non-verbose dumping addrx forms There's a few bugs here still - demonstrated with FIXITs in the test. llvm-svn: 350046	2018-12-24 06:52:31 +00:00
Max Kazantsev	347c583772	Return "[LoopSimplifyCFG] Delete dead in-loop blocks" The underlying bug that caused the revert should be fixed by rL348567. Differential Revision: https://reviews.llvm.org/D54023 llvm-svn: 350045	2018-12-24 06:06:17 +00:00
Craig Topper	5eb5e2bc89	[X86] Autogenerate complete checks. NFC llvm-svn: 350039	2018-12-24 01:59:31 +00:00
Sanjay Patel	93f1074677	[DAGCombiner] limit shuffle to extend transform (PR40146) It's dangerous to knowingly create an illegal vector type no matter what stage of combining we're in. This prevents the missed folding/scalarization seen in: https://bugs.llvm.org/show_bug.cgi?id=40146 llvm-svn: 350034	2018-12-23 20:48:31 +00:00
Sanjay Patel	9e5588e1df	[x86] add test for vector shuffle --> extend transform (PR40146); NFC llvm-svn: 350033	2018-12-23 20:36:52 +00:00
Sanjay Patel	9933574ac3	[DAGCombiner] allow hoisting vector bitwise logic ahead of extends llvm-svn: 350032	2018-12-23 19:58:16 +00:00
Sanjay Patel	8bc612f63b	[x86] add tests for vector extend + logic ops; NFC llvm-svn: 350031	2018-12-23 18:37:44 +00:00
Craig Topper	006bac6880	[X86] Return false from hasAndNotCompare if the comparision value is a constant. We won't end up using an ANDN instruction in this case so we should generate the same code we do for pre-BMI targets. llvm-svn: 350018	2018-12-23 05:52:55 +00:00
Craig Topper	3cc92a28ce	[X86] Fix an old FIXME about folding the zero constant into the OR instruction we use for sequentially consistent fence in 32-bit mode without SSE2. llvm-svn: 350013	2018-12-23 01:54:43 +00:00
Craig Topper	dfb8a427ff	[X86] Autogenerate complete checks. NFC llvm-svn: 350012	2018-12-23 01:54:41 +00:00
David Blaikie	25179613f6	llvm-dwarfdump: Dump the section name/number for addr attributes llvm-svn: 350009	2018-12-22 20:34:58 +00:00
Sanjay Patel	4b537aaf6d	[DAGCombiner] allow narrowing of add followed by truncate trunc (add X, C ) --> add (trunc X), C' If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type. This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine). This change used to show regressions for x86, but those are gone after D55494. This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) that does almost the same thing. Differential Revision: https://reviews.llvm.org/D55866 llvm-svn: 350006	2018-12-22 17:10:31 +00:00
Sanjay Patel	52c02d70e2	[x86] add load fold patterns for movddup with vzext_load The missed load folding noticed in D55898 is visible independent of that change either with an adjusted IR pattern to start or with AVX2/AVX512 (where the build vector becomes a broadcast first; movddup is not produced until we get into isel via tablegen patterns). Differential Revision: https://reviews.llvm.org/D55936 llvm-svn: 350005	2018-12-22 16:59:02 +00:00
Roman Lebedev	da1df56e5d	NFC][CodeGen][X86][AArch64] Tests for bit extract (pat. a/c/d) with trunc (PR36419) llvm-svn: 350000	2018-12-22 10:38:05 +00:00
Roman Lebedev	c90611db06	[NFC][CodeGen][X86][AArch64] Bit extract: add nounwind attr to drop .cfi noise Forgot about that. llvm-svn: 349999	2018-12-22 09:58:13 +00:00
Roman Lebedev	29d8af283a	[NFC][CodeGen][X86][AArch64] Tests for bit extract (pat. b) with trunc (PR36419) @bextr64_32_b1 is extracted from hotpath of real-world code (RawSpeed BitStream<>::peekBitsNoFill()) after `clang -O3`. @bextr64_32_b2/@bextr64_32_b0 is the same pattern, but with trunc done last, showing how i think it can be handled: https://rise4fun.com/Alive/K4B https://rise4fun.com/Alive/qC9 It is possible that middle-end should do some of this, too. https://bugs.llvm.org/show_bug.cgi?id=36419 llvm-svn: 349998	2018-12-22 09:40:14 +00:00
David Blaikie	9efb0153f0	llvm-dwarfdump: Remove extraneous space between '(' and 'indexed' When dumping string or address indexes llvm-svn: 349997	2018-12-22 08:43:08 +00:00
David Blaikie	c04d2bf22a	llvm-dwarfdump: Print the section name/number for addr_index attributes (addr attributes coming shortly) llvm-svn: 349996	2018-12-22 08:33:55 +00:00
Reid Kleckner	98bbd07cc3	[MC] Enable .file support on COFF and diagnose it on unsupported targets Summary: The "single parameter" .file directive appears to be an ELF-only feature that is intended to insert the main source filename into the string table table. I noticed that if you assemble an ELF .s file for COFF, typically it will assert right away on a .file directive near the top of the file. My first change was to make this emit a proper error in the asm parser so that we don't assert so easily. However, COFF actually does have some support for this directive, and if you emit an object file, llvm-mc does not assert. When emitting a COFF object, MC will take those file names and create "debug" symbol table entries for them. I'm not familiar with these kinds of symbol table entries, and I'm not aware of any users of them, but @compnerd added them a while ago. They don't introduce absolute paths, and most main source file paths are short enough that this extra entry shouldn't cause any problems, so I enabled the flag in MCAsmInfoCOFF that indicates that it's supported. This has the side effect of adding an extra debug symbol to every object produced by clang, which is a pretty big functional change. My question is, should we keep the functionality or remove it in the name of symbol table minimalism? Reviewers: mstorsjo, compnerd Subscribers: hiraditya, compnerd, llvm-commits Differential Revision: https://reviews.llvm.org/D55900 llvm-svn: 349976	2018-12-21 23:35:48 +00:00
David Blaikie	c3f30a7fc6	Reapply: DebugInfo: Assume an absence of ranges or high_pc on a CU means the CU is empty (devoid of code addresses) Originally committed in r349333, reverted in r349353. GCC emitted these unconditionally on/before 4.4/March 2012 Clang emitted these unconditionally on/before 3.5/March 2014 This improves performance when parsing CUs (especially those using split DWARF) that contain no code ranges (such as the mini CUs that may be created by ThinLTO importing - though generally they should be/are avoided, especially for Split DWARF because it produces a lot of very small CUs, which don't scale well in a bunch of other ways too (including size)). The revert was due to a (Google internal) test that had some checked in old object files missing DW_AT_ranges. That's since been fixed. llvm-svn: 349968	2018-12-21 22:25:01 +00:00
Craig Topper	e58cd9cbc6	[X86] Add isel patterns to match BMI/TBMI instructions when lowering has turned the root nodes into one of the flag producing binops. This fixes the patterns that have or/and as a root. 'and' is handled differently since thy usually have a CMP wrapped around them. I had to look for uses of the CF flag because all these nodes have non-standard CF flag behavior. A real or/xor would always clear CF. In practice we shouldn't be using the CF flag from these nodes as far as I know. Differential Revision: https://reviews.llvm.org/D55813 llvm-svn: 349962	2018-12-21 21:42:43 +00:00
Craig Topper	62ec024d3b	[X86] Don't allow optimizeCompareInstr to replace a CMP with BEXTR if the sign flag is used. The BEXTR instruction documents the SF bit as undefined. The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed. Fixes PR40060 Differential Revision: https://reviews.llvm.org/D55807 llvm-svn: 349956	2018-12-21 21:16:26 +00:00
Changpeng Fang	6f539294b5	AMDGPU: Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing. Summary: Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing. This is because the M0 field is of unsigned. This patch achieves the similar goal as https://reviews.llvm.org/D55241, but keeps the optimization if the base is known unsigned. Reviewers: arsemn Differential Revision: https://reviews.llvm.org/D55568 llvm-svn: 349951	2018-12-21 20:57:34 +00:00
Reid Kleckner	84a2d29681	[BasicAA] Fix AA bug on dynamic allocas and stackrestore Summary: BasicAA has special logic for unescaped allocas, which normally applies equally well to dynamic and static allocas. However, llvm.stackrestore has the power to end the lifetime of dynamic allocas, without referring to them directly. stackrestore is already marked with the most conservative memory modification attributes, but because the alloca is not escaped, the normal logic produces incorrect results. I think BasicAA needs a special case here to teach it about the relationship between dynamic allocas and stackrestore. Fixes PR40118 Reviewers: gbiv, efriedma, george.burgess.iv Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55969 llvm-svn: 349945	2018-12-21 19:59:03 +00:00
Sanjay Patel	80187b8a17	[x86] add movddup specialization for build vector lowering (PR37502) This is admittedly a narrow fix for the problem: https://bugs.llvm.org/show_bug.cgi?id=37502 ...but as the XOP restriction shows, it's a maze to get this right. In the motivating example, note that we have movddup before SSE4.1 and again with AVX2. That's because insertps isn't available pre-SSE41 and vbroadcast is (more generally) available with AVX2 (and the splat is reduced to movddup via isel pattern). Differential Revision: https://reviews.llvm.org/D55898 llvm-svn: 349937	2018-12-21 18:48:32 +00:00
Florian Hahn	8c9f865e3d	[ARM] Set Defs = [CPSR] for COPY_STRUCT_BYVAL, as it clobbers CPSR. Fixes PR35023. Reviewers: MatzeB, t.p.northover, sunfish, qcolombet, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D55909 llvm-svn: 349935	2018-12-21 18:07:10 +00:00
Sanjay Patel	a87fba4e92	[x86] remove excess check lines; NFC Forgot that the integer variants have an extra 's'. llvm-svn: 349929	2018-12-21 17:19:43 +00:00
Sanjay Patel	252660c1ff	[x86] move misplaced tests; NFC Mixed up integer and FP in rL349923. llvm-svn: 349928	2018-12-21 17:06:43 +00:00
Jessica Paquette	453ab1db5b	[GlobalISel][AArch64] Add support for widening G_FCEIL This adds support for widening G_FCEIL in LegalizerHelper and AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't support full FP 16. This also updates AArch64/f16-instructions.ll to show that we perform the correct transformation. llvm-svn: 349927	2018-12-21 17:05:26 +00:00
Sanjay Patel	41eebdefa7	[x86] add tests for possible horizontal op transform; NFC llvm-svn: 349923	2018-12-21 16:49:41 +00:00
Sanjay Patel	fef39ecd31	[x86] move test for movddup; NFC This adds an AVX512 run as suggested in D55936. The test didn't really belong with other build vector tests because that's not the pattern here. I don't see much value in adding 64-bit RUNs because they wouldn't exercise the isel patterns that we're aiming to expose. llvm-svn: 349920	2018-12-21 16:08:27 +00:00
Luke Cheeseman	2c8ead823b	[AArch64] Adding missing REQUIRES in aarch64 dwarf test llvm-svn: 349900	2018-12-21 13:39:13 +00:00
Fedor Sergeev	2d94c2265e	[NewPM] -print-module-scope -print-after now prints module even after invalidated Loop/SCC -print-after IR printing generally can not print the IR unit (Loop or SCC) which has just been invalidated by the pass. However, when working in -print-module-scope mode even if Loop was invalidated there is still a valid module that we can print. Since we can not access invalidated IR unit from AfterPassInvalidated instrumentation point we can remember the module to be printed before pass. This change introduces BeforePass instrumentation that stores all the information required for module printing into the stack and then after pass (in AfterPassInvalidated) just print whatever has been placed on stack. Reviewed By: philip.pfaffe Differential Revision: https://reviews.llvm.org/D55278 llvm-svn: 349896	2018-12-21 11:49:05 +00:00
Luke Cheeseman	41a9e53500	[Dwarf/AArch64] Return address signing B key dwarf support - When signing return addresses with -msign-return-address=<scope>{+<key>}, either the A key instructions or the B key instructions can be used. To correctly authenticate the return address, the unwinder/debugger must know which key was used to sign the return address. - When and exception is thrown or a break point reached, it may be necessary to unwind the stack. To accomplish this, the unwinder/debugger must be able to first authenticate an the return address if it has been signed. - To enable this, the augmentation string of CIEs has been extended to allow inclusion of a 'B' character. Functions that are signed using the B key variant of the instructions should have and FDE whose associated CIE has a 'B' in the augmentation string. - One must also be able to preserve these semantics when first stepping from a high level language into assembly and then, as a second step, into an object file. To achieve this, I have introduced a new assembly directive '.cfi_b_key_frame ', that tells the assembler the current frame uses return address signing with the B key. - This ensures that the FDE is associated with a CIE that has 'B' in the augmentation string. Differential Revision: https://reviews.llvm.org/D51798 llvm-svn: 349895	2018-12-21 10:45:08 +00:00
Simon Pilgrim	5d403f6bf8	[X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm) This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics. Clang counterpart: https://reviews.llvm.org/D55890 Differential Revision: https://reviews.llvm.org/D55894 llvm-svn: 349892	2018-12-21 09:04:14 +00:00
Thomas Lively	b6dac89c87	[WebAssembly] Fix invalid machine instrs in -O0, verify in tests Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55956 llvm-svn: 349889	2018-12-21 06:58:15 +00:00
Matt Arsenault	3eae3c4590	AMDGPU/GlobalISel: RegBankSelect for amdgcn.wqm.vote llvm-svn: 349882	2018-12-21 03:20:54 +00:00
Matt Arsenault	f4c21c575a	AMDGPU/GlobalISel: RegBankSelect for some fp ops llvm-svn: 349880	2018-12-21 03:14:45 +00:00
Matt Arsenault	bee2ad7185	AMDGPU/GlobalISel: Redo legality for build_vector It seems better to avoid using the callback if possible since there are coverage assertions which are disabled if this is used. Also fix missing tests. Only test the legal cases since it seems legalization for build_vector is quite lacking. llvm-svn: 349878	2018-12-21 03:03:11 +00:00
Craig Topper	7b78137403	[X86] Autogenerate complete checks. NFC llvm-svn: 349870	2018-12-21 01:27:33 +00:00
Eli Friedman	b1bbd5dca3	[ARM] Complete the Thumb1 shift+and->shift+shift transforms. This saves materializing the immediate. The additional forms are less common (they don't usually show up for bitfield insert/extract), but they're still relevant. I had to add a new target hook to prevent DAGCombine from reversing the transform. That isn't the only possible way to solve the conflict, but it seems straightforward enough. Differential Revision: https://reviews.llvm.org/D55630 llvm-svn: 349857	2018-12-20 23:39:54 +00:00
Chen Zheng	ddfaf07526	[InstCombine] [NFC] testcases for canonicalize MUL with NEG operand llvm-svn: 349847	2018-12-20 22:37:05 +00:00
Jessica Paquette	a6b9c68a85	[GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback. Add it to the switch, and add a test verifying that this happens. llvm-svn: 349822	2018-12-20 21:14:15 +00:00
David Blaikie	b3c56af49b	DebugInfo: Fix for missing comp_dir handling with r349207 When deciding lazily whether a CU would be split or non-split I accidentally dropped some handling for the line tables comp_dir (by doing it lazily it was too late to be handled properly by the MC line table code). Move that bit of the code back to the non-lazy place. llvm-svn: 349819	2018-12-20 20:46:55 +00:00
Nikita Popov	f17421e595	[ConstantFolding] Consolidate and extend bitcount intrinsic tests; NFC Move constant folding tests into ConstantFolding/bitcount.ll and drop various tests in other places. Add coverage for undefs. llvm-svn: 349806	2018-12-20 19:46:52 +00:00
Nikita Popov	6d79e43208	[ConstantFolding] Add undef tests for overflow intrinsics; NFC llvm-svn: 349805	2018-12-20 19:46:46 +00:00
Nikita Popov	da1beca0ac	[ConstantFolding] Regenerate test checks; NFC Bring overflow-ops.ll into current format. Remove redundant entry blocks. llvm-svn: 349804	2018-12-20 19:46:43 +00:00
Nikita Popov	41a7b109c3	[ConstantFolding] Add tests for funnel shifts with undef operands; NFC llvm-svn: 349803	2018-12-20 19:46:39 +00:00
Nikita Popov	2d6fa5d6b4	[ConstantFolding] Add tests for sat add/sub with undefs; NFC llvm-svn: 349802	2018-12-20 19:46:35 +00:00
Nikita Popov	f7ef6d6461	[ConstantFolding] Split up saturating add/sub tests; NFC Split each test into a separate function. llvm-svn: 349801	2018-12-20 19:46:30 +00:00
Eli Friedman	48397102d0	[MC] [AArch64] Correctly resolve ":abs_g1:3" etc. We have to treat constructs like this as if they were "symbolic", to use the correct codepath to resolve them. This mostly only affects movz etc. because the other uses of classifySymbolRef conservatively treat everything that isn't a constant as if it were a symbol. Differential Revision: https://reviews.llvm.org/D55906 llvm-svn: 349800	2018-12-20 19:46:14 +00:00
Eli Friedman	4648209e16	[MC] [AArch64] Support resolving fixups for abs_g0 etc. This requires a bit more code than other fixups, to distingush between abs_g0/abs_g1/etc. Actually, I think some of the other fixups are missing some checks, but I won't try to address that here. I haven't seen any real-world code that uses a construct like this, but it clearly should work, and we're considering using it in the implementation of localescape/localrecover on Windows (see https://reviews.llvm.org/D53540). I've verified that binutils produces the same code as llvm-mc for the testcase. This currently doesn't include support for the *_s variants (that requires a bit more work to set the opcode). Differential Revision: https://reviews.llvm.org/D55896 llvm-svn: 349799	2018-12-20 19:38:07 +00:00
Simon Pilgrim	2a25360ae3	[X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm) This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics. Clang counterpart: https://reviews.llvm.org/D55937 Differential Revision: https://reviews.llvm.org/D55938 llvm-svn: 349795	2018-12-20 19:01:07 +00:00
Florian Hahn	ef307b8c26	[LAA] Avoid generating RT checks for known deps preventing vectorization. If we found unsafe dependences other than 'unknown', we already know at compile time that they are unsafe and the runtime checks should always fail. So we can avoid generating them in those cases. This should have no negative impact on performance as the runtime checks that would be created previously should always fail. As a sanity check, I measured the test-suite, spec2k and spec2k6 and there were no regressions. Reviewers: Ayal, anemet, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D55798 llvm-svn: 349794	2018-12-20 18:49:09 +00:00
Adrian Prantl	1f6189ed06	Add missing -oso-prepend-path to dsymutil test. Thanks to Galina Kistanova for pointing this out! llvm-svn: 349793	2018-12-20 18:47:17 +00:00
Brock Wyma	b17464e4e8	[CodeView] Emit global variables within lexical scopes to limit visibility Emit static locals within the correct lexical scope so variables with the same name will not confuse the debugger into getting the wrong value. Differential Revision: https://reviews.llvm.org/D55336 llvm-svn: 349777	2018-12-20 17:33:45 +00:00
Michael Kruse	199427100b	[InstCombine] Preserve access-group metadata. Preserve llvm.access.group metadata when combining store instructions. This was forgotten in r349725. Fixes llvm.org/PR40117 llvm-svn: 349774	2018-12-20 17:11:02 +00:00
Sanjay Patel	18b008b577	[x86] add test to show missed movddup load fold; NFC llvm-svn: 349773	2018-12-20 17:05:57 +00:00
Amilendra Kodithuwakku	388bb86d7b	Test commit Fix a simple typo. llvm-svn: 349771	2018-12-20 16:44:26 +00:00
Krzysztof Parzyszek	30c42e2ab6	[Hexagon] Add patterns for funnel shifts llvm-svn: 349770	2018-12-20 16:39:20 +00:00
Simon Pilgrim	b208255fe0	[SelectionDAGBuilder] Enable funnel shift building to custom rotates This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations. AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK. Differential Revision: https://reviews.llvm.org/D55747 llvm-svn: 349765	2018-12-20 14:56:44 +00:00
Alex Bradbury	eb3a64a4da	[RISCV] Properly evaluate fixup_riscv_pcrel_lo12 This is a update to D43157 to correctly handle fixup_riscv_pcrel_lo12. Notable changes: Rebased onto trunk Handle and test S-type Test case pcrel-hilo.s is merged into relocations.s D43157 description: VK_RISCV_PCREL_LO has to be handled specially. The MCExpr inside is actually the location of an auipc instruction with a VK_RISCV_PCREL_HI fixup pointing to the real target. Differential Revision: https://reviews.llvm.org/D54029 Patch by Chih-Mao Chen and Michael Spencer. llvm-svn: 349764	2018-12-20 14:52:15 +00:00
Simon Pilgrim	09c081176a	[X86][AVX512] Don't custom lower v16i8 rotations. As discussed on D55747, the expansion to (wider) shifts is better on all AVX512 cases, not just BWI. llvm-svn: 349763	2018-12-20 14:38:35 +00:00
Simon Pilgrim	8c4abd20eb	[InstCombine] Make x86 PADDS/PSUBS constant folding tests generic As discussed on D55894, this replaces the existing PADDS/PSUBUS intrinsics with the the sadd/ssub.sat generic intrinsics and moves the tests out of the x86 subfolder. PR40110 has been raised to fix the regression with constant folding vectors containing undef elements. llvm-svn: 349759	2018-12-20 13:50:12 +00:00
Ulrich Weigand	44d37ae38c	[SystemZ] Make better use of VLLEZ This patch fixes two deficiencies in current code that recognizes the VLLEZ idiom: - For the floating-point versions, we have ISel patterns that match on a bitconvert as the top node. In more complex cases, that bitconvert may already have been merged into something else. Fix the patterns to match the inner nodes instead. - For the 64-bit integer versions, depending on the surrounding code, we may get either a DAG tree based on JOIN_DWORDS or one based on INSERT_VECTOR_ELT. Use a PatFrags to simply match both variants. llvm-svn: 349749	2018-12-20 13:05:03 +00:00
Ulrich Weigand	8bb46b0f01	[SystemZ] Make better use of VGEF/VGEG Current code in SystemZDAGToDAGISel::tryGather refuses to perform any transformation if the Load SDNode has more than one use. This (erronously) counts uses of the chain result, which prevents the optimization in many cases unnecessarily. Fixed by this patch. llvm-svn: 349748	2018-12-20 13:01:20 +00:00
Clement Courbet	36a3480385	Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads. Update PPC ir following GEP->bitcat to bitcat->GEP->bitcat change. llvm-svn: 349747	2018-12-20 13:01:04 +00:00
Ulrich Weigand	f43b510015	[SystemZ] Make better use of VLDEB We already have special code (DAG combine support for FP_ROUND) to recognize cases where we an use a vector version of VLEDB to perform two floating-point truncates in parallel, but equivalent support for VLEDB (vector floating-point extends) has been missing so far. This patch adds corresponding DAG combine support for FP_EXTEND. llvm-svn: 349746	2018-12-20 12:59:05 +00:00
Simon Pilgrim	6bbf39b48c	[X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm) Pulled out of D55894 to match the clang changes in D55890. Differential Revision: https://reviews.llvm.org/D55890 llvm-svn: 349744	2018-12-20 11:53:54 +00:00
Simon Pilgrim	e85ad60ee0	[X86] Update PADDSW/PSUBSW intrinsic usage with generic saturated intrinsics. As discussed on D55894, this makes no difference to the actual test. llvm-svn: 349742	2018-12-20 11:14:56 +00:00
Simon Pilgrim	6199784c5a	[X86] Change 'simple nonmem' intrinsic test to not use PADDSW Those intrinsics will be autoupgraded soon to @llvm.sadd.sat generics (D55894), so to keep a x86-specific case I'm replacing it with @llvm.x86.sse2.pmulhu.w llvm-svn: 349739	2018-12-20 10:54:59 +00:00
George Rimar	4ded77334e	[llvm-objcopy] - Do not drop the OS/ABI and ABIVersion fields of ELF header This is https://bugs.llvm.org/show_bug.cgi?id=40005, Patch teaches llvm-objcopy to preserve OS/ABI and ABIVersion fields of ELF header. (Currently, it drops them to zero). Differential revision: https://reviews.llvm.org/D55886 llvm-svn: 349738	2018-12-20 10:51:42 +00:00
George Rimar	6367d7a6d1	[yaml2obj/obj2yaml] - Support dumping/parsing ABI version. These tools were assuming ABI version is 0, that is not always true. Patch teaches them to work with that field. Differential revision: https://reviews.llvm.org/D55884 llvm-svn: 349737	2018-12-20 10:43:49 +00:00
Piotr Sobczak	deaacc17fe	[InstCombine][AMDGPU] Handle more buffer intrinsics Summary: Include the following intrinsics in the InsctCombine simplification: * amdgcn_raw_buffer_load * amdgcn_raw_buffer_load_format * amdgcn_struct_buffer_load * amdgcn_struct_buffer_load_format Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d Reviewers: arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55882 llvm-svn: 349735	2018-12-20 10:08:18 +00:00
Alexander Potapenko	0e3b85a730	[MSan] Don't emit __msan_instrument_asm_load() calls LLVM treats void* pointers passed to assembly routines as pointers to sized types. We used to emit calls to __msan_instrument_asm_load() for every such void*, which sometimes led to false positives. A less error-prone (and truly "conservative") approach is to unpoison only assembly output arguments. llvm-svn: 349734	2018-12-20 10:05:00 +00:00
Clement Courbet	e22cf4d7cb	Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads." Forgot to update PowerPC tests for the GEP->bitcast change. llvm-svn: 349733	2018-12-20 09:58:33 +00:00
Clement Courbet	1bb6e1b0f2	[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads. Summary: This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp in just two loads on X86. These were previously calling memcmp. Reviewers: spatel, gchatelet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55263 llvm-svn: 349731	2018-12-20 09:13:47 +00:00
Eugene Leviant	2d98eb1b2e	[HWASAN] Add support for memory intrinsics Differential revision: https://reviews.llvm.org/D55117 llvm-svn: 349728	2018-12-20 09:04:33 +00:00
Kang Zhang	ca8db48974	[PowerPC] Implement the isSelectSupported() target hook Summary: PowerPC has scalar selects (isel) and vector mask selects (xxsel). But PowerPC does not have vector CR selects, PowerPC does not support scalar condition selects on vectors. In addition to implementing this hook, isSelectSupported() should return false when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects are converted into branch sequences. Reviewed By: steven.zhang, hfinkel Differential Revision: https://reviews.llvm.org/D55754 llvm-svn: 349727	2018-12-20 06:19:59 +00:00
Michael Kruse	978ba61536	Introduce llvm.loop.parallel_accesses and llvm.access.group metadata. The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. LoopID unfortunately is not loop identifier. It is neither unique (there's even a regression test assigning the some LoopID to multiple loops; can otherwise happen if passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is removed/added from a LoopID's MDNode, it will also receive a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()). Since most loop transformation passes change the loop attributes (even if it just to mark that a loop should not be processed again as llvm.loop.isvectorized does, for the versioned and unversioned loop), the parallel access information is lost for any subsequent pass. This patch unlinks LoopIDs and parallel accesses. llvm.mem.parallel_loop_access metadata on instruction is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (avoiding the problem to ever need to add/remove operands), called "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses with all the access groups that are parallel (no dependencies carries by this loop). This intentionally avoid any kind of "ID". Loops that are clones/have their attributes modifies retain the llvm.loop.parallel_accesses attribute. Access instructions that a cloned point to the same access group. It is not necessary for each access to have it's own "ID" MDNode, but those memory access instructions with the same behavior can be grouped together. The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated. Differential Revision: https://reviews.llvm.org/D52116 llvm-svn: 349725	2018-12-20 04:58:07 +00:00
Thomas Lively	feb18fe927	[WebAssembly] Emit a splat for v128 IMPLICIT_DEF Summary: This is a code size savings and is also important to get runnable code while engines do not support v128.const. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55910 llvm-svn: 349724	2018-12-20 04:20:32 +00:00
Amara Emerson	321bfb210a	Fix build errors introduced by r349712 on aarch64 bots. llvm-svn: 349723	2018-12-20 03:27:42 +00:00
Thomas Lively	8dbf29af95	[WebAssembly] Gate unimplemented SIMD ops on flag Summary: Gates v128.const, f32x4.sqrt, f32x4.div, i8x16.extract_lane_u, and i16x8.extract_lane_u on the --wasm-enable-unimplemented-simd flag, since these ops are not implemented yet in V8. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55904 llvm-svn: 349720	2018-12-20 02:10:22 +00:00
Matt Arsenault	4339883710	AMDGPU: Make i1/i64/v2i32 and/or/xor legal The 64-bit types do depend on the register bank, but that's another issue to deal with later. llvm-svn: 349716	2018-12-20 01:35:49 +00:00
Matt Arsenault	8cc98bee8a	AMDGPU/GlobalISel: Fix ValueMapping tables for i1 This was incorrectly selecting SGPR for any i1 values, e.g. G_TRUNC to i1 from a VGPR was still an SGPR. llvm-svn: 349715	2018-12-20 01:33:43 +00:00
Amara Emerson	8cb186ce17	[AArch64][GlobalISel] Implement selection og G_MERGE of two s32s into s64. This code pattern is an unfortunate side effect of the way some types get split at call lowering. Ideally we'd either not generate it at all or combine it away in the legalizer artifact combiner. Until then, add selection support anyway which is a significant proportion of our current fallbacks on CTMark. rdar://46491420 llvm-svn: 349712	2018-12-20 01:11:04 +00:00
Matt Arsenault	dff33c38e1	AMDGPU/GlobalISel: RegBankSelect for fp conversions llvm-svn: 349709	2018-12-20 00:37:02 +00:00
Matt Arsenault	36d4092173	AMDGPU/GlobalISel: Legality/regbankselect for atomicrmw/atomic_cmpxchg llvm-svn: 349708	2018-12-20 00:33:49 +00:00
Vitaly Buka	07a55f27dc	[asan] Undo special treatment of linkonce_odr and weak_odr Summary: On non-Windows these are already removed by ShouldInstrumentGlobal. On Window we will wait until we get actual issues with that. Reviewers: pcc Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55899 llvm-svn: 349707	2018-12-20 00:30:27 +00:00
Vitaly Buka	d414e1bbb5	[asan] Prevent folding of globals with redzones Summary: ICF prevented by removing unnamed_addr and local_unnamed_addr for all sanitized globals. Also in general unnamed_addr is not valid here as address now is important for ODR violation detector and redzone poisoning. Before the patch ICF on globals caused: 1. false ODR reports when we register global on the same address more than once 2. globals buffer overflow if we fold variables of smaller type inside of large type. Then the smaller one will poison redzone which overlaps with the larger one. Reviewers: eugenis, pcc Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55857 llvm-svn: 349706	2018-12-20 00:30:18 +00:00
Rhys Perry	3931ad38b9	AMDGPU: Add patterns for v4i16/v4f16 -> v4i16/v4f16 bitcasts Reviewers: arsenm, tstellar Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55058 llvm-svn: 349694	2018-12-19 22:53:33 +00:00
Eli Friedman	a69084ffa8	[CodeGenPrepare] Fix bad IR created by large offset GEP splitting. Creating the IR builder, then modifying the CFG, leads to an IRBuilder where the BB and insertion point are inconsistent, so new instructions have the wrong parent. Modified an existing test because the test wasn't covering anything useful (the "invoke" was not actually an invoke by the time we hit the code in question). Differential Revision: https://reviews.llvm.org/D55729 llvm-svn: 349693	2018-12-19 22:52:04 +00:00
Evandro Menezes	7927a45cdb	[llvm-mca] Rename directory for the Cortex tests (NFC) llvm-svn: 349688	2018-12-19 22:24:42 +00:00
Evandro Menezes	7f37ec7cd3	[llvm-mca] Update Exynos test cases (NFC) llvm-svn: 349687	2018-12-19 22:24:39 +00:00
Nikita Popov	3817ee7908	Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions" This reverts commit r349674. It causes a failure in test-suite enc-3des.execution_time. llvm-svn: 349684	2018-12-19 22:09:02 +00:00
Sanjay Patel	ca6434de37	[x86] add test to show ddup hole; NFC (PR37502) llvm-svn: 349680	2018-12-19 20:35:28 +00:00
Nikita Popov	649e125451	[BDCE][DemandedBits] Detect dead uses of undead instructions This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771. BDCE currently detects instructions that don't have any demanded bits and replaces their uses with zero. However, if an instruction has multiple uses, then some of the uses may be dead (have no demanded bits) even though the instruction itself is still live. This patch extends DemandedBits/BDCE to detect such uses and replace them with zero. While this will not immediately render any instructions dead, it may lead to simplifications (in the motivating case, by converting a rotate into a simple shift), break dependencies, etc. The implementation tries to strike a balance between analysis power and complexity/memory usage. Originally I wanted to track demanded bits on a per-use level, but ultimately we're only really interested in whether a use is entirely dead or not. I'm using an extra set to track which uses are dead. However, as initially all uses are dead, I'm not storing uses those user is also dead. This case is checked separately instead. The test case has a couple of cases that are not simplified yet. In particular, we're only looking at uses of instructions right now. I think it would make sense to also extend this to arguments. Furthermore DemandedBits doesn't yet know some of the tricks that InstCombine does for the demanded bits or bitwise or/and/xor in combination with known bits information. Differential Revision: https://reviews.llvm.org/D55563 llvm-svn: 349674	2018-12-19 19:56:21 +00:00
David Blaikie	ac69af7ad6	llvm-dwarfdump: Improve/fix pretty printing of array dimensions This is to address post-commit feedback from Paul Robinson on r348954. The original commit misinterprets count and upper bound as the same thing (I thought I saw GCC producing an upper bound the same as Clang's count, but GCC correctly produces an upper bound that's one less than the count (in C, that is, where arrays are zero indexed)). I want to preserve the C-like output for the common case, so in the absence of a lower bound the count (or one greater than the upper bound) is rendered between []. In the trickier cases, where a lower bound is specified, a half-open range is used (eg: lower bound 1, count 2 would be "[1, 3)" and an unknown parts use a '?' (eg: "[1, ?)" or "[?, 7)" or "[?, ? + 3)"). Reviewers: aprantl, probinson, JDevlieghere Differential Revision: https://reviews.llvm.org/D55721 llvm-svn: 349670	2018-12-19 19:34:24 +00:00
Matthew Voss	62fcfc5adb	[ThinLTO] Remove dllimport attribute from locally defined symbols Summary: The LTO/ThinLTO driver currently creates invalid bitcode by setting symbols marked dllimport as dso_local. The compiler often has access to the definition (often dllexport) and the declaration (often dllimport) of an object at link-time, leading to a conflicting declaration. This patch resolves the inconsistency by removing the dllimport attribute. Reviewers: tejohnson, pcc, rnk, echristo Reviewed By: rnk Subscribers: dmikulin, wristow, mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, llvm-commits Differential Revision: https://reviews.llvm.org/D55627 llvm-svn: 349667	2018-12-19 19:07:45 +00:00
Jessica Paquette	3560e93dc1	[GlobalISel][AArch64] Add support for @llvm.ceil This adds a G_FCEIL generic instruction and uses it in AArch64. This adds selection for floating point ceil where it has a supported, dedicated instruction. Other cases aren't handled here. It updates the relevant gisel tests and adds a select-ceil test. It also adds a check to arm64-vcvt.ll which ensures that we don't fall back when we run into one of the relevant cases. llvm-svn: 349664	2018-12-19 19:01:36 +00:00
Craig Topper	84a00bd98a	[X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing The (cmp (and X, Y) 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM like BLSR or BLSI. This patch moves removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but its probably not perfect. Differential Revision: https://reviews.llvm.org/D55870 llvm-svn: 349661	2018-12-19 18:49:13 +00:00
Craig Topper	291470347a	[X86] Fix assert fails in pass X86AvoidSFBPass Fixes https://bugs.llvm.org/show_bug.cgi?id=38743 The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in lockingStoresDispSizeMap. But it currently looks only at the previous one, which will miss some cases that result in assert. This patch refine the function to check all previous layouts until find the uncontained one. So all redundant stores will be removed. Patch by Pengfei Wang Differential Revision: https://reviews.llvm.org/D55642 llvm-svn: 349660	2018-12-19 18:45:57 +00:00
Evandro Menezes	5d409b2278	[AArch64] Improve the Exynos M3 pipeline model llvm-svn: 349652	2018-12-19 17:37:51 +00:00
Evandro Menezes	1cfab9747d	[llvm-mca] Split test (NFC) Split the Exynos test of the register offset addressing mode into separate loads and stores tests. llvm-svn: 349651	2018-12-19 17:37:14 +00:00
Simon Pilgrim	420d1e12b6	Regenerate test llvm-svn: 349646	2018-12-19 17:24:34 +00:00
Simon Pilgrim	171f3aa012	[X86] Remove already upgraded llvm.x86.avx512.mask.padds/psubs tests Duplicate tests have already been moved to avx512bw-intrinsics-upgrade.ll llvm-svn: 349643	2018-12-19 17:18:27 +00:00
Yonghong Song	7b410ac352	[BPF] Generate BTF DebugInfo under BPF target This patch implements BTF (BPF Type Format). The BTF is the debug info format for BPF, introduced in the below linux patch: `69b693f0ae (diff-06fb1c8825f653d7e539058b72c83332)` and further extended several times, e.g., https://www.spinics.net/lists/netdev/msg534640.html https://www.spinics.net/lists/netdev/msg538464.html https://www.spinics.net/lists/netdev/msg540246.html The main advantage of implementing in LLVM is: . better integration/deployment as no extra tools are needed. . bpf JIT based compilation (like bcc, bpftrace, etc.) can get BTF without much extra effort. . BTF line_info needs selective source codes, which can be easily retrieved when inside the compiler. This patch implemented BTF generation by registering a BPF specific DebugHandler in BPFAsmPrinter. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D55752 llvm-svn: 349640	2018-12-19 16:40:25 +00:00
Peter Wu	f0ad811b54	[Object] Deduplicate long archive member names Summary: Import libraries as created by llvm-dlltool always use the same archive member name for every object file (namely, the DLL library name). Ensure that long names are not repeatedly stored in the string table. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D55860 llvm-svn: 349637	2018-12-19 16:15:05 +00:00
Amy Kwan	22cd453ba7	Test commit llvm-svn: 349633	2018-12-19 15:21:07 +00:00
Simon Pilgrim	7bfbf3caa4	[X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (llvm) Now that we use the generic ISD opcodes, we can use the generic intrinsics directly as well. This fixes the poor fast-isel codegen by not expanding to an easily broken IR code sequence. I'm intending to deal with the signed saturation equivalents as well. Clang counterpart: https://reviews.llvm.org/D55879 Differential Revision: https://reviews.llvm.org/D55855 llvm-svn: 349630	2018-12-19 14:43:36 +00:00
Simon Pilgrim	2ae3a91656	[SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 2 of 2) Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs. This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument. I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1\|c2) fold to demonstrate its use, which I believe is safe for undef cases. Differential Revision: https://reviews.llvm.org/D55822 llvm-svn: 349629	2018-12-19 14:09:38 +00:00
Simon Pilgrim	6c95bea072	[TargetLowering] Fix propagation of undefs in zero extension ops (PR40091) As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero. SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue. Thanks to @dmgreen for catching this. Differential Revision: https://reviews.llvm.org/D55883 llvm-svn: 349625	2018-12-19 13:37:59 +00:00
Nicolai Haehnle	fc2d26857e	Fix test MC/AMDGPU/reloc.s Missed this change in r349620 Change-Id: I5123e31ed4bb99ad6903b9ede4de4dbe2cc6d453 llvm-svn: 349622	2018-12-19 12:13:21 +00:00
Simon Pilgrim	ac62c8a3aa	[X86][SSE] Remove use of SSE ADDS/SUBS saturation intrinsics from schedule/stack tests These are due to be upgraded soon, but good to replace them with generic llvm sadd_sat/ssub_sat intrinsics now. The avx512 masked cases need doing as well but require a bit of tidyup first. llvm-svn: 349621	2018-12-19 12:00:25 +00:00
Nicolai Haehnle	8d5e974076	AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1 Summary: Using HI here makes no logical sense, since the dword is only 32 bits to begin with. Current Mesa master does not look at the relocation type at all, so this change is fine. Future Mesa will rely on this, however. Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55369 llvm-svn: 349620	2018-12-19 11:55:03 +00:00
Simon Pilgrim	2072b5afbe	[SelectionDAG] Optional handling of UNDEF elements in matchUnaryPredicate Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs. This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument. I've updated SelectionDAG::simplifyShift to demonstrate its use. Differential Revision: https://reviews.llvm.org/D55819 llvm-svn: 349616	2018-12-19 10:41:06 +00:00
Simon Pilgrim	d4b077698a	[X86][SSE] Remove SSE ADDUS/SUBUS saturation intrinsics from schedule/stack tests These are already being autoupgraded, currently to an IR sequence, but best to replace them with generic llvm uadd_sat/usub_sat intrinsics (which D55855 will be doing shortly anyhow). The avx512 masked cases need doing as well but require a bit of tidyup first. llvm-svn: 349615	2018-12-19 10:39:14 +00:00
George Rimar	6622d41e2c	[llvm-objdump] - Demangle the symbols when printing symbol table and relocations. This is https://bugs.llvm.org/show_bug.cgi?id=40009, llvm-objdump does not demangle the symbols when prints symbol table and/or relocations. Patch teaches it to do that. Differential revision: https://reviews.llvm.org/D55821 llvm-svn: 349613	2018-12-19 10:21:45 +00:00
Carl Ritson	c521ac3a44	AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged Summary: Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged. This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds. Irreducible loop test has been extended based on a CTS failure detected for GFX9. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D55602 llvm-svn: 349611	2018-12-19 10:17:49 +00:00
Diana Picus	6c35a1e5af	[ARM GlobalISel] Support G_CONSTANT for Thumb2 All we have to do is mark it as legal. This allows us to select a lot of new patterns handled by TableGen. This patch adds tests for them and splits up the existing test file for binary operators into 2 files, one for arithmetic ops and one for logical ones. llvm-svn: 349610	2018-12-19 09:55:10 +00:00
Matt Arsenault	b110e2277c	AMDGPU/GlobalISel: Regbankselect for fsub llvm-svn: 349608	2018-12-19 09:07:58 +00:00
Martin Storsjo	e84a0b5a9e	[llvm-objcopy] Initial COFF support This is an initial implementation of no-op passthrough copying of COFF with objcopy. Differential Revision: https://reviews.llvm.org/D54939 llvm-svn: 349605	2018-12-19 07:24:38 +00:00
Brian Gesiak	274981eb83	[bugpoint][PR29027] Reduce function attributes Summary: In addition to reducing the functions in an LLVM module, bugpoint now reduces the function attributes associated with each of the remaining functions. To test this, add a -bugpoint-crashfuncattr test pass, which crashes if a function in the module has a "bugpoint-crash" attribute. A test case demonstrates that the IR is reduced to just that one attribute. Reviewers: MatzeB, silvas, davide, reames Reviewed By: reames Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D55216 llvm-svn: 349601	2018-12-19 03:42:19 +00:00
Kewen Lin	a6247e7cf4	[PowerPC]Exploit P9 vabsdu for unsigned vselect patterns For type v4i32/v8ii16/v16i8, do following transforms: (vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b) (vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b) (vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b) (vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b) Differential Revision: https://reviews.llvm.org/D55812 llvm-svn: 349599	2018-12-19 03:04:07 +00:00
Evandro Menezes	031abc2bd7	[llvm-mca] Improve test (NFC) Add more instruction variations for Exynos. llvm-svn: 349567	2018-12-18 23:19:52 +00:00
Sanjay Patel	bb3d3a8f2e	[InstCombine] add tests for extract of vector load; NFC There's a mismatch internally about how we are handling these patterns. We count loads as cheapToScalarize(), but then we don't actually scalarize them, so that can leave extra instructions compared to where we started when scalarizing other ops. If it's cheapToScalarize, then we should be scalarizing. llvm-svn: 349560	2018-12-18 22:51:06 +00:00
Pete Cooper	a3e0be109c	Preserve the linkage for objc* intrinsics as clang will set them to weak_external in some cases Clang uses weak linkage for objc runtime functions when they are not available on the platform. The intrinsic has this linkage so we just need to pass that on to the runtime call. llvm-svn: 349559	2018-12-18 22:42:08 +00:00
Pete Cooper	d0ffdf8782	Add nonlazybind to objc_retain/objc_release when converting from intrinsics. For performance reasons, clang set nonlazybind on these functions. Now that we are using intrinsics instead of runtime calls, we should set this attribute when creating the runtime functions. llvm-svn: 349558	2018-12-18 22:31:34 +00:00
Florian Hahn	485f2826ba	[LAA] Introduce enum for vectorization safety status (NFC). This patch adds a VectorizationSafetyStatus enum, which will be extended in a follow up patch to distinguish between 'safe with runtime checks' and 'known unsafe' dependences. Reviewers: anemet, anna, Ayal, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D54892 llvm-svn: 349556	2018-12-18 22:25:11 +00:00
Vitaly Buka	4e4920694c	[asan] Restore ODR-violation detection on vtables Summary: unnamed_addr is still useful for detecting of ODR violations on vtables Still unnamed_addr with lld and --icf=safe or --icf=all can trigger false reports which can be avoided with --icf=none or by using private aliases with -fsanitize-address-use-odr-indicator Reviewers: eugenis Reviewed By: eugenis Subscribers: kubamracek, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55799 llvm-svn: 349555	2018-12-18 22:23:30 +00:00
Sanjay Patel	608d128c42	[LoopVectorize] auto-generate complete checks; NFC The first test claims to show that the vectorizer will generate a vector load/loop, but then this file runs other passes which might scalarize that op. I'm removing instcombine from the RUN line here to break that dependency. Also, I'm generating full checks to make it clear exactly what the vectorizer has done. llvm-svn: 349554	2018-12-18 22:23:04 +00:00
Pete Cooper	f86db5ce9e	Rewrite objc intrinsics to runtime methods in PreISelIntrinsicLowering instead of SDAG. SelectionDAG currently changes these intrinsics to function calls, but that won't work for other ISel's. Also we want to eventually support nonlazybind and weak linkage coming from the front-end which we can't do in SelectionDAG. llvm-svn: 349552	2018-12-18 22:20:03 +00:00
Martin Storsjo	df20c666d6	[AArch64] Avoid crashing on .seh directives in assembly Differential Revision: https://reviews.llvm.org/D55670 llvm-svn: 349549	2018-12-18 22:10:17 +00:00
Sanjay Patel	84758ead22	[InstCombine] auto-generate complete checks; NFC llvm-svn: 349548	2018-12-18 22:09:15 +00:00
Kuba Mracek	3760fc9f3d	[asan] In llvm.asan.globals, allow entries to be non-GlobalVariable and skip over them Looks like there are valid reasons why we need to allow bitcasts in llvm.asan.globals, see discussion at https://github.com/apple/swift-llvm/pull/133. Let's look through bitcasts when iterating over entries in the llvm.asan.globals list. Differential Revision: https://reviews.llvm.org/D55794 llvm-svn: 349544	2018-12-18 21:20:17 +00:00
Simon Pilgrim	73c8685295	[AARCH64] Added test case for PR40091 llvm-svn: 349543	2018-12-18 21:05:22 +00:00
Evandro Menezes	4bfd4ce1bc	[llvm-mca] Update the Exynos test cases (NFC) Add more entropy to the test cases. llvm-svn: 349537	2018-12-18 20:46:03 +00:00
Pete Cooper	be4f571107	Change the objc ARC optimizer to use the new objc.* intrinsics We're moving ARC optimisation and ARC emission in clang away from runtime methods and towards intrinsics. This is the part which actually uses the intrinsics in the ARC optimizer when both analyzing the existing calls and emitting new ones. Differential Revision: https://reviews.llvm.org/D55348 Reviewers: ahatanak llvm-svn: 349534	2018-12-18 20:32:49 +00:00
Craig Topper	18a9d545e1	[X86] Add BSR to isUseDefConvertible. We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there. This addresses one issue from PR40090. llvm-svn: 349531	2018-12-18 20:03:54 +00:00
Nikita Popov	20853a7807	[InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check Checking whether a number has a certain number of trailing / leading zeros means checking whether it is of the form XXXX1000 / 0001XXXX, which can be done with an and+icmp. Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next step, this can be extended to non-equality predicates. Differential Revision: https://reviews.llvm.org/D55745 llvm-svn: 349530	2018-12-18 19:59:50 +00:00
Farhana Aleen	59ee2c5362	[AMDGPU] Removed the unnecessary operand size-check-assert from processBaseWithConstOffset(). Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and AMDGPU::V_ADDC_U32_e64. Therefore, we don't any additional operand size-check-assert. Author: FarhanaAleen llvm-svn: 349529	2018-12-18 19:58:39 +00:00
David Blaikie	693f617763	DebugInfo: Fix missing local imported entities after r349207 Post commit review/bug reported by Pavel Labath - thanks! llvm-svn: 349528	2018-12-18 19:40:22 +00:00
Nikita Popov	f6058ff140	[X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic ISD opcodes SADDSAT and SSUBSAT. This also improves scodegen for @llvm.sadd.sat() and @llvm.ssub.sat() intrinsics. This is a followup to D55787 and part of PR40056. Differential Revision: https://reviews.llvm.org/D55833 llvm-svn: 349520	2018-12-18 18:28:22 +00:00
Craig Topper	20a6db5a84	[X86] Create PSUBUS from (add (umax X, C), -C) InstCombine seems to canonicalize or PSUB patter into a max with the cosntant and an add with an inverse of the constant. This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling. Fixes some of PR40053 Differential Revision: https://reviews.llvm.org/D55780 llvm-svn: 349519	2018-12-18 18:26:25 +00:00
Sanjay Patel	e0afd278b1	[InstCombine] add tests for scalarization; NFC We miss pattern matching a splat constant if it has undef elements. llvm-svn: 349515	2018-12-18 17:56:59 +00:00
Michael Berg	c6a5245cf7	Add FMF management to common fp intrinsics in GlobalIsel Summary: This the initial code change to facilitate managing FMF flags from Instructions to MI wrt Intrinsics in Global Isel. Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE. Reviewers: aditya_nandakumar, bogner Reviewed By: bogner Subscribers: rovka, kristof.beyls, javed.absar Differential Revision: https://reviews.llvm.org/D55668 llvm-svn: 349514	2018-12-18 17:54:52 +00:00
Simon Pilgrim	1411917431	[X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for constant rotation amounts Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion. llvm-svn: 349510	2018-12-18 17:31:11 +00:00
Michael Kruse	3284775b70	[LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops. When using clang with `-fno-unroll-loops` (implicitly added with `-O1`), the LoopUnrollPass is not not added to the (legacy) pass pipeline. This also means that it will not process any loop metadata such as llvm.loop.unroll.enable (which is generated by #pragma unroll or WarnMissedTransformationsPass emits a warning that a forced transformation has not been applied (see https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html). Such explicit transformations should take precedence over disabling heuristics. This patch unconditionally adds LoopUnrollPass to the optimizing pipeline (that is, it is still not added with `-O0`), but passes a flag indicating whether automatic unrolling is dis-/enabled. This is the same approach as LoopVectorize uses. The new pass manager's pipeline builder has no option to disable unrolling, hence the problem does not apply. Differential Revision: https://reviews.llvm.org/D55716 llvm-svn: 349509	2018-12-18 17:16:05 +00:00
Simon Pilgrim	e9effe9744	[X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for splat rotation amounts Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion. llvm-svn: 349500	2018-12-18 16:02:23 +00:00
Petar Avramovic	0a5e4eb776	[MIPS GlobalISel] Select G_SDIV, G_UDIV, G_SREM and G_UREM Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM and use integer type of correct size when creating arguments for CLI.lowerCall. Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64 on MIPS32. Differential Revision: https://reviews.llvm.org/D55651 llvm-svn: 349499	2018-12-18 15:59:51 +00:00
Simon Pilgrim	be0fbe673e	[X86][SSE] Add shift combine 'out of range' tests with UNDEFs Shows failure to simplify out of range shift amounts to UNDEF if any element is UNDEF. llvm-svn: 349483	2018-12-18 13:37:04 +00:00
Nikita Popov	665ab08178	[X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes UADDSAT and USUBSAT. As a side-effect, this also makes codegen for the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable. This only replaces use in the X86 backend, and does not move any of the ADDUS/SUBUS X86 specific combines into generic codegen. Differential Revision: https://reviews.llvm.org/D55787 llvm-svn: 349481	2018-12-18 13:23:03 +00:00
Nikita Popov	a7d2a235bb	[SelectionDAG][X86] Fix [US](ADD\|SUB)SAT vector legalization, add tests Integer result promotion needs to use the scalar size, and we need support for result widening. This is in preparation for D55787. llvm-svn: 349480	2018-12-18 13:22:53 +00:00
George Rimar	1ec49110a7	[llvm-dwarfdump] - Do not error out on R_X86_64_DTPOFF64/R_X86_64_DTPOFF32 relocations. This is https://bugs.llvm.org/show_bug.cgi?id=39992, If we have the following code (test.cpp): thread_local int tdata = 24; and build an .o file with debug information: clang --target=x86_64-pc-linux -c bar.cpp -g Then object produced may have R_X86_64_DTPOFF64/R_X86_64_DTPOFF32 relocations. (clang emits R_X86_64_DTPOFF64 and gcc emits R_X86_64_DTPOFF32 for the code above for me) Currently, llvm-dwarfdump fails to compute this TLS relocation when dumping object and reports an error: failed to compute relocation: R_X86_64_DTPOFF64, Invalid data was encountered while parsing the file This relocation represents the offset in the TLS block and resolved by the linker, but this info is unavailable at the point when the object file is dumped by this tool. The patch adds the simple evaluation for such relocations to avoid emitting errors. Resulting behavior seems to be equal to GNU dwarfdump. Differential revision: https://reviews.llvm.org/D55762 llvm-svn: 349476	2018-12-18 12:15:01 +00:00
Petar Avramovic	150fd430f6	[MIPS GlobalISel] ClampScalar G_AND G_OR and G_XOR Add narrowScalar for G_AND and G_XOR. Legalize G_AND G_OR and G_XOR for types other then s32 with clampScalar on MIPS32. Differential Revision: https://reviews.llvm.org/D55362 llvm-svn: 349475	2018-12-18 11:36:14 +00:00
Luke Cheeseman	f57d7d8237	[AArch64] - Return address signing dwarf support - Reapply changes intially introduced in r343089 - The archtecture info is no longer loaded whenever a DWARFContext is created - The runtimes libraries (santiziers) make use of the dwarf context classes but do not intialise the target info - The architecture of the object can be obtained without loading the target info - Adding a method to the dwarf context to get this information and multiplex the string printing later on Differential Revision: https://reviews.llvm.org/D55774 llvm-svn: 349472	2018-12-18 10:37:42 +00:00
Simon Pilgrim	ba8e84b31c	[X86][AVX] Add 256/512-bit vector funnel shift tests Extra coverage for D55747 llvm-svn: 349471	2018-12-18 10:32:54 +00:00
Simon Pilgrim	46b90e851b	[X86][SSE] Add 128-bit vector funnel shift tests Extra coverage for D55747 llvm-svn: 349470	2018-12-18 10:08:23 +00:00
Dylan McKay	f920da009e	[IPO][AVR] Create new Functions in the default address space specified in the data layout This modifies the IPO pass so that it respects any explicit function address space specified in the data layout. In targets with nonzero program address spaces, all functions should, by default, be placed into the default program address space. This is required for Harvard architectures like AVR. Without this, the functions will be marked as residing in data space, and thus not be callable. This has no effect to any in-tree official backends, as none use an explicit program address space in their data layouts. Patch by Tim Neumann. llvm-svn: 349469	2018-12-18 09:52:52 +00:00
Matt Arsenault	c94e26c71d	AMDGPU: Legalize/regbankselect frame_index llvm-svn: 349468	2018-12-18 09:46:13 +00:00
Matt Arsenault	c0ea221068	AMDGPU: Legalize/regbankselect fma llvm-svn: 349467	2018-12-18 09:39:56 +00:00
Simon Pilgrim	af6fbbf18b	[TargetLowering] Fallback from SimplifyDemandedVectorElts to SimplifyDemandedBits For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well. llvm-svn: 349466	2018-12-18 09:33:25 +00:00
Tim Northover	856628f707	SROA: preserve alignment tags on loads and stores. When splitting up an alloca's uses we were dropping any explicit alignment tags, which means they default to the ABI-required default alignment and this can cause miscompiles if the real value was smaller. Also refactor the TBAA metadata into a parent class since it's shared by both children anyway. llvm-svn: 349465	2018-12-18 09:29:39 +00:00
Matt Arsenault	e01e7c81f2	AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub llvm-svn: 349463	2018-12-18 09:19:03 +00:00
Simon Pilgrim	26c630f416	[X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold. This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (its guaranteed to stay the same). Test change is merely a rescheduling. llvm-svn: 349459	2018-12-18 08:55:47 +00:00
Kristof Beyls	e66bc1f756	Introduce control flow speculation tracking pass for AArch64 The pass implements tracking of control flow miss-speculation into a "taint" register. That taint register can then be used to mask off registers with sensitive data when executing under miss-speculation, a.k.a. "transient execution". This pass is aimed at mitigating against SpectreV1-style vulnarabilities. At the moment, it implements the tracking of miss-speculation of control flow into a taint register, but doesn't implement a mechanism yet to then use that taint register to mask off vulnerable data in registers (something for a follow-on improvement). Possible strategies to mask out vulnerable data that can be implemented on top of this are: - speculative load hardening to automatically mask of data loaded in registers. - using intrinsics to mask of data in registers as indicated by the programmer (see https://lwn.net/Articles/759423/). For AArch64, the following implementation choices are made. Some of these are different than the implementation choices made in the similar pass implemented in X86SpeculativeLoadHardening.cpp, as the instruction set characteristics result in different trade-offs. - The speculation hardening is done after register allocation. With a relative abundance of registers, one register is reserved (X16) to be the taint register. X16 is expected to not clash with other register reservation mechanisms with very high probability because: . The AArch64 ABI doesn't guarantee X16 to be retained across any call. . The only way to request X16 to be used as a programmer is through inline assembly. In the rare case a function explicitly demands to use X16/W16, this pass falls back to hardening against speculation by inserting a DSB SYS/ISB barrier pair which will prevent control flow speculation. - It is easy to insert mask operations at this late stage as we have mask operations available that don't set flags. - The taint variable contains all-ones when no miss-speculation is detected, and contains all-zeros when miss-speculation is detected. Therefore, when masking, an AND instruction (which only changes the register to be masked, no other side effects) can easily be inserted anywhere that's needed. - The tracking of miss-speculation is done by using a data-flow conditional select instruction (CSEL) to evaluate the flags that were also used to make conditional branch direction decisions. Speculation of the CSEL instruction can be limited with a CSDB instruction - so the combination of CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL aren't speculated. When conditional branch direction gets miss-speculated, the semantics of the inserted CSEL instruction is such that the taint register will contain all zero bits. One key requirement for this to work is that the conditional branch is followed by an execution of the CSEL instruction, where the CSEL instruction needs to use the same flags status as the conditional branch. This means that the conditional branches must not be implemented as one of the AArch64 conditional branches that do not use the flags as input (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction selectors to not produce these instructions when speculation hardening is enabled. This pass will assert if it does encounter such an instruction. - On function call boundaries, the miss-speculation state is transferred from the taint register X16 to be encoded in the SP register as value 0. Future extensions/improvements could be: - Implement this functionality using full speculation barriers, akin to the x86-slh-lfence option. This may be more useful for the intrinsics-based approach than for the SLH approach to masking. Note that this pass already inserts the full speculation barriers if the function for some niche reason makes use of X16/W16. - no indirect branch misprediction gets protected/instrumented; but this could be done for some indirect branches, such as switch jump tables. Differential Revision: https://reviews.llvm.org/D54896 llvm-svn: 349456	2018-12-18 08:50:02 +00:00
Martin Storsjo	8f0cb9c3a8	[AArch64] [MinGW] Allow enabling SEH exceptions The default still is dwarf, but SEH exceptions can now be enabled optionally for the MinGW target. Differential Revision: https://reviews.llvm.org/D55748 llvm-svn: 349451	2018-12-18 08:32:37 +00:00
Craig Topper	284d426f6d	[X86] Add test cases to show isel failing to match BMI blsmsk/blsi/blsr when the flag result is used. A similar things happen to TBM instructions which we already have tests for. llvm-svn: 349450	2018-12-18 08:26:01 +00:00
Kewen Lin	bbb461f758	[PowerPC][NFC]Update vabsd cases with vselect test cases Power9 VABSDU* instructions can be exploited for some special vselect sequences. Check in the orignal test case here, later the exploitation patch will update this and reviewers can check the differences easily. llvm-svn: 349446	2018-12-18 08:11:32 +00:00
Kewen Lin	44ace92596	[PowerPC] Exploit power9 new instruction setb Check the expected pattens feeding to SELECT_CC like: (select_cc lhs, rhs, 1, (sext (setcc [lr]hs, [lr]hs, cc2)), cc1) (select_cc lhs, rhs, -1, (zext (setcc [lr]hs, [lr]hs, cc2)), cc1) (select_cc lhs, rhs, 0, (select_cc [lr]hs, [lr]hs, 1, -1, cc2), seteq) (select_cc lhs, rhs, 0, (select_cc [lr]hs, [lr]hs, -1, 1, cc2), seteq) Further transform the sequence to comparison + setb if hits. Differential Revision: https://reviews.llvm.org/D53275 llvm-svn: 349445	2018-12-18 07:53:26 +00:00
QingShan Zhang	ecdab5bdd8	[NFC] Add new test to cover the lhs scheduling issue for P9. llvm-svn: 349443	2018-12-18 06:32:42 +00:00
Craig Topper	4adf9ca738	[X86] Add test case for PR40060. NFC llvm-svn: 349441	2018-12-18 04:58:07 +00:00
QingShan Zhang	f549812599	[NFC] fix test case issue that with wrong label check. llvm-svn: 349439	2018-12-18 04:25:41 +00:00
Kewen Lin	3dac1252da	[PowerPC] Improve vec_abs on P9 Improve the current vec_abs support on P9, generate ISD::ABS node for vector types, combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns, do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity. Differential Revision: https://reviews.llvm.org/D54783 llvm-svn: 349437	2018-12-18 03:16:43 +00:00
Joel E. Denny	c646b4b05e	[FileCheck] Try to fix test on windows due to r349418 llvm-svn: 349432	2018-12-18 01:17:28 +00:00
Reid Kleckner	53ce05960e	[codeview] Align symbol records to save 441MB during linking clang.pdb In PDBs, symbol records must be aligned to four bytes. However, in the object file, symbol records may not be aligned. MSVC does not pad out symbol records to make sure they are aligned. That means the linker has to do extra work to insert the padding. Currently, LLD calculates the required space with alignment, and copies each record one at a time while padding them out to the correct size. It has a fast path that avoids this copy when the records are already aligned. This change fixes a bug in that codepath so that the copy is actually saved, and tweaks LLVM's symbol record emission to align symbol records. Here's how things compare when doing a plain clang Release+PDB build: - objs are 0.65% bigger (negligible) - link is 3.3% faster (negligible) - saves allocating 441MB - new LLD high water mark is ~1.05GB llvm-svn: 349431	2018-12-18 01:14:05 +00:00
David Blaikie	c4e08feb00	Recommit r348806: DebugInfo: Use symbol difference for CU length to simplify assembly reading/editing Mucking about simplifying a test case ( https://reviews.llvm.org/D55261 ) I stumbled across something I've hit before - that LLVM's (GCC's does too, FWIW) assembly output includes a hardcode length for a DWARF unit in its header. Instead we could emit a label difference - making the assembly easier to read/edit (though potentially at a slight (I haven't tried to observe it) performance cost of delaying/sinking the length computation into the MC layer). Fix: Predicated all the changes (including creating the labels, even if they aren't used/needed) behind the NVPTX useSectionsAsReferences, avoiding emitting labels in NVPTX where ptxas can't parse them. Reviewers: JDevlieghere, probinson, ABataev Differential Revision: https://reviews.llvm.org/D55281 llvm-svn: 349430	2018-12-18 01:06:09 +00:00
Joel E. Denny	e2afb61499	[FileCheck] Annotate input dump (final tweaks) Apply final suggestions from probinson for this patch series plus a few more tweaks: * Improve various docs, for MatchType in particular. * Rename some members of MatchType. The main problem was that the term "final match" became a misnomer when CHECK-COUNT-<N> was created. * Split InputStartLine, etc. declarations into multiple lines. Differential Revision: https://reviews.llvm.org/D55738 Reviewed By: probinson llvm-svn: 349425	2018-12-18 00:03:51 +00:00
Joel E. Denny	96f0e84ccf	[FileCheck] Annotate input dump (7/7) This patch implements annotations for diagnostics reporting CHECK-NOT failed matches. These diagnostics are enabled by -vv. As for diagnostics reporting failed matches for other directives, these annotations mark the search ranges using `X~~`. The difference here is that failed matches for CHECK-NOT are successes not errors, so they are green not red when colors are enabled. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the only match result for a pattern of type T from line L of the check file - T:L'N labels the Nth match result for a pattern of type T from line L of the check file - ^~~ marks good match (reported if -v) - !~~ marks bad match, such as: - CHECK-NEXT on same line as previous match (error) - CHECK-NOT found (error) - CHECK-DAG overlapping match (discarded, reported if -vv) - X~~ marks search range when no match is found, such as: - CHECK-NEXT not found (error) - CHECK-NOT not found (success, reported if -vv) - CHECK-DAG not found after discarded matches (error) - ? marks fuzzy match when no match is found - colors success, error, fuzzy match, discarded match, unmatched input If you are not seeing color above or in input dumps, try: -color $ FileCheck -vv -dump-input=always check5 < input5 \|& sed -n '/^<<<</,$p' <<<<<< 1: abcdef check:1 ^~~ not:2 X~~ 2: ghijkl not:2 ~~~ check:3 ^~~ 3: mnopqr not:4 X~~~~~ 4: stuvwx not:4 ~~~~~~ 5: eof:4 ^ >>>>>> $ cat check5 CHECK: abc CHECK-NOT: foobar CHECK: jkl CHECK-NOT: foobar $ cat input5 abcdef ghijkl mnopqr stuvwx ``` Reviewed By: george.karpenkov, probinson Differential Revision: https://reviews.llvm.org/D53899 llvm-svn: 349424	2018-12-18 00:03:36 +00:00
Joel E. Denny	f7c1c4d8a4	[FileCheck] Annotate input dump (6/7) This patch implements input annotations for diagnostics reporting CHECK-DAG discarded matches. These diagnostics are enabled by -vv. These annotations mark discarded match ranges using `!~~` because they are bad matches even though they are not errors. CHECK-DAG discarded matches create another case where there can be multiple match results for the same directive. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the only match result for a pattern of type T from line L of the check file - T:L'N labels the Nth match result for a pattern of type T from line L of the check file - ^~~ marks good match (reported if -v) - !~~ marks bad match, such as: - CHECK-NEXT on same line as previous match (error) - CHECK-NOT found (error) - CHECK-DAG overlapping match (discarded, reported if -vv) - X~~ marks search range when no match is found, such as: - CHECK-NEXT not found (error) - CHECK-DAG not found after discarded matches (error) - ? marks fuzzy match when no match is found - colors success, error, fuzzy match, discarded match, unmatched input If you are not seeing color above or in input dumps, try: -color $ FileCheck -vv -dump-input=always check4 < input4 \|& sed -n '/^<<<</,$p' <<<<<< 1: abcdef dag:1 ^~~~ dag:2'0 !~~~ discard: overlaps earlier match 2: cdefgh dag:2'1 ^~~~ check:3 X~ error: no match found >>>>>> $ cat check4 CHECK-DAG: abcd CHECK-DAG: cdef CHECK: efgh $ cat input4 abcdef cdefgh ``` This shows that the line 3 CHECK fails to match even though its pattern appears in the input because its search range starts after the line 2 CHECK-DAG's match range. The trouble might be that the line 2 CHECK-DAG's match range is later than expected because its first match range overlaps with the line 1 CHECK-DAG match range and thus is discarded. Because `!~~` for CHECK-DAG does not indicate an error, it is not colored red. Instead, when colors are enabled, it is colored cyan, which suggests a match that went cold. Reviewed By: george.karpenkov, probinson Differential Revision: https://reviews.llvm.org/D53898 llvm-svn: 349423	2018-12-18 00:03:19 +00:00
Joel E. Denny	7df86967b4	[FileCheck] Annotate input dump (5/7) This patch implements input annotations for diagnostics enabled by -v, which report good matches for directives. These annotations mark match ranges using `^~~`. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the only match result for a pattern of type T from line L of the check file - T:L'N labels the Nth match result for a pattern of type T from line L of the check file - ^~~ marks good match (reported if -v) - !~~ marks bad match, such as: - CHECK-NEXT on same line as previous match (error) - CHECK-NOT found (error) - X~~ marks search range when no match is found, such as: - CHECK-NEXT not found (error) - ? marks fuzzy match when no match is found - colors success, error, fuzzy match, unmatched input If you are not seeing color above or in input dumps, try: -color $ FileCheck -v -dump-input=always check3 < input3 \|& sed -n '/^<<<</,$p' <<<<<< 1: abc foobar def check:1 ^~~ not:2 !~~~~~ error: no match expected check:3 ^~~ >>>>>> $ cat check3 CHECK: abc CHECK-NOT: foobar CHECK: def $ cat input3 abc foobar def ``` -vv enables these annotations for FileCheck's implicit EOF patterns as well. For an example where EOF patterns become relevant, see patch 7 in this series. If colors are enabled, `^~~` is green to suggest success. -v plus color enables highlighting of input text that has no final match for any expected pattern. The highlight uses a cyan background to suggest a cold section. This highlighting can make it easier to spot text that was intended to be matched but that failed to be matched in a long series of good matches. CHECK-COUNT-<num> good matches are another case where there can be multiple match results for the same directive. Reviewed By: george.karpenkov, probinson Differential Revision: https://reviews.llvm.org/D53897 llvm-svn: 349422	2018-12-18 00:03:03 +00:00
Joel E. Denny	0e7e3fa0e9	[FileCheck] Annotate input dump (4/7) This patch implements input annotations for diagnostics that report unexpected matches for CHECK-NOT. Like wrong-line matches for CHECK-NEXT, CHECK-SAME, and CHECK-EMPTY, these annotations mark match ranges using red `!~~` to indicate bad matches that are errors. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the only match result for a pattern of type T from line L of the check file - T:L'N labels the Nth match result for a pattern of type T from line L of the check file - !~~ marks bad match, such as: - CHECK-NEXT on same line as previous match (error) - CHECK-NOT found (error) - X~~ marks search range when no match is found, such as: - CHECK-NEXT not found (error) - ? marks fuzzy match when no match is found - colors error, fuzzy match If you are not seeing color above or in input dumps, try: -color $ FileCheck -v -dump-input=always check3 < input3 \|& sed -n '/^<<<</,$p' <<<<<< 1: abc foobar def not:2 !~~~~~ error: no match expected >>>>>> $ cat check3 CHECK: abc CHECK-NOT: foobar CHECK: def $ cat input3 abc foobar def ``` Reviewed By: george.karpenkov, probinson Differential Revision: https://reviews.llvm.org/D53896 llvm-svn: 349421	2018-12-18 00:02:47 +00:00
Joel E. Denny	cadfcef493	[FileCheck] Annotate input dump (3/7) This patch implements input annotations for diagnostics that report wrong-line matches for the directives CHECK-NEXT, CHECK-SAME, and CHECK-EMPTY. Instead of the usual `^~~`, which is used by later patches for good matches, these annotations use `!~~` to mark the bad match ranges so that this category of errors is visually distinct. Because such matches are errors, these annotates are red when colors are enabled. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the only match result for a pattern of type T from line L of the check file - T:L'N labels the Nth match result for a pattern of type T from line L of the check file - !~~ marks bad match, such as: - CHECK-NEXT on same line as previous match (error) - X~~ marks search range when no match is found, such as: - CHECK-NEXT not found (error) - ? marks fuzzy match when no match is found - colors error, fuzzy match If you are not seeing color above or in input dumps, try: -color $ FileCheck -v -dump-input=always check2 < input2 \|& sed -n '/^<<<</,$p' <<<<<< 1: foo bar next:2 !~~ error: match on wrong line >>>>>> $ cat check2 CHECK: foo CHECK-NEXT: bar $ cat input2 foo bar ``` Reviewed By: george.karpenkov, probinson Differential Revision: https://reviews.llvm.org/D53894 llvm-svn: 349420	2018-12-18 00:02:22 +00:00
Joel E. Denny	2c007c807d	[FileCheck] Annotate input dump (2/7) This patch implements input annotations for diagnostics that suggest fuzzy matches for directives for which no matches were found. Instead of using the usual `^~~`, which is used by later patches for good matches, these annotations use `?` so that fuzzy matches are visually distinct. No tildes are included as these diagnostics (independently of this patch) currently identify only the start of the match. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the only match result for a pattern of type T from line L of the check file - T:L'N labels the Nth match result for a pattern of type T from line L of the check file - X~~ marks search range when no match is found - ? marks fuzzy match when no match is found - colors error, fuzzy match If you are not seeing color above or in input dumps, try: -color $ FileCheck -v -dump-input=always check1 < input1 \|& sed -n '/^<<<</,$p' <<<<<< 1: ; abc def 2: ; ghI jkl next:3'0 X~~~~~~~~ error: no match found next:3'1 ? possible intended match >>>>>> $ cat check1 CHECK: abc CHECK-SAME: def CHECK-NEXT: ghi CHECK-SAME: jkl $ cat input1 ; abc def ; ghI jkl ``` This patch introduces the concept of multiple "match results" per directive. In the above example, the first match result for the CHECK-NEXT directive is the failed match, for which the annotation shows the search range. The second match result is the fuzzy match. Later patches will introduce other cases of multiple match results per directive. When colors are enabled, `?` is colored magenta. That is, it doesn't indicate the actual error, which a red `X~~` marker indicates, but its color suggests it's closely related. Reviewed By: george.karpenkov, probinson Differential Revision: https://reviews.llvm.org/D53893 llvm-svn: 349419	2018-12-18 00:02:04 +00:00
Joel E. Denny	3c5d267eb7	[FileCheck] Annotate input dump (1/7) Extend FileCheck to dump its input annotated with FileCheck's diagnostics: errors, good matches if -v, and additional information if -vv. The goal is to make it easier to visualize FileCheck's matching behavior when debugging. Each patch in this series implements input annotations for a particular category of FileCheck diagnostics. While the first few patches alone are somewhat useful, the annotations become much more useful as later patches implement annotations for -v and -vv diagnostics, which show the matching behavior leading up to the error. This first patch implements boilerplate plus input annotations for error diagnostics reporting that no matches were found for a directive. These annotations mark the search ranges of the failed directives. Instead of using the usual `^~~`, which is used by later patches for good matches, these annotations use `X~~` so that this category of errors is visually distinct. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the match result for a pattern of type T from line L of the check file - X~~ marks search range when no match is found - colors error If you are not seeing color above or in input dumps, try: -color $ FileCheck -v -dump-input=always check1 < input1 \|& sed -n '/^Input file/,$p' Input file: <stdin> Check file: check1 -dump-input=help describes the format of the following dump. Full input was: <<<<<< 1: ; abc def 2: ; ghI jkl next:3 X~~~~~~~~ error: no match found >>>>>> $ cat check1 CHECK: abc CHECK-SAME: def CHECK-NEXT: ghi CHECK-SAME: jkl $ cat input1 ; abc def ; ghI jkl ``` Some additional details related to the boilerplate: * Enabling: The annotated input dump is enabled by `-dump-input`, which can also be set via the `FILECHECK_OPTS` environment variable. Accepted values are `help`, `always`, `fail`, or `never`. As shown above, `help` describes the format of the dump. `always` is helpful when you want to investigate a successful FileCheck run, perhaps for an unexpected pass. `-dump-input-on-failure` and `FILECHECK_DUMP_INPUT_ON_FAILURE` remain as a deprecated alias for `-dump-input=fail`. * Diagnostics: The usual diagnostics are not suppressed in this mode and are printed first. For brevity in the example above, I've omitted them using a sed command. Sometimes they're perfectly sufficient, and then they make debugging quicker than if you were forced to hunt through a dump of long input looking for the error. If you think they'll get in the way sometimes, keep in mind that it's pretty easy to grep for the start of the input dump, which is `<<<`. * Colored Annotations: The annotated input is colored if colors are enabled (enabling colors can be forced using -color). For example, errors are red. However, as in the above example, colors are not vital to reading the annotations. I don't know how to test color in the output, so any hints here would be appreciated. Reviewed By: george.karpenkov, zturner, probinson Differential Revision: https://reviews.llvm.org/D52999 llvm-svn: 349418	2018-12-18 00:01:39 +00:00
Craig Topper	6a6f6109c4	[X86] Add baseline tests for D55780 This adds tests for (add (umax X, C), -C) as part of fixing PR40053 llvm-svn: 349416	2018-12-17 23:20:14 +00:00
Peter Collingbourne	d3a3e4b46d	hwasan: Move ctor into a comdat. Differential Revision: https://reviews.llvm.org/D55733 llvm-svn: 349413	2018-12-17 22:56:34 +00:00
Simon Pilgrim	7e2975a44c	[X86][SSE] Improve immediate vector shift known bits handling. Convert VSRAI to VSRLI is the sign bit is known zero and improve KnownBits output for all shift instruction. Fixes the poor codegen comments in D55768. llvm-svn: 349407	2018-12-17 22:09:47 +00:00
Wouter van Oortmerssen	d3c544aa6e	[WebAssembly] Fix assembler parsing of br_table. Summary: We use `variable_ops` in the tablegen defs to denote the list of branch targets in `br_table`, but unlike other uses of `variable_ops` (e.g. call) the these branch targets need to actually be encoded in the instruction. The existing tables for `variable_ops` cause not operands to be accepted by the assembly matcher. Following the example of ARM: `2cc0a7da87/lib/Target/ARM/ARMInstrInfo.td (L550-L555)` we introduce a new operand type to capture this list, and we use the same {} syntax as ARM as well to differentiate them from regular integer operands. Also removed definition and use of TSFlags in tablegen defs, since `br_table` now has a non-variable_ops immediate operand, so the previous logic of only the variable_ops arguments being labels didn't make sense anymore. Reviewers: dschuff, aheejin, sunfish Subscribers: javed.absar, sbc100, jgravelle-google, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D55401 llvm-svn: 349405	2018-12-17 22:04:44 +00:00
Craig Topper	8c9d772991	[X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr. These seem to have been missed when the other TBM instructions were added. llvm-svn: 349404	2018-12-17 21:50:06 +00:00
Reid Kleckner	94ee0728e5	[codeview] Flush labels before S_DEFRANGE* fragments This was a pre-existing bug that could be triggered with assembly like this: .p2align 2 .LtmpN: .cv_def_range "..." I noticed this when attempting to change clang to emit aligned symbol records. llvm-svn: 349403	2018-12-17 21:49:35 +00:00
Sanjay Patel	200885e654	[AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924) Now, that we have funnel shift intrinsics, it should be safe to convert this form of rotate to it. In the worst case (a target that doesn't have rotate instructions), we will expand this into a branch-less sequence of ALU ops (neg/and/and/lshr/shl/or) in the backend, so it's still very likely to be a perf improvement over the original code. The motivating source code pattern for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=34924 Background: I looked at several different options before deciding where to try this - instcombine, simplifycfg, CGP - because it doesn't fit cleanly anywhere AFAIK. The backend (CGP, SDAG, GlobalIsel?) is too late for what we're trying to accomplish. We want to have the IR converted before we reach things like vectorization because the reduced code can make a loop much simpler to transform. Technically, this could be included in instcombine, but it's a large pattern match that includes control-flow, so it just felt wrong to stuff into there (although I have a draft of that patch). Similarly, this could be part of simplifycfg, but all of this pattern matching is a stretch. So we're left with our relatively new dumping ground for homeless transforms: aggressive-instcombine. This only runs at -O3, but that seems like a reasonable limitation given that source code has many options to avoid this pattern (including the recently added clang intrinsics for rotates). I'm including a PhaseOrdering test because we require the teamwork of 3 passes (aggressive-instcombine, instcombine, simplifycfg) to get this into the minimal IR form that we want. That test shows a bug with the new pass manager that's independent of this change (but it will be masked if we canonicalize harder to funnel shift intrinsics in instcombine). Differential Revision: https://reviews.llvm.org/D55604 llvm-svn: 349396	2018-12-17 21:14:51 +00:00
David Blaikie	a9bcf5b334	DebugInfo: Update gold plugin tests due to CU attribute reordering in r349207 llvm-svn: 349395	2018-12-17 21:10:28 +00:00
Sanjay Patel	1a6e9ec434	[InstCombine] don't widen an arbitrary sequence of vector ops (PR40032) The problem is shown specifically for a case with vector multiply here: https://bugs.llvm.org/show_bug.cgi?id=40032 ...and this might mask the original backend bug for ARM shown in: https://bugs.llvm.org/show_bug.cgi?id=39967 As the test diffs here show, we were (and probably still aren't) doing these kinds of transforms in a principled way. We are producing more or equal wide instructions than we started with in some cases, so we still need to restrict/correct other transforms from overstepping. If there are perf regressions from this change, we can either carve out exceptions to the general IR rules, or improve the backend to do these transforms when we know the transform is profitable. That's probably similar to a change like D55448. Differential Revision: https://reviews.llvm.org/D55744 llvm-svn: 349389	2018-12-17 20:27:43 +00:00
Craig Topper	728cbc0378	Convert (CMP (srl/shl X, C), 0) to (CMP (and X, C'), 0) when only the zero flag is used. This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift. This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users. This appears to fix the case in PR39968. llvm-svn: 349385	2018-12-17 20:02:16 +00:00
JF Bastien	1d460eb52e	AsmParser: test .double NaN and .double inf Summary: It looks like this support was added to match GNU AS, but only tests .float and not .double. I asked RedHat folks to confirm that 0x7fffffffffffffff was indeed the right value for NaN. Same for infinity, but it only has positive / negative encodings. Reviewers: scanon, rjmccall Subscribers: jkorous, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D55531 llvm-svn: 349376	2018-12-17 18:54:22 +00:00
Simon Pilgrim	9274f17a5e	[TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000) This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen. I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well. Differential Revision: https://reviews.llvm.org/D55768 llvm-svn: 349374	2018-12-17 18:43:43 +00:00
Nikita Popov	221f3fc750	[InstSimplify] Simplify saturating add/sub + icmp If a saturating add/sub has one constant operand, then we can determine the possible range of outputs it can produce, and simplify an icmp comparison based on that. The implementation is based on a similar existing mechanism for simplifying binary operator + icmps. Differential Revision: https://reviews.llvm.org/D55735 llvm-svn: 349369	2018-12-17 17:45:18 +00:00
Tim Northover	256a16d031	FastIsel: take care to update iterators when removing instructions. We keep a few iterators into the basic block we're selecting while performing FastISel. Usually this is fine, but occasionally code wants to remove already-emitted instructions. When this happens we have to be careful to update those iterators so they're not pointint at dangling memory. llvm-svn: 349365	2018-12-17 17:25:53 +00:00
Tim Northover	ae3b66b7b0	ARM: use acquire/release instruction variants when available. These features (fairly) recently got split out into their own feature, so we should make CodeGen use them when available. The main change here is that the check used to be based on the triple, but now it's based on CPU features. llvm-svn: 349355	2018-12-17 15:05:32 +00:00
Andrea Di Biagio	4c73711069	[MCA] Add support for BeginGroup/EndGroup. llvm-svn: 349354	2018-12-17 14:27:33 +00:00
Eric Liu	6c933a2bed	Revert "DebugInfo: Assume an absence of ranges or high_pc on a CU means the CU is empty (devoid of code addresses)" This reverts commit r349333. It caused internal test to fail. I have sent more information to the author. llvm-svn: 349353	2018-12-17 14:14:40 +00:00
Andrea Di Biagio	4506067593	[MCA] Don't assume that createMCInstrAnalysis() always returns a valid pointer. Class InstrBuilder wrongly assumed that llvm targets were always able to return a non-null pointer when createMCInstrAnalysis() was called on them. This was causing crashes when simulating executions for targets that don't provide an MCInstrAnalysis object. This patch fixes the issue by making MCInstrAnalysis optional. llvm-svn: 349352	2018-12-17 14:00:37 +00:00
Simon Pilgrim	193429ea15	Regenerate test in prep for SimplifyDemandedBits improvements. llvm-svn: 349350	2018-12-17 12:48:34 +00:00
Sanjay Patel	beb7bb6192	[AggressiveInstCombine] add test for rotate insertion point; NFC As noted in D55604 - we need a test to make sure that the new intrinsic is inserted into a valid position. llvm-svn: 349347	2018-12-17 12:36:35 +00:00
Petar Avramovic	b8276f2280	[MIPS GlobalISel] Lower G_UADDE and narrowScalar G_ADD Lower G_UADDE and legalize G_ADD using narrowScalar on MIPS32. Differential Revision: https://reviews.llvm.org/D54580 llvm-svn: 349346	2018-12-17 12:31:07 +00:00
Alexandros Lamprineas	490ae11717	[AArch64] Re-run load/store optimizer after aggressive tail duplication The Load/Store Optimizer runs before Machine Block Placement. At O3 the Tail Duplication Threshold is set to 4 instructions and this can create new opportunities for the Load/Store Optimizer. It seems worthwhile to run it once again. llvm-svn: 349338	2018-12-17 10:45:43 +00:00
David Blaikie	884deed1b3	DebugInfo: Assume an absence of ranges or high_pc on a CU means the CU is empty (devoid of code addresses) GCC emitted these unconditionally on/before 4.4/March 2012 Clang emitted these unconditionally on/before 3.5/March 2014 This improves performance when parsing CUs (especially those using split DWARF) that contain no code ranges (such as the mini CUs that may be created by ThinLTO importing - though generally they should be/are avoided, especially for Split DWARF because it produces a lot of very small CUs, which don't scale well in a bunch of other ways too (including size)). llvm-svn: 349333	2018-12-17 08:27:19 +00:00
Craig Topper	792d4f130d	[X86] Add test case for PR39968. NFC llvm-svn: 349331	2018-12-17 07:51:17 +00:00
Kewen Lin	c68ce89ae1	[Power9][NFC]update vabsd case for better dumping Appended options -ppc-vsr-nums-as-vr and -ppc-asm-full-reg-names to get the more descriptive output. Also removed useless function attributes. llvm-svn: 349329	2018-12-17 06:32:02 +00:00
Kewen Lin	3ee103085e	[Power9][NFC]Make pre-inc-disable case more robust With some patch adopted for Power9 vabsd* insns, some CHECKs can't get the expected results. But it's false alarm, we should update the case more robust. llvm-svn: 349325	2018-12-17 03:16:12 +00:00
Davide Italiano	e41e1d015f	[EarlyCSE] If DI can't be salvaged, mark it as unavailable. Fixes PR39874. llvm-svn: 349323	2018-12-17 01:42:39 +00:00
Nikita Popov	d03c27f837	[InstCombine] Add cttz/ctlz + select non-bitwidth tests; NFC llvm-svn: 349322	2018-12-16 23:48:18 +00:00
Nikita Popov	b4133791c8	[InstCombine] Regenerate test checks; NFC Also drop unnecessary entry blocks and avoid use of anonymous variables. llvm-svn: 349321	2018-12-16 23:48:11 +00:00
Nikita Popov	8be3314c3f	[InstCombine] Make cttz/ctlz knownbits tests more robust; NFC Tests checking for the addition of !range metadata should be preserved if cttz/ctlz + icmp is optimized. llvm-svn: 349318	2018-12-16 19:12:08 +00:00
Simon Pilgrim	0dea14f2f2	Regenerate test (merges X86+X64 cases). NFCI. llvm-svn: 349317	2018-12-16 19:07:57 +00:00
Craig Topper	10f8892837	[X86] Remove truncation handling from EmitTest. Replace it with a DAG combine. I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that. The test-shrink-bug.ll changie is an improvement because we are no longer interfering with test shrink handling in isel. The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly. llvm-svn: 349315	2018-12-16 18:35:55 +00:00
Craig Topper	b0b9c54578	[X86] Autogenerate complete checks. NFC llvm-svn: 349314	2018-12-16 18:35:54 +00:00
Nikita Popov	3b5d224bde	Revert "[InstCombine] Regenerate test checks; NFC" This reverts commit r349311. Didn't check this carefully enough... llvm-svn: 349312	2018-12-16 18:27:37 +00:00
Nikita Popov	db2ef76ab7	[InstCombine] Regenerate test checks; NFC llvm-svn: 349311	2018-12-16 18:22:57 +00:00
Nikita Popov	bb69aaa681	[InstCombined] Add more tests for cttz/ctlz + icmp; NFC Test cases other than icmp with the bitwidth. llvm-svn: 349310	2018-12-16 17:51:32 +00:00
Nikita Popov	32083bd072	[InstCombine] Add additional saturating add/sub + icmp tests; NFC These test comparisons with saturating add/sub in non-canonical form. llvm-svn: 349309	2018-12-16 17:45:25 +00:00
Sanjay Patel	e49db58616	[InstCombine] regenerate test checks; NFC llvm-svn: 349307	2018-12-16 16:14:42 +00:00
Sanjay Patel	db84980851	[InstCombine] add tests for vector widening transforms (PR40032); NFC llvm-svn: 349306	2018-12-16 15:50:50 +00:00
Sanjay Patel	13ac2f15b0	[x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859) This is part of fixing PR39859: https://bugs.llvm.org/show_bug.cgi?id=39859 We have a crippled vector ISA, so we have to invert a typical fold and create min/max here. As discussed in the bug report, we can probably do better by using saturating subtract when it's available, but we should have this improvement for the min/max patterns regardless. Alive proofs: https://rise4fun.com/Alive/zsf https://rise4fun.com/Alive/Qrl Differential Revision: https://reviews.llvm.org/D55515 llvm-svn: 349304	2018-12-16 15:05:48 +00:00
Sanjay Patel	f24900b934	[DAGCombiner] allow hoisting vector bitwise logic ahead of truncates The transform performs a bitwise logic op in a wider type followed by truncate when both inputs are truncated from the same source type: logic_op (truncate x), (truncate y) --> truncate (logic_op x, y) There are a bunch of other checks that should prevent doing this when it might be harmful. We already do this transform for scalars in this spot. The vector limitation was shared with a check for the case when the operands are extended. I'm not sure if that limit is needed either, but that would be a separate patch. Differential Revision: https://reviews.llvm.org/D55448 llvm-svn: 349303	2018-12-16 14:57:04 +00:00
Simon Pilgrim	0ef977b83d	[SelectionDAG] Add FSHL/FSHR support to computeKnownBits Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the targets shift amount type). llvm-svn: 349298	2018-12-16 13:33:37 +00:00
Simon Pilgrim	780b3ca775	[X86] Add computeKnownBits tests for funnel shift intrinsics llvm-svn: 349297	2018-12-16 12:15:31 +00:00
Craig Topper	392edb6223	[X86] Autogenerate complete checks. NFC llvm-svn: 349287	2018-12-15 22:52:57 +00:00
Simon Pilgrim	ef7b5949e5	[X86] Lower to SHLD/SHRD on slow machines for optsize Use consistent rules for when to lower to SHLD/SHRD for slow machines - fixes a weird issue where funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines to SHLD/SHRD, but now with the modulo amount guard...... llvm-svn: 349285	2018-12-15 19:43:44 +00:00
Simon Pilgrim	53c8b1b6f7	[X86] Add optsize SHLD/SHRD tests llvm-svn: 349284	2018-12-15 19:32:26 +00:00
Dinar Temirbulatov	8c8724dd0d	[CodeGen] Enhance machine PHIs optimization Summary: Make machine PHIs optimization to work for single value register taken from several different copies. This is the first step to fix PR38917. This change allows to get rid of redundant PHIs (see opt_phis2.mir test) to make the subsequent optimizations (like CSE) possible and simpler. For instance, before this patch the code like this: %b = COPY %z ... %a = PHI %bb1, %a; %bb2, %b could be optimized to: %a = %b but the code like this: %c = COPY %z ... %b = COPY %z ... %a = PHI %bb1, %a; %bb2, %b; %bb3, %c would remain unchanged. With this patch the latter case will be optimized: %a = %z```. Committed on behalf of: Anton Afanasyev anton.a.afanasyev@gmail.com Reviewers: RKSimon, MatzeB Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54839 llvm-svn: 349271	2018-12-15 14:37:01 +00:00
Simon Pilgrim	bfbe510d4f	Regenerate neon copy tests. NFCI. llvm-svn: 349270	2018-12-15 14:23:18 +00:00
Simon Pilgrim	1e1fd9c761	[TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts Differential Revision: https://reviews.llvm.org/D55600 llvm-svn: 349264	2018-12-15 11:36:36 +00:00
Nikita Popov	f61567f42a	[InstSimplify] Add tests for saturating add/sub + icmp; NFC If a saturating add/sub with a constant operand is compared to another constant, we should be able to determine that the condition is always true/false in some cases (but currently don't). llvm-svn: 349261	2018-12-15 10:37:01 +00:00
Fangrui Song	da53d8739a	[mips] Fix test typo in rL348914 RUN; -> RUN: llvm-svn: 349258	2018-12-15 08:44:47 +00:00
Kewen Lin	3ac031bb8f	[Power9][NFC] add setb exploitation test case Add an original test case for setb before the exploitation actually takes effect, later we can check the difference. Differential Revision: https://reviews.llvm.org/D55696 llvm-svn: 349251	2018-12-15 04:39:37 +00:00
Heejin Ahn	feef720bb8	[WebAssembly] Check if the section order is correct Summary: This patch checks if the section order is correct when reading a wasm object file in `WasmObjectFile` and converting YAML to wasm object in yaml2wasm. (It is not possible to check when reading YAML because it is handled exclusively by the YAML reader.) This checks the ordering of all known sections (core sections + known custom sections). This also adds section ID DataCount section that will be scheduled to be added in near future. Reviewers: sbc100 Subscribers: dschuff, mgorny, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54924 llvm-svn: 349221	2018-12-15 00:58:12 +00:00
Florian Hahn	c214bc2b8d	[NewGVN] Update use counts for SSA copies when replacing them by their operands. The current code relies on LeaderUseCount to determine if we can remove an SSA copy, but in that the LeaderUseCount does not refer to the SSA copy. If a SSA copy is a dominating leader, we use the operand as dominating leader instead. This means we removed a user of a ssa copy and we should decrement its use count, so we can remove the ssa copy once it becomes dead. Fixes PR38804. Reviewers: efriedma, davide Reviewed By: davide Differential Revision: https://reviews.llvm.org/D51595 llvm-svn: 349217	2018-12-15 00:32:38 +00:00
Vedant Kumar	9d1827331f	[Util] Refer to [s\|z]exts of args when converting dbg.declares (fix PR35400) When converting dbg.declares, if the described value is a [s\|z]ext, refer to the ext directly instead of referring to its operand. This fixes a narrowing bug (the debugger got the sign of a variable wrong, see llvm.org/PR35400). The main reason to refer to the ext's operand was that an optimization may remove the ext itself, leading to a dropped variable. Now that InstCombine has been taught to use replaceAllDbgUsesWith (r336451), this is less of a concern. Other passes can/should adopt this API as needed to fix dropped variable bugs. Differential Revision: https://reviews.llvm.org/D51813 llvm-svn: 349214	2018-12-15 00:03:33 +00:00
Artem Belevich	6d74bd638a	[NVPTX] Lower instructions that expand into libcalls. The change is an effort to split and refactor abandoned D34708 into smaller parts. Here the behaviour of unsupported instructions is changed to match the behaviour of explicit intrinsics calls. Currently LLVM crashes with: > Assertion getInstruction() && "Not a call or invoke instruction!" failed. With this patch LLVM produces a more sensible error message: > Cannot select: ... i32 = ExternalSymbol'__foobar' Author: Denys Zariaiev <denys.zariaiev@gmail.com> Differential Revision: https://reviews.llvm.org/D55145 llvm-svn: 349213	2018-12-14 23:53:06 +00:00
David Blaikie	560ff35592	DebugInfo: Avoid using split DWARF when the split unit would be empty. In ThinLTO many split CUs may be effectively empty because of the lack of support for cross-unit references in split DWARF. Using a split unit in those cases is just a waste/overhead - and turned out to be one contributor to a significant symbolizer performance issue when global variable debug info was being imported (see r348416 for the primary fix) due to symbolizers seeing CUs with no ranges, assuming there might still be addresses covered and walking into the split CU to see if there are any ranges (when that split CU was in a DWP file, that meant loading the DWP and its index, the index was extra large because of all these fractured/empty CUs... and so was very expensive to load). (the 3rd fix which will follow, is to assume that a CU with no ranges is empty rather than merely missing its CU level range data - and to not walk into its DIEs (split or otherwise) in search of address information that is generally not present) llvm-svn: 349207	2018-12-14 22:44:46 +00:00
Krzysztof Parzyszek	26d994f56e	[Hexagon] Add patterns for shifts of v2i16 This fixes https://llvm.org/PR39983. llvm-svn: 349202	2018-12-14 22:33:48 +00:00
Volkan Keles	574d737e06	[GlobalISel] LegalizerHelper: Implement fewerElementsVector for G_LOAD/G_STORE Reviewers: aemerson, dsanders, bogner, paquette, aditya_nandakumar Reviewed By: dsanders Subscribers: rovka, kristof.beyls, javed.absar, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53728 llvm-svn: 349200	2018-12-14 22:11:20 +00:00
Farhana Aleen	ce095c564a	[AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. Summary: Promote constant offset to immediate by recomputing the relative 13bit offset from nearby instructions. E.g. s_movk_i32 s0, 0x1800 v_add_co_u32_e32 v0, vcc, s0, v2 v_addc_co_u32_e32 v1, vcc, 0, v6, vcc s_movk_i32 s0, 0x1000 v_add_co_u32_e32 v5, vcc, s0, v2 v_addc_co_u32_e32 v6, vcc, 0, v6, vcc global_load_dwordx2 v[5:6], v[5:6], off global_load_dwordx2 v[0:1], v[0:1], off => s_movk_i32 s0, 0x1000 v_add_co_u32_e32 v5, vcc, s0, v2 v_addc_co_u32_e32 v6, vcc, 0, v6, vcc global_load_dwordx2 v[5:6], v[5:6], off global_load_dwordx2 v[0:1], v[5:6], off offset:2048 Author: FarhanaAleen Reviewed By: arsenm, rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D55539 llvm-svn: 349196	2018-12-14 21:13:14 +00:00
Michael Kruse	ea9ef34558	[TransformWarning] Do not warn missed transformations in optnone functions. Optimization transformations are intentionally disabled by the 'optnone' function attribute. Therefore do not warn if transformation metadata is still present. Using the legacy pass manager structure, the `skipFunction` method takes care for the optnone attribute (already called before this patch). For the new pass manager, there is no equivalent, so we check for the 'optnone' attribute manually. Differential Revision: https://reviews.llvm.org/D55690 llvm-svn: 349184	2018-12-14 19:45:43 +00:00
Sanjay Patel	7b776863ac	[x86] add tests for extractelement of FP binops; NFC llvm-svn: 349179	2018-12-14 19:15:54 +00:00
Sanjay Patel	b7e2d6e493	[ARM] make test immune to scalarization improvements; NFC llvm-svn: 349177	2018-12-14 18:47:04 +00:00
Sanjay Patel	4f4963b9cb	[x86] make tests immune to scalarization improvements; NFC llvm-svn: 349176	2018-12-14 18:44:16 +00:00
Daniel Sanders	ec29eac5dd	[globalisel][combiner] Fix r349167 for release mode bots This test relies on -debug-only which is unavailable in non-asserts builds. llvm-svn: 349174	2018-12-14 18:25:05 +00:00
Michael Kruse	5948b7f30f	[Transforms] Preserve metadata when converting invoke to call. The `changeToCall` function did not preserve the invoke's metadata. Currently, there is probably no metadata that depends on being applied on a CallInst or InvokeInst. Therefore we can replace the instruction's metadata. This fixes http://llvm.org/PR39994 Suggested-by: Moritz Kreutzer <moritz.kreutzer@siemens.com> Differential Revision: https://reviews.llvm.org/D55666 llvm-svn: 349170	2018-12-14 18:15:11 +00:00
Zachary Turner	8fb9a71dde	[MS Demangler] Fail gracefully on invalid pointer types. Once we detect a 'P', we know we a pointer type is upcoming, so we make some assumptions about the output that follows. If those assumptions didn't hold, we would assert. Instead, we should fail gracefully and propagate the error up. llvm-svn: 349169	2018-12-14 18:10:13 +00:00
Zachary Turner	f47d8be7be	[MS Demangler] Add a regression test for an invalid mangled name. llvm-svn: 349168	2018-12-14 17:59:27 +00:00
Daniel Sanders	629db5d8e5	[globalisel][combiner] Make the CombinerChangeObserver a MachineFunction::Delegate Summary: This allows us to register it with the MachineFunction delegate and be notified automatically about erasure and creation of instructions. However, we still need explicit notification for modifications such as those caused by setReg() or replaceRegWith(). There is a catch with this though. The notification for creation is delivered before any operands can be added. While appropriate for scheduling combiner work. This is unfortunate for debug output since an opcode by itself doesn't provide sufficient information on what happened. As a result, the work list remembers the instructions (when debug output is requested) and emits a more complete dump later. Another nit is that the MachineFunction::Delegate provides const pointers which is inconvenient since we want to use it to schedule future modification. To resolve this GISelWorkList now has an optional pointer to the MachineFunction which describes the scope of the work it is permitted to schedule. If a given MachineInstr* is in this function then it is permitted to schedule work to be performed on the MachineInstr's. An alternative to this would be to remove the const from the MachineFunction::Delegate interface, however delegates are not permitted to modify the MachineInstr's they receive. In addition to this, the observer has three interface changes. * erasedInstr() is now erasingInstr() to indicate it is about to be erased but still exists at the moment. * changingInstr() and changedInstr() have been added to report changes before and after they are made. This allows us to trace the changes in the debug output. * As a convenience changingAllUsesOfReg() and finishedChangingAllUsesOfReg() will report changingInstr() and changedInstr() for each use of a given register. This is primarily useful for changes caused by MachineRegisterInfo::replaceRegWith() With this in place, both combine rules have been updated to report their changes to the observer. Finally, make some cosmetic changes to the debug output and make Combiner and CombinerHelp Reviewers: aditya_nandakumar, bogner, volkan, rtereshin, javed.absar Reviewed By: aditya_nandakumar Subscribers: mgorny, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D52947 llvm-svn: 349167	2018-12-14 17:50:14 +00:00
Sanjay Patel	95f90ef3b3	[AArch64] make test immune to scalarization improvements; NFC This is explicitly implementing what the comment says rather than relying on the implicit zext of a costant operand. llvm-svn: 349166	2018-12-14 17:44:07 +00:00
Sanjay Patel	a44dc32708	[SystemZ] make test immune to scalarization improvements; NFC The undef operands mean this test is probably still too fragile to accomplish what the comments suggest. llvm-svn: 349164	2018-12-14 17:28:52 +00:00
Sanjay Patel	25fc03c5c0	[Hexagon] make test immune to scalarization improvements; NFC llvm-svn: 349163	2018-12-14 17:23:01 +00:00
Sanjay Patel	5a97a105f8	[x86] auto-generate complete checks; NFC llvm-svn: 349162	2018-12-14 16:49:57 +00:00
Sanjay Patel	41e8112ed6	[x86] regenerate test checks; NFC llvm-svn: 349161	2018-12-14 16:46:21 +00:00
Sanjay Patel	b7d9f9117e	[x86] make tests immune to scalarization improvements; NFC llvm-svn: 349160	2018-12-14 16:44:58 +00:00
Scott Linder	de6beb02a5	Implement -frecord-command-line (-frecord-gcc-switches) Implement options in clang to enable recording the driver command-line in an ELF section. Implement a new special named metadata, llvm.commandline, to support frontends embedding their command-line options in IR/ASM/ELF. This differs from the GCC implementation in some key ways: * In GCC there is only one command-line possible per compilation-unit, in LLVM it mirrors llvm.ident and multiple are allowed. * In GCC individual options are separated by NULL bytes, in LLVM entire command-lines are separated by NULL bytes. The advantage of the GCC approach is to clearly delineate options in the face of embedded spaces. The advantage of the LLVM approach is to support merging multiple command-lines unambiguously, while handling embedded spaces with escaping. Differential Revision: https://reviews.llvm.org/D54487 Clang Differential Revision: https://reviews.llvm.org/D54489 llvm-svn: 349155	2018-12-14 15:38:15 +00:00
John Brawn	1d0d86ae40	[RegAllocGreedy] IMPLICIT_DEF values shouldn't prefer registers It costs nothing to spill an IMPLICIT_DEF value (the only spill code that's generated is a KILL of the value), so when creating split constraints if the live-out value is IMPLICIT_DEF the exit constraint should be DontCare instead of PrefReg. Differential Revision: https://reviews.llvm.org/D55652 llvm-svn: 349151	2018-12-14 14:07:57 +00:00
Diana Picus	02c8343c75	[ARM GlobalISel] Thumb2: casts between int and ptr Mark as legal and add tests. Nothing special to do. llvm-svn: 349147	2018-12-14 13:45:38 +00:00
Diana Picus	acca60b49e	[ARM GlobalISel] Remove duplicate test. NFCI Fixup for r349026. I forgot to delete these test functions from the original file when I moved them to arm-legalize-exts.mir. llvm-svn: 349146	2018-12-14 13:28:34 +00:00
Diana Picus	14dc3b2959	[ARM GlobalISel] Allow simple binary ops in Thumb2 Mark G_ADD, G_SUB, G_MUL, G_AND, G_OR and G_XOR as legal for both ARM and Thumb2. Extract the legalizer tests for these opcodes into another file. Add tests for the instruction selector. llvm-svn: 349142	2018-12-14 11:58:14 +00:00
Fangrui Song	54a18bb0e3	[ThinLTO] Fix test added in rL349076 llvm-svn: 349135	2018-12-14 08:21:08 +00:00
Petr Hosek	493a082483	[llvm-xray] Support for PIE When the instrumented binary is linked as PIE, we need to apply the relative relocations to sleds. This is handled by the dynamic linker at runtime, but when processing the file we have to do it ourselves. Differential Revision: https://reviews.llvm.org/D55542 llvm-svn: 349120	2018-12-14 01:37:56 +00:00
Alex Lorenz	afa75d7843	[macho] save the SDK version stored in module metadata into the version min and build version load commands in the object file This commit introduces a new metadata node called "SDK Version". It will be set by the frontend to mark the platform SDK (macOS/iOS/etc) version which was used during that particular compilation. This node is used when machine code is emitted, by either saving the SDK version into the appropriate macho load command (version min/build version), or by emitting the assembly for these load commands with the SDK version specified as well. The assembly for both load commands is extended by allowing it to contain the sdk_version X, Y [, Z] trailing directive to represent the SDK version respectively. rdar://45774000 Differential Revision: https://reviews.llvm.org/D55612 llvm-svn: 349119	2018-12-14 01:14:10 +00:00
Evgeniy Stepanov	eb238ecf0f	Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)" Breaks sanitizer-android buildbot. This reverts commit af8443a984c3b491c9ca2996b8d126ea31e5ecbe. llvm-svn: 349092	2018-12-13 23:47:50 +00:00
Wei Mi	66c6c5abea	[SampleFDO] handle ProfileSampleAccurate when initializing function entry count ProfileSampleAccurate is used to indicate the profile has exact match to the code to be optimized. Previously ProfileSampleAccurate is handled in ProfileSummaryInfo::isColdCallSite and ProfileSummaryInfo::isColdBlock. A better solution is to initialize function entry count to 0 when ProfileSampleAccurate is true, so we don't have to handle ProfileSampleAccurate in multiple places. Differential Revision: https://reviews.llvm.org/D55660 llvm-svn: 349088	2018-12-13 21:51:42 +00:00
Aakanksha Patil	bc568766b2	Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute This patch breaks RADV (and probably RadeonSI as well) llvm-svn: 349084	2018-12-13 21:23:12 +00:00
Matt Arsenault	934e534c47	AMDGPU/GlobalISel: Legalize/regbankselect block_addr llvm-svn: 349081	2018-12-13 20:34:15 +00:00
Nikita Popov	dc73a6edde	Reapply "[MemCpyOpt] memset->memcpy forwarding with undef tail" Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844. The previous version of this patch did not perform dependency checking properly: While the dependency is checked at the position of the memset, the used size must be that of the memcpy. Previously the size of the memset was used, which missed modification in the region MemSetSize..CopySize, resulting in miscompiles. The added tests cover variations of this issue. Differential Revision: https://reviews.llvm.org/D55120 llvm-svn: 349078	2018-12-13 20:04:27 +00:00
Easwaran Raman	5a7056fa03	[ThinLTO] Compute synthetic function entry count Summary: This patch computes the synthetic function entry count on the whole program callgraph (based on module summary) and writes the entry counts to the summary. After function importing, this count gets attached to the IR as metadata. Since it adds a new field to the summary, this bumps up the version. Reviewers: tejohnson Subscribers: mehdi_amini, inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D43521 llvm-svn: 349076	2018-12-13 19:54:27 +00:00
Jordan Rupprecht	4888c4aba5	[llvm-size][libobject] Add explicit "inTextSegment" methods similar to "isText" section methods to calculate size correctly. Summary: llvm-size uses "isText()" etc. which seem to indicate whether the section contains code-like things, not whether or not it will actually go in the text segment when in a fully linked executable. The unit test added (elf-sizes.test) shows some types of sections that cause discrepencies versus the GNU size tool. llvm-size is not correctly reporting sizes of things mapping to text/data segments, at least for ELF files. This fixes pr38723. Reviewers: echristo, Bigcheese, MaskRay Reviewed By: MaskRay Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54369 llvm-svn: 349074	2018-12-13 19:40:12 +00:00

... 9 10 11 12 13 ...

58975 Commits