llvm-project

Commit Graph

Author	SHA1	Message	Date
Evandro Menezes	54252b8243	[AArch64] Improve jump tables testing (NFC) Improve testing of the minimum and maximum sizes of jump tables. llvm-svn: 363837	2019-06-19 16:35:30 +00:00
James Henderson	e20326ed33	[test][llvm-dwarfdump] Remove pointless CHECK-NOT lines The original line was there from when this test was added, but it is checking for a switch that doesn't exist, so really has no purpose, at least any more. llvm-svn: 363833	2019-06-19 16:31:59 +00:00
Hubert Tong	1f6ddfb6a3	[NFC][llvm-objcopy] Fix overly restrictive od output check The check against the output of `od` in the affected tests expect a specific input offset format. They also expect a specific offset value, not consistent with the EXAMPLE section for `od` in POSIX.1-2017 Chapter 4, while using the `-j` option. In particular, the example shows that the input offset begins at 0 following the bytes skipped. This patch adjusts the matching of the input offset to be more generic. In order to avoid false matches, it restricts the number of bytes to be formatted. llvm-svn: 363829	2019-06-19 16:04:24 +00:00
Hubert Tong	e9983eed5a	[NFC][LSR] Avoid undefined grep in pr2570.ll greater-than-sign is not a BRE special character. POSIX.1-2017 XBD Section 9.3.2 indicates that the interpretation of `\>` is undefined. This patch replaces the pattern. llvm-svn: 363828	2019-06-19 16:02:54 +00:00
Cameron McInally	7aa898e61e	[DFSan] Add UnaryOperator visitor to DataFlowSanitizer Differential Revision: https://reviews.llvm.org/D62815 llvm-svn: 363814	2019-06-19 15:11:41 +00:00
Cameron McInally	a027cf4764	[Reassociate] Handle unary FNeg in the Reassociate pass Differential Revision: https://reviews.llvm.org/D63445 llvm-svn: 363813	2019-06-19 14:59:14 +00:00
Bjorn Pettersson	16ff5fea87	[ConstantFolding] Add constant folding for smul.fix and smul.fix.sat Summary: This patch teaches ConstantFolding to constant fold both scalar and vector variants of llvm.smul.fix and llvm.smul.fix.sat. As described in the LangRef rounding is unspecified for these instrinsics. If the result cannot be represented exactly the default behavior in ConstantFolding is to round down towards negative infinity. If a target has a preferred rounding that is different some kind of target hook would be needed (same strategy as used by the SelectionDAG legalizer). Reviewers: nikic, leonardchan, RKSimon Reviewed By: leonardchan Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63385 llvm-svn: 363811	2019-06-19 14:28:03 +00:00
Ulrich Weigand	3641b10f3d	[SystemZ] Support vector load/store alignment hints Vector load/store instructions support an optional alignment field that the compiler can use to provide known alignment info to the hardware. If the field is used (and the information is correct), the hardware may be able (on some models) to perform faster memory accesses than otherwise. This patch adds support for alignment hints in the assembler and disassembler, and fills in known alignment during codegen. llvm-svn: 363806	2019-06-19 14:20:00 +00:00
Simon Pilgrim	c3994f77cb	[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero. Matches what we already do for SIGN_EXTEND. llvm-svn: 363802	2019-06-19 13:58:02 +00:00
Fangrui Song	102b1efd53	[llvm-dwarfdump] --gdb-index: fix uninitialized TuListOffset The test only checks the existence of the `Types CU list` line. Unfortunately I can't make a better test because {gcc,clang} -fuse-ld={lld,gold} --gdb-index do not give me a non-empty types CU list. Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D63537 llvm-svn: 363800	2019-06-19 13:51:29 +00:00
Simon Pilgrim	128ce93c60	Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. ........ Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/ llvm-svn: 363797	2019-06-19 13:00:54 +00:00
David Bolvansky	e3cd19d330	[NFC] Added tests for D63534 llvm-svn: 363796	2019-06-19 12:59:37 +00:00
David Bolvansky	21fd232385	[NFC] Added tests for cttz(abs(x)) -> cttz(x) fold llvm-svn: 363795	2019-06-19 12:55:39 +00:00
Simon Pilgrim	9eed5d2f78	[DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. llvm-svn: 363793	2019-06-19 12:41:37 +00:00
Simon Pilgrim	8c49366c9b	[DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> 0 non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths. llvm-svn: 363792	2019-06-19 12:25:29 +00:00
Simon Pilgrim	85f70baa23	[X86] Add non-uniform (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) test llvm-svn: 363791	2019-06-19 11:36:01 +00:00
Simon Pilgrim	d954a53633	[DAGCombine] Fix (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) comment. NFCI. We pre-extend, not post. llvm-svn: 363787	2019-06-19 11:17:48 +00:00
Orlando Cazalet-Hyams	1251cac62a	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 > llvm-svn: 363046 llvm-svn: 363786	2019-06-19 10:50:47 +00:00
Jay Foad	45d19fb470	[ConstantFolding] Fix assertion failure on non-power-of-two vector load. Summary: The test case does an (out of bounds) load from a global constant with type <3 x float>. InstSimplify tried to turn this into an integer load of the whole alloc size of the vector, which is 128 bits due to alignment padding, and then bitcast this to <3 x vector> which failed an assertion due to the type size mismatch. The fix is to do an integer load of the normal size of the vector, with no alignment padding. Reviewers: tpr, arsenm, majnemer, dstuttard Reviewed By: arsenm Subscribers: hfinkel, wdng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63375 llvm-svn: 363784	2019-06-19 10:28:48 +00:00
Lewis Revill	18737e81eb	[RISCV] Allow parsing immediates that use tilde & exclaim This patch allows immediates (and CSR alias immediates) which start with a tilde token or an exclaim (!) token to be parsed as intended. Differential Revision: https://reviews.llvm.org/D57320 llvm-svn: 363783	2019-06-19 10:27:24 +00:00
Lewis Revill	218aa0edb1	[RISCV] Fix failure to parse parenthesized immediates Since the parser attempts to parse an operand as a register with parentheses before parsing it as an immediate, immediates in parentheses should not be parsed by parseRegister. However in the case where the immediate does not start with an identifier, the LParen is not unlexed and so the RParen causes an unexpected token error. This patch adds the missing UnLex, and modifies the existing UnLex to not use a buffered token, as it should always be unlexing an LParen. Differential Revision: https://reviews.llvm.org/D57319 llvm-svn: 363782	2019-06-19 10:11:13 +00:00
Clement Courbet	f7a6fb9f2c	Fix r363773: Update Barcelona MCA tests. llvm-svn: 363781	2019-06-19 10:00:36 +00:00
George Rimar	b6e20937b3	[yaml2obj/obj2yaml] - Make RawContentSection::Info Optional<> This allows to customize this field for "implicit" sections properly. Differential revision: https://reviews.llvm.org/D63487 llvm-svn: 363777	2019-06-19 08:57:38 +00:00
Roman Lebedev	9f9691c032	[NFC][X86][MCA] Barcelona: add load/store/load-store-throughput tests llvm-svn: 363775	2019-06-19 08:53:34 +00:00
Roman Lebedev	4358016b03	[NFC][X86][MCA] BdVer2: add load-store-throughput test llvm-svn: 363774	2019-06-19 08:53:28 +00:00
Clement Courbet	4ef7c2868a	[X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr Summary: llvm.x86.sse.stmxcsr only writes to memory. llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE. Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62896 llvm-svn: 363773	2019-06-19 08:44:31 +00:00
Lewis Revill	39263ac5d1	[RISCV] Add lowering of global TLS addresses This patch adds lowering for global TLS addresses for the TLS models of InitialExec, GlobalDynamic, LocalExec and LocalDynamic. LocalExec support required using a 4-operand add instruction, which uses the fourth operand to express a relocation on the symbol. The necessary fixup is emitted when the instruction is emitted. Differential Revision: https://reviews.llvm.org/D55305 llvm-svn: 363771	2019-06-19 08:40:59 +00:00
Alex Bradbury	ec4e0809df	[RISCV] Fix test after r363757 r363757 renamed ExpandISelPseudo to FinalizeISel, so the RUN line in select-optimize-multiple.mir needed updating to refer to finalize-isel. llvm-svn: 363762	2019-06-19 03:18:48 +00:00
Matt Arsenault	9cac4e6d14	Rename ExpandISelPseudo->FinalizeISel, delay register reservation This allows targets to make more decisions about reserved registers after isel. For example, now it should be certain there are calls or stack objects in the frame or not, which could have been introduced by legalization. Patch by Matthias Braun llvm-svn: 363757	2019-06-19 00:25:39 +00:00
Thomas Lively	1885747498	[WebAssembly] Optimize ISel for SIMD Boolean reductions Summary: Converting the result *.{all,any}_true to a bool at the source level generates LLVM IR that compares the result to 0. This check is redundant since these instructions already return either 0 or 1 and therefore conform to the BooleanContents setting for WebAssembly. This CL adds patterns to detect and remove such redundant operations on the result of Boolean reductions. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63529 llvm-svn: 363756	2019-06-19 00:02:13 +00:00
Evandro Menezes	1933cbe866	[test] Change comment wording (NFC) llvm-svn: 363751	2019-06-18 23:31:10 +00:00
Michael Trent	c2885ded2b	Print dylib load kind (weak, reexport, etc) in llvm-objdump -m -dylibs-used Summary: Historically llvm-objdump prints the path to a dylib as well as the dylib's compatibility version and current version number. This change extends this information by adding the kind of dylib load: weak, reexport, etc. rdar://51383512 Reviewers: pete, lhames Reviewed By: pete Subscribers: rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62866 llvm-svn: 363746	2019-06-18 22:20:10 +00:00
Michael Liao	4f7f70e262	Recommit [SROA] Enhance SROA to handle `addrspacecast`ed allocas [SROA] Enhance SROA to handle `addrspacecast`ed allocas - Fix typo in original change - Add additional handling to ensure all return pointers are properly casted. Summary: - After `addrspacecast` is allowed to be eliminated in SROA, the adjusting of storage pointer (from `alloca) needs to handle the potential different address spaces between the storage pointer (from alloca) and the pointer being used. Reviewers: arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63501 llvm-svn: 363743	2019-06-18 21:41:13 +00:00
Matt Arsenault	e8d8bb5170	InstCombine: Pre-commit test for reassociating nuw D39417 llvm-svn: 363741	2019-06-18 21:32:51 +00:00
Huihui Zhang	d16779a732	[ARM] Comply with rules on ARMv8-A thumb mode partial deprecation of IT. Summary: When identifing instructions that can be folded into a MOVCC instruction, checking for a predicate operand is not enough, also need to check for thumb2 function, with restrict-IT, is the machine instruction eligible for ARMv8 IT or not. Notes in ARMv8-A Architecture Reference Manual, section "Partial deprecation of IT" https://usermanual.wiki/Pdf/ARM20Architecture20Reference20ManualARMv8.1667877052.pdf "ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to instructions other than a single subsequent 16-bit instruction from a restricted set are deprecated, as are explicit references to the PC within that single 16-bit instruction. This permits the non-deprecated forms of IT and subsequent instructions to be treated as a single 32-bit conditional instruction." Reviewers: efriedma, lebedev.ri, t.p.northover, jmolloy, aemerson, compnerd, stoklund, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63474 llvm-svn: 363739	2019-06-18 20:55:09 +00:00
Sam Elliott	9f155bc6e5	[RISCV] Prevent re-ordering some adds after shifts Summary: DAGCombine will normally turn a `(shl (add x, c1), c2)` into `(add (shl x, c2), c1 << c2)`, where `c1` and `c2` are constants. This can be prevented by a callback in TargetLowering. On RISC-V, materialising the constant `c1 << c2` can be more expensive than materialising `c1`, because materialising the former may take more instructions, and may use a register, where materialising the latter would not. This patch implements the hook in RISCVTargetLowering to prevent this transform, in the cases where: - `c1` fits into the immediate field in an `addi` instruction. - `c1` takes fewer instructions to materialise than `c1 << c2`. In future, DAGCombine could do the check to see whether `c1` fits into an add immediate, which might simplify more targets hooks than just RISC-V. Reviewers: asb, luismarques, efriedma Reviewed By: asb Subscribers: xbolva00, lebedev.ri, craig.topper, lewis-revill, Jim, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62857 llvm-svn: 363736	2019-06-18 20:38:08 +00:00
Sanjay Patel	413ed69b4b	[x86] add another test for load splitting with extracted stores (PR42305); NFC llvm-svn: 363732	2019-06-18 20:13:35 +00:00
Adrian Prantl	fc5107cde6	Add debug location verification for !llvm.loop attachments. This patch teaches the Verifier how to detect broken !llvm.loop attachments as discussed in https://reviews.llvm.org/D60831. This allows LLVM to warn and strip out the broken debug info before attempting an LTO compilation with input generated by LLVM predating https://reviews.llvm.org/rL361149. rdar://problem/51631158 Differential Revision: https://reviews.llvm.org/D63499 [Re-applies r363725 without changes after fixing a broken testcase.] llvm-svn: 363731	2019-06-18 20:09:09 +00:00
Adrian Prantl	1db8d4a866	Fix broken debug info in in an !llvm.loop attachment in this testcase. llvm-svn: 363730	2019-06-18 20:07:53 +00:00
Adrian Prantl	acc93d62e0	Revert Add debug location verification for !llvm.loop attachments. This reverts r363725 (git commit `8ff822d61d`) llvm-svn: 363728	2019-06-18 19:54:17 +00:00
Adrian Prantl	8ff822d61d	Add debug location verification for !llvm.loop attachments. This patch teaches the Verifier how to detect broken !llvm.loop attachments as discussed in https://reviews.llvm.org/D60831. This allows LLVM to warn and strip out the broken debug info before attempting an LTO compilation with input generated by LLVM predating https://reviews.llvm.org/rL361149. rdar://problem/51631158 Differential Revision: https://reviews.llvm.org/D63499 llvm-svn: 363725	2019-06-18 19:42:29 +00:00
Jordan Rupprecht	33e85ad956	Revert [SROA] Enhance SROA to handle `addrspacecast`ed allocas This reverts r363711 (git commit `76a149ef81`) This causes stage2 build failures, e.g.: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/132/steps/stage%202%20build/logs/stdio http://lab.llvm.org:8011/builders/ppc64le-lld-multistage-test/builds/87/steps/build-stage2-unified-tree/logs/stdio llvm-svn: 363718	2019-06-18 18:40:04 +00:00
Michael Liao	76a149ef81	[SROA] Enhance SROA to handle `addrspacecast`ed allocas Summary: - After `addrspacecast` is allowed to be eliminated in SROA, the adjusting of storage pointer (from `alloca) needs to handle the potential different address spaces between the storage pointer (from alloca) and the pointer being used. Reviewers: arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63501 llvm-svn: 363711	2019-06-18 17:58:49 +00:00
Sanjay Patel	223176f5d7	[x86] add test for load splitting with extracted store (PR42305); NFC llvm-svn: 363704	2019-06-18 17:16:17 +00:00
Simon Tatham	cfc70782d7	[ARM] Add MVE vector shift instructions. This includes saturating and non-saturating shifts, both with immediate shift count and with the shift counts given by another vector register; VSHLC (in which the bits shifted out of each active vector lane are shifted in to the next active lane); and also VMOVL, which is enough like an immediate shift that it didn't fit too badly in this category. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62672 llvm-svn: 363696	2019-06-18 16:19:59 +00:00
Simon Tatham	faaf1a5366	[ARM] Add MVE integer vector min/max instructions. Summary: These form a small family of their own, to go with the floating-point VMINNM/VMAXNM instructions added in a previous commit. They introduce the first of many special cases in the mnemonic recognition code, because VMIN with the E suffix used by the VPT predication system needs to avoid being interpreted as the nonexistent instruction 'VMI' with an ordinary 'NE' condition suffix. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62671 llvm-svn: 363695	2019-06-18 15:51:46 +00:00
Simon Pilgrim	9aa25be149	[TargetLowering] SimplifyDemandedVectorElts - support MUL and ANY_EXTEND_VECTOR_INREG Also fold ANY_EXTEND_VECTOR_INREG -> BITCAST if we only need the bottom element. Fixes temporary regression introduced in rL363693. llvm-svn: 363694	2019-06-18 15:49:35 +00:00
Simon Pilgrim	9c8593934a	[X86][AVX] extract_subvector(any_extend(x)) -> any_extend_vector_inreg(x) Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here. llvm-svn: 363693	2019-06-18 15:30:50 +00:00
Simon Tatham	ed4a602515	[ARM] Rename MVE instructions in Tablegen for consistency. Summary: Their names began with a mishmash of `MVE_`, `t2` and no prefix at all. Now they all start with `MVE_`, which seems like a reasonable choice on the grounds that (a) NEON is the thing they're most at risk of being confused with, and (b) MVE implies Thumb-2, so a prefix indicating MVE is strictly more specific than one indicating Thumb-2. Reviewers: ostannard, SjoerdMeijer, dmgreen Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63492 llvm-svn: 363690	2019-06-18 15:05:42 +00:00
Lewis Revill	74c8364954	[RISCV] Lower calls through PLT This patch adds support for generating calls through the procedure linkage table where required for a given ExternalSymbol or GlobalAddress callee. Differential Revision: https://reviews.llvm.org/D55304 llvm-svn: 363686	2019-06-18 14:29:45 +00:00
Fangrui Song	677423997d	[llvm-readobj] Allow --hex-dump/--string-dump to dump multiple sections 1) `-x foo` currently dumps one `foo`. This change makes it dump all `foo`. 2) `-x foo -x foo` currently dumps `foo` twice. This change makes it dump `foo` once. In addition, if foo has section index 9, `-x foo -x 9` dumps `foo` once. 3) Give a warning instead of an error if `foo` does not exist. The new behaviors match GNU readelf. Also, print a new line as a separator between two section dumps. GNU readelf uses two lines, but one seems good enough. Reviewed By: grimar, jhenderson Differential Revision: https://reviews.llvm.org/D63475 llvm-svn: 363683	2019-06-18 14:01:03 +00:00
Matt Arsenault	8d35dcd703	AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. llvm-svn: 363678	2019-06-18 13:19:57 +00:00
Simon Pilgrim	83bacd8d72	[SelectionDAG] Legalize vaargs that require vector splitting This adds vector splitting for vaarg instructions during type legalization Committed on behalf of @luke (Luke Lau) Differential Revision: https://reviews.llvm.org/D60762 llvm-svn: 363671	2019-06-18 12:24:02 +00:00
Matt Arsenault	bcb5ea0042	AMDGPU: Fold readlane from copy of SGPR or imm These may be inserted to assert uniformity somewhere. llvm-svn: 363670	2019-06-18 12:23:46 +00:00
Matt Arsenault	23f03f5059	AMDGPU: Fix iterator crash in AMDGPUPromoteAlloca The lifetime intrinsic was erased, which was the next iterator. llvm-svn: 363668	2019-06-18 12:23:44 +00:00
Matt Arsenault	d5ce8ec778	AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scale llvm-svn: 363667	2019-06-18 12:23:42 +00:00
Jonas Paulsson	5c64a8c4c6	[SystemZ] Fix AHIMuxK pseudo expansion. Do not emit a copy if the source and destination registers are the same. Review: Ulrich Weigand llvm-svn: 363665	2019-06-18 12:10:02 +00:00
Graham Hunter	43854e3ccc	[SVE][IR] Scalable Vector IR Type with pr42210 fix Recommit of D32530 with a few small changes: - Stopped recursively walking through aggregates in the verifier, so that we don't impose too much overhead on large modules under LTO (see PR42210). - Changed tests to match; the errors are slightly different since they only report the array or struct that actually contains a scalable vector, rather than all aggregates which contain one in a nested member. - Corrected an older comment Reviewers: thakis, rengolin, sdesmalen Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D63321 llvm-svn: 363658	2019-06-18 10:11:56 +00:00
Simon Pilgrim	6658bfb171	[X86] Regenerate promote.ll. NFC. llvm-svn: 363657	2019-06-18 10:10:53 +00:00
Fangrui Song	291e11ea02	[llvm-objdump] Tidy up AMDGCNPrettyPrinter llvm-svn: 363650	2019-06-18 06:35:18 +00:00
Craig Topper	02a445c245	[X86] Add i128 ctpop and i32/i64/i128 optsize test cases to popcnt.ll Test cases for PR41151 and D59909. llvm-svn: 363647	2019-06-18 04:52:49 +00:00
Craig Topper	587427716c	[X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly instructions. The isel patterns for these use a bitcast and load/store, but DAG combine should have canonicalized those away. For the purposes of the memory folding table these opcodes can be replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes. llvm-svn: 363644	2019-06-18 03:23:15 +00:00
Craig Topper	8582ecd8d9	[X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class. Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt. Use the new versions in patterns that previously used a COPY_TO_REGCLASS to VR128. These patterns expect the upper bits to be zero. The current set up appears to work, but I'm not sure we should be enforcing upper bits being zero through a COPY_TO_REGCLASS. I wanted to flip the arrangement and use a COPY_TO_REGCLASS to FR32/FR64 for the patterns that need an f32/f64 result, but that complicated fastisel and globalisel. I've been doing some experiments with reducing some isel patterns and ended up in a situation where I had a (SUBREG_TO_REG (COPY_TO_RECLASS (VMOVSSrm), VR128)) and our post-isel peephole was unable to avoid using an instruction for the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128 instruction removes the COPY_TO_REGCLASS that was breaking this. llvm-svn: 363643	2019-06-18 03:23:11 +00:00
Alex Brachet	7747700937	[llvm-strip] Error when using stdin twice Summary: Implements bug [[ https://bugs.llvm.org/show_bug.cgi?id=42204 \| 42204 ]]. llvm-strip now warns when the same input file is used more than once, and errors when stdin is used more than once. Reviewers: jhenderson, rupprecht, espindola, alexshap Reviewed By: jhenderson, rupprecht Subscribers: emaste, arichardson, jakehehrlich, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63122 llvm-svn: 363638	2019-06-18 00:39:10 +00:00
Matt Arsenault	5a321b899e	GlobalISel: Use the original flags when lowering fneg to fsub This was ignoring the flag on fneg, and using the source instruction's flags. Also fixes tests missing from r358702. Note the expansion itself isn't correct without nnan, but that should be fixed separately. llvm-svn: 363637	2019-06-17 23:48:43 +00:00
Peter Collingbourne	d57f7cc15e	hwasan: Use bits [3..11) of the ring buffer entry address as the base stack tag. This saves roughly 32 bytes of instructions per function with stack objects and causes us to preserve enough information that we can recover the original tags of all stack variables. Now that stack tags are deterministic, we no longer need to pass -hwasan-generate-tags-with-calls during check-hwasan. This also means that the new stack tag generation mechanism is exercised by check-hwasan. Differential Revision: https://reviews.llvm.org/D63360 llvm-svn: 363636	2019-06-17 23:39:51 +00:00
Peter Collingbourne	fb9ce100d1	hwasan: Add a tag_offset DWARF attribute to instrumented stack variables. The goal is to improve hwasan's error reporting for stack use-after-return by recording enough information to allow the specific variable that was accessed to be identified based on the pointer's tag. Currently we record the PC and lower bits of SP for each stack frame we create (which will eventually be enough to derive the base tag used by the stack frame) but that's not enough to determine the specific tag for each variable, which is the stack frame's base tag XOR a value (the "tag offset") that is unique for each variable in a function. In IR, the tag offset is most naturally represented as part of a location expression on the llvm.dbg.declare instruction. However, the presence of the tag offset in the variable's actual location expression is likely to confuse debuggers which won't know about tag offsets, and moreover the tag offset is not required for a debugger to determine the location of the variable on the stack, so at the DWARF level it is represented as an attribute so that it will be ignored by debuggers that don't know about it. Differential Revision: https://reviews.llvm.org/D63119 llvm-svn: 363635	2019-06-17 23:39:41 +00:00
Amara Emerson	146882242f	[GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra block. Inter-block localization is the same as what currently happens, except now it only runs on the entry block because that's where the problematic constants with long live ranges come from. The second phase is a new intra-block localization phase which attempts to re-sink the already localized instructions further right before one of the multiple uses. One additional change is to also localize G_GLOBAL_VALUE as they're constants too. However, on some targets like arm64 it takes multiple instructions to materialize the value, so some additional heuristics with a TTI hook have been introduced attempt to prevent code size regressions when localizing these. Overall, these changes improve CTMark code size on arm64 by 1.2%. Full code size results: Program baseline new diff ------------------------------------------------------------------------------ test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6% test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6% test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4% test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2% test-suite :: CTMark/lencod/lencod.test 1340592 `1324200` -1.2% test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9% test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0% test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0% Geomean difference -1.2% Differential Revision: https://reviews.llvm.org/D63303 llvm-svn: 363632	2019-06-17 23:20:29 +00:00
Michael Berg	f9bff2a55e	Propagate fmf in IRTranslate for fneg Summary: This case is related to D63405 in that we need to be propagating FMF on negates. Reviewers: volkan, spatel, arsenm Reviewed By: arsenm Subscribers: wdng, javed.absar Differential Revision: https://reviews.llvm.org/D63458 llvm-svn: 363631	2019-06-17 23:19:40 +00:00
Stanislav Mekhanoshin	ca42687d62	[AMDGPU] gfx1010 subvector test. NFC. llvm-svn: 363623	2019-06-17 21:55:06 +00:00
Volkan Keles	689509edab	[test][AArch64] Relax the check line for G_BRJT in legalizer-info-validation.mir Replace the specific number with a pattern to relax the test. llvm-svn: 363621	2019-06-17 21:25:25 +00:00
Philip Reames	44475363e8	Teach getSCEVAtScope how to handle loop phis w/invariant operands in loops w/taken backedges This patch really contains two pieces: Teach SCEV how to fold a phi in the header of a loop to the value on the backedge when a) the backedge is known to execute at least once, and b) the value is safe to use globally within the scope dominated by the original phi. Teach IndVarSimplify's rewriteLoopExitValues to allow loop invariant expressions which already exist (and thus don't need new computation inserted) even in loops where we can't optimize away other uses. Differential Revision: https://reviews.llvm.org/D63224 llvm-svn: 363619	2019-06-17 21:06:17 +00:00
Daniel Sanders	184c8ee920	[globalisel] Fix iterator invalidation in the extload combines Summary: Change the way we deal with iterator invalidation in the extload combines as it was still possible to neglect to visit a use. Even worse, it happened in the in-tree test cases and the checks weren't good enough to detect it. We now take a cheap copy of the use list before iterating over it. This prevents iterator invalidation from occurring and has the nice side effect of making the existing schedule-for-erase/schedule-for-insert mechanism moot. Reviewers: aditya_nandakumar Reviewed By: aditya_nandakumar Subscribers: rovka, kristof.beyls, javed.absar, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61813 llvm-svn: 363616	2019-06-17 20:56:31 +00:00
Stanislav Mekhanoshin	3138278287	[AMDGPU] Propagate function attributes thru bitcasts AMDGPUPropagateAttributes will not work on function bitcatsts, so move AMDGPUFixFunctionBitcasts before it. Differential Revision: https://reviews.llvm.org/D63455 llvm-svn: 363614	2019-06-17 20:42:48 +00:00
Philip Reames	fe8bd96ebd	Fix a bug w/inbounds invalidation in LFTR (recommit) Recommit r363289 with a bug fix for crash identified in pr42279. Issue was that a loop exit test does not have to be an icmp, leading to a null dereference crash when new logic was exercised for that case. Test case previously committed in r363601. Original commit comment follows: This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV. The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program. As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well. (Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.) Differential Revision: https://reviews.llvm.org/D62939 llvm-svn: 363613	2019-06-17 20:32:22 +00:00
Nicolai Haehnle	ae4fcb97dd	AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printer Summary: The purpose of the padding is to guard against stale code being fetched into the instruction cache by the lowest level prefetching. We're generating relocatable ELF here, and so the padding should arguably be added by the linker. This is in fact what Mesa does. This also fixes multi-part shaders for Mesa. Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82 Reviewers: arsenm, rampitec, t-tye Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63427 llvm-svn: 363602	2019-06-17 19:28:43 +00:00
Philip Reames	58c75565f3	Reduced test case for pr42279 in advance of the relevant re-commit + fix llvm-svn: 363601	2019-06-17 19:27:45 +00:00
Nicolai Haehnle	8af7198c6c	AMDGPU: Explicitly define a triple for some tests Summary: This is related to the changes to the groupstaticsize intrinsic in D61494 which would otherwise make the related tests in these files fail or much less useful. Note that for some reason, SOPK generation is less effective in the amdhsa OS, which is why I chose PAL. I haven't investigated this deeper. Change-Id: I6bb99569338f7a433c28b4c9eb1e3e036b00d166 Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63392 llvm-svn: 363600	2019-06-17 19:25:57 +00:00
Joseph Tremoulet	daa1ae6142	[EarlyCSE] Fix hashing of self-compares Summary: Update compare normalization in SimpleValue hashing to break ties (when the same value is being compared to itself) by switching to the swapped predicate if it has a lower numerical value. This brings the hashing in line with isEqual, which already recognizes the self-compares with swapped predicates as equal. Fixes PR 42280. Reviewers: spatel, efriedma, nikic, fhahn, uabelho Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63349 llvm-svn: 363598	2019-06-17 19:11:28 +00:00
Alina Sbirlea	7a0098aa6e	[MemorySSA] Don't use template when the clone is a simplified instruction. Summary: LoopRotate doesn't create a faithful clone of an instruction, it may simplify it beforehand. Hence the clone of an instruction that has a MemoryDef associated may not be a definition, but a use or not a memory alternig instruction. Don't rely on the template when the clone may be simplified. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63355 llvm-svn: 363597	2019-06-17 18:58:40 +00:00
Jessica Paquette	49537bbf74	[GlobalISel][AArch64] Fold G_SUB into G_ICMP when it's safe to do so Basically porting over the behaviour in AArch64ISelLowering to GISel. See emitComparison for reference. When we have something like this: ``` lhs = G_SUB 0, y ... G_ICMP lhs, rhs ``` We can fold away the G_SUB and produce a cmn instead, given that we produce the same value in NZCV. Add a test showing that the transformation works, and also showing that we don't perform the transformation when it's unsafe. Also factor out the CSet emission into emitCSetForICMP. Differential Revision: https://reviews.llvm.org/D63163 llvm-svn: 363596	2019-06-17 18:40:06 +00:00
Simon Pilgrim	835999e48a	[X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026) If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64). llvm-svn: 363592	2019-06-17 18:20:04 +00:00
Alina Sbirlea	05f77803f4	[MemorySSA] Add all MemoryPhis before filling their values. Summary: Add all MemoryPhis in IDF before filling in their incomign values. Otherwise, a new Phi can be added that needs to become the incoming value of another Phi. Test fails the verification in verifyPrevDefInPhis. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63353 llvm-svn: 363590	2019-06-17 18:16:53 +00:00
Stanislav Mekhanoshin	a9191c8492	[AMDGPU] gfx1010 wavefrontsize intrinsic folding Differential Revision: https://reviews.llvm.org/D63206 llvm-svn: 363588	2019-06-17 17:57:50 +00:00
Matt Arsenault	6d741f29ec	AMDGPU: Fold readlane/readfirstlane calls llvm-svn: 363587	2019-06-17 17:52:35 +00:00
Stanislav Mekhanoshin	ad04e7ad42	[AMDGPU] Pass to propagate ABI attributes from kernels to the functions The pass works in two modes: Mode 1: Just set attributes starting from kernels. This can work at the very beginning of opt and llc pipeline, but cannot clone functions because it must be a function pass. Mode 2: Actually clone functions for new attributes. This can only work after all function passes in the opt pipeline because it has to be a module pass. Differential Revision: https://reviews.llvm.org/D63208 llvm-svn: 363586	2019-06-17 17:47:28 +00:00
Simon Pilgrim	bb9adfdb4e	[X86][AVX] Split under-aligned vector nt-stores. If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it. llvm-svn: 363582	2019-06-17 17:22:38 +00:00
Warren Ristow	6452bdd29b	[LV] Suppress vectorization in some nontemporal cases When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the aligment of the original scaler memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type DataType, unsigned Alignment) const; bool isLegalNTLoad(Type DataType, unsigned Alignment) const; to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 llvm-svn: 363581	2019-06-17 17:20:08 +00:00
Matt Arsenault	3e140066bc	GlobalISel: Ignore callsite attributes when picking intrinsic type A target intrinsic may be defined as possibly reading memory, but the call site may have additional knowledge that it doesn't read memory. The intrinsic lowering will expect the pessimistic assumption of the intrinsic definition, so the chain should still be used. I fixed the same bug in SelectionDAG in r287593. llvm-svn: 363580	2019-06-17 17:01:35 +00:00
Matt Arsenault	a7f09f3c9e	GlobalISel: Verify intrinsics I keep using the wrong instruction when manually writing tests. This really needs to check the number of operands, but I don't see an easy way to do that right now. llvm-svn: 363579	2019-06-17 17:01:32 +00:00
Stanislav Mekhanoshin	5d00c3060e	[AMDGPU] gfx1010 wave32 metadata Differential Revision: https://reviews.llvm.org/D63207 llvm-svn: 363577	2019-06-17 16:48:56 +00:00
Tom Stellard	8b1c53b528	AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECT Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60640 llvm-svn: 363576	2019-06-17 16:27:43 +00:00
Francis Visoiu Mistrih	34667519dc	[Remarks] Extend -fsave-optimization-record to specify the format Use -fsave-optimization-record=<format> to specify a different format than the default, which is YAML. For now, only YAML is supported. llvm-svn: 363573	2019-06-17 16:06:00 +00:00
Simon Pilgrim	1c91e63897	[X86][SSE] Add tests for underaligned nt loads Test both 'unaligned' (which we should just use regular unaligned loads) and 'subvector aligned' (which we should split) llvm-svn: 363565	2019-06-17 14:38:17 +00:00
Simon Pilgrim	454e6b9010	[X86][SSE] Prevent misaligned non-temporal vector load/store combines For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for a xmm we'll just end up using the regular unaligned vector loads anyway. First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets. Differential Revision: https://reviews.llvm.org/D63246 llvm-svn: 363564	2019-06-17 14:26:10 +00:00
Matt Arsenault	1df203d78e	InferAddressSpaces: Fix cloning original addrspacecast If an addrspacecast needed to be inserted again, this was creating a clone of the original cast for each user. Just use the original, which also saves losing the value name. llvm-svn: 363562	2019-06-17 14:13:29 +00:00
Matt Arsenault	b10f097833	AMDGPU: Ignore subtarget for InferAddressSpaces Even if the target doesn't have flat instructions, addrspace(0) is still flat. It just happens to not work. llvm-svn: 363561	2019-06-17 14:13:24 +00:00
Matt Arsenault	f3b64d80bc	AMDGPU: Mark exp/exp.compr as inaccessiblememonly Should also be marked writeonly, but I think that would require splitting the version with done set to a separate intrinsic Test change is only from renumbering the attribute group numbers, which for some reason the generated check lines consider. llvm-svn: 363560	2019-06-17 13:52:24 +00:00
Sam Parker	1bd3d00e7e	[CodeGen] Check for HardwareLoop Latch ExitBlock The HardwareLoops pass finds exit blocks with a scevable exit count. If the target specifies to update the loop counter in a register, through a phi, we need to ensure that the exit block is a latch so that we can insert the phi with the correct value for the incoming edge. Differential Revision: https://reviews.llvm.org/D63336 llvm-svn: 363556	2019-06-17 13:39:28 +00:00
Simon Pilgrim	f1e2827170	[X86][SSE] Avoid unnecessary stack codegen in NT store codegen tests. llvm-svn: 363552	2019-06-17 12:35:26 +00:00
Bjorn Pettersson	83773b77a5	[LV] Deny irregular types in interleavedAccessCanBeWidened Summary: Avoid that loop vectorizer creates loads/stores of vectors with "irregular" types when interleaving. An example of an irregular type is x86_fp80 that is 80 bits, but that may have an allocation size that is 96 bits. So an array of x86_fp80 is not bitcast compatible with a vector of the same type. Not sure if interleavedAccessCanBeWidened is the best place for this check, but it solves the problem seen in the added test case. And it is the same kind of check that already exists in memoryInstructionCanBeWidened. Reviewers: fhahn, Ayal, craig.topper Reviewed By: fhahn Subscribers: hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63386 llvm-svn: 363547	2019-06-17 12:02:24 +00:00
Sander de Smalen	74ac20158a	Test forward references in IntrinsicEmitter on Neon LD(2\|3\|4) This patch tests the forward-referencing added in D62995 by changing some existing intrinsics to use forward referencing of overloadable parameters, rather than backward referencing. This patch changes the TableGen definition/implementation of llvm.aarch64.neon.ld2lane and llvm.aarch64.neon.ld2lane intrinsics (and similar for ld3 and ld4). This change is intended to be non-functional, since the behaviour of the intrinsics is expected to be the same. Reviewers: arsenm, dmgreen, RKSimon, greened, rnk Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D63189 llvm-svn: 363546	2019-06-17 12:01:53 +00:00
Luis Marques	2e46312ffd	[DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting Some GEPs were not being split, presumably because that split would just be undone by the DAGCombiner. Not performing those splits can prevent important optimizations, such as preventing the element indices / member offsets from being (partially) folded into load/store instruction immediates. This patch: - Makes the splits also occur in the cases where the base address and the GEP are in the same BB. - Ensures that the DAGCombiner doesn't reassociate them back again. Differential Revision: https://reviews.llvm.org/D60294 llvm-svn: 363544	2019-06-17 10:54:12 +00:00
Simon Pilgrim	ef78e55205	[SelectionDAG] Fold insert_subvector(undef, extract_subvector(v, c), c) -> v in getNode This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source. llvm-svn: 363542	2019-06-17 10:14:52 +00:00
Sam Parker	60d6fb2a63	[SCEV] Use NoWrapFlags when expanding a simple mul Second functional change following on from rL362687. Pass the NoWrapFlags from the MulExpr to InsertBinop when we're generating a shl or mul. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 363540	2019-06-17 10:05:18 +00:00
Fangrui Song	46f9cbe28d	[llvm-objdump] Use %08 instead of %016 to print leading addresses for 32-bit binaries Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D63398 llvm-svn: 363539	2019-06-17 09:59:55 +00:00
Fangrui Song	ac14f7b10c	[lit] Delete empty lines at the end of lit.local.cfg NFC llvm-svn: 363538	2019-06-17 09:51:07 +00:00
Roman Lebedev	25a043e78a	[NFC][Codegen] Standalone tests for icmp eq/ne (urem %x, C), 0 -> icmp eq/ne %x, 0 fold (D63390) llvm-svn: 363537	2019-06-17 09:50:50 +00:00
Sander de Smalen	5d6ee76c16	Describe stack-id as an enum This patch changes MIR stack-id from an integer to an enum, and adds printing/parsing support for this in MIR files. The default stack-id '0' is now renamed to 'default'. This should make MIR tests that have stack objects with different stack-ids more descriptive. It also clarifies code operating on StackID. Reviewers: arsenm, thegameg, qcolombet Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60137 llvm-svn: 363533	2019-06-17 09:13:29 +00:00
Hans Wennborg	a9e5d2f35d	Re-commit r357452 (take 3): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)" Third time's the charm. This was reverted in r363220 due to being suspected of an internal benchmark regression and a test failure, none of which turned out to be caused by this. llvm-svn: 363529	2019-06-17 07:47:28 +00:00
Yevgeny Rouban	ee62c40eae	[SimplifyCFG] Fix prof branch_weights MD while removing unreachable switch cases SimplifyCFG has a bug that results in inconsistent prof branch_weights metadata if unreachable switch cases are removed. This patch fixes this bug by making use of the newly introduced SwitchInstProfUpdateWrapper class (see patch D62122). A new test is created. Differential Revision: https://reviews.llvm.org/D62186 llvm-svn: 363527	2019-06-17 05:55:12 +00:00
Justin Hibbits	1d1cf30b73	PowerPC: Optimize SPE double parameter calling setup Summary: SPE passes doubles the same as soft-float, in register pairs as i32 types. This is all handled by the target-independent layer. However, this is not optimal when splitting or reforming the doubles, as it pushes to the stack and loads from, on either side. For instance, to pass a double argument to a function, assuming the double value is in r5, the sequence currently looks like this: evstdd 5, X(1) lwz 3, X(1) lwz 4, X+4(1) Likewise, to form a double into r5 from args in r3 and r4: stw 3, X(1) stw 4, X+4(1) evldd 5, X(1) This optimizes the fence to use SPE instructions. Now, to pass a double to a function: mr 4, 5 evmergehi 3, 5, 5 And to form a double into r5 from args in r3 and r4: evmergelo 5, 3, 4 This is comparable to the way that gcc generates the double splits. This also fixes a bug with expanding builtins to libcalls, where the LowerCallTo() code path was generating intermediate illegal type nodes. Reviewers: nemanjai, hfinkel, joerg Subscribers: kbarton, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54583 llvm-svn: 363526	2019-06-17 03:15:23 +00:00
Seiya Nuta	4f15732067	[yaml2obj][MachO] Don't fill dummy data for virtual sections Summary: Currently, MachOWriter::writeSectionData writes dummy data (0xdeadbeef) to fill section data areas in the file even if the section is a virtual one. Since virtual sections don't occupy any space in the file, writing dummy data could results the "OS.tell() - fileStart <= Sec.offset" assertion failure. This patch fixes the bug by simply not writing any dummy data for virtual sections. Reviewers: beanz, jhenderson, rupprecht, alexshap Reviewed By: alexshap Subscribers: compnerd, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62991 llvm-svn: 363525	2019-06-17 02:07:20 +00:00
Seiya Nuta	13de174b4c	[llvm-objcopy] Add elf32-sparc and elf32-sparcel target Summary: The "sparc"/"sparcel" architectures appears in ArchMap (used by -B option) but not in OutputFormatMap (used by -I/-O option). Add their targets into OutputFormatMap for consistency. Note that AFAIK there're no targets for 32-bit little-endian SPARC ("elf32-sparcel") in GNU binutils. Reviewers: espindola, alexshap, rupprecht, jhenderson, compnerd, jakehehrlich Reviewed By: jhenderson, compnerd, jakehehrlich Subscribers: jyknight, emaste, arichardson, fedor.sergeev, jakehehrlich, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63238 llvm-svn: 363524	2019-06-17 02:03:45 +00:00
Roman Lebedev	5a663bd77a	[InstSimplify] Fix addo/subo undef folds (PR42209) Fix folds of addo and subo with an undef operand to be: `@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`, as per LLVM undef rules. Same for commuted variants. Based on the original version of the patch by @nikic. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42209 \| PR42209 ]] Differential Revision: https://reviews.llvm.org/D63065 llvm-svn: 363522	2019-06-16 20:39:45 +00:00
Nicolai Haehnle	41abf2766e	AMDGPU: Prepare for explicit absolute relocations in code generation Summary: We will use absolute relocations for LDS symbols. Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61492 llvm-svn: 363517	2019-06-16 17:43:37 +00:00
Nicolai Haehnle	6d71be4e67	AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0 Summary: Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just use a literal target constant. This simplifies some subsequent changes. The generated assembly is now more explicit about the kind of relocation that is to be used. Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61491 llvm-svn: 363516	2019-06-16 17:32:01 +00:00
Nicolai Haehnle	490e83cd43	AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539 Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62486 Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0 llvm-svn: 363514	2019-06-16 17:14:12 +00:00
Stanislav Mekhanoshin	5250021672	[AMDGPU] gfx10 conditional registers handling This is cpp source part of wave32 support, excluding overriden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513	2019-06-16 17:13:09 +00:00
Sanjay Patel	c8d88ad1a9	[CodeGenPrepare][x86] shift both sides of a vector select when profitable This is based on the example/discussion in PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 Proper vector shift instructions don't appear until AVX2, so we may generate several extra instructions within a loop trying to compensate for that. It's difficult to recover from that shift expansion later than this, so use the existing TLI hook and splat analysis to enable better codegen. This extends CGP functionality introduced with: rL201655 Differential Revision: https://reviews.llvm.org/D63233 llvm-svn: 363511	2019-06-16 15:29:03 +00:00
Sanjay Patel	d14389c0a5	[x86] split 256-bit vector selects if operands are vector concats This is similar logic/motivation to the select splitting in D62969. In D63233, the pattern changes so that we no longer have an extract_subvector of vselect, but the operands of the select are still being concatenated. The closest case is represented in either the first or last test diffs here - we have an extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions. I think that's the right trade-off for most AVX1 targets. In the example based on PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 ...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx). Differential Revision: https://reviews.llvm.org/D63364 llvm-svn: 363508	2019-06-16 14:04:49 +00:00
Simon Pilgrim	fcffc2facc	[X86] CombineShuffleWithExtract - handle cases with different vector extract sources Insert the shorter vector source into an undef vector of the longer vector source's type. llvm-svn: 363507	2019-06-16 08:00:41 +00:00
Simon Pilgrim	90e87af303	[X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well. Fixes PR34380 llvm-svn: 363500	2019-06-15 18:30:43 +00:00
Simon Pilgrim	990f3ceb67	[X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3) This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0) llvm-svn: 363499	2019-06-15 17:05:24 +00:00
Roman Lebedev	5dd61974f9	[NFC][MCA][X86] Add one more 'clear super register' pattern - movss/movsd load clears high XMM bits llvm-svn: 363498	2019-06-15 16:12:13 +00:00
Roman Lebedev	680c43b73a	[NFC][MCA][X86] Add baseline test coverage for AMD Barcelona (aka K10, fam10h) Looking into sched model for that CPU ... llvm-svn: 363497	2019-06-15 16:12:05 +00:00
Kang Zhang	2d51adcb57	[PowerPC] Set the innermost hot loop to align 32 bytes Summary: If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that we can decrease cache misses and branch-prediction misses. Actual alignment of the loop will depend on the hotness check and other logic in alignBlocks. The old code will only align hot loop to 32 bytes when the LoopSize larger than 16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop to 32 bytes not only for the hot loop whose size is 16~32 bytes. Reviewed By: steven.zhang, jsji Differential Revision: https://reviews.llvm.org/D61228 llvm-svn: 363495	2019-06-15 15:10:24 +00:00
Nikita Popov	8550fb386a	[SCEV] Use unsigned/signed intersection type in SCEV Based on D59959, this switches SCEV to use unsigned/signed range intersection based on the sign hint. This will prefer non-wrapping ranges in the relevant domain. I've left the one intersection in getRangeForAffineAR() to use the smallest intersection heuristic, as there doesn't seem to be any obvious preference there. Differential Revision: https://reviews.llvm.org/D60035 llvm-svn: 363490	2019-06-15 09:15:52 +00:00
Nikita Popov	9145562b48	[SimplifyIndVar] Simplify non-overflowing saturating add/sub If we can detect that saturating math that depends on an IV cannot overflow, replace it with simple math. This is similar to the CVP optimization from D62703, just based on a different underlying analysis (SCEV vs LVI) that catches different cases. Differential Revision: https://reviews.llvm.org/D62792 llvm-svn: 363489	2019-06-15 08:48:52 +00:00
Fangrui Song	e1aa69f755	[RISCV] Regenerate remat.ll and atomic-rmw.ll after D43256 llvm-svn: 363487	2019-06-15 07:49:14 +00:00
Alex Brachet	899a3072f0	[objcopy] Error when --preserve-dates is specified with standard streams Summary: llvm-objcopy/strip now error when -p is specified when reading from stdin or writing to stdout Reviewers: jhenderson, rupprecht, espindola, alexshap Reviewed By: jhenderson, rupprecht Subscribers: emaste, arichardson, jakehehrlich, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63090 llvm-svn: 363485	2019-06-15 05:32:23 +00:00
Michael Berg	ad6bb86b2d	adding more fmf propagation for selects plus updated tests llvm-svn: 363484	2019-06-15 04:53:51 +00:00
Fangrui Song	968b5f84af	Revert "adding more fmf propagation for selects plus tests" This reverts rL363474. -debug-only=isel was added to some tests that don't specify `REQUIRES: asserts`. This causes failures on -DLLVM_ENABLE_ASSERTIONS=off builds. I chose to revert instead of fixing the tests because I'm not sure whether we should add `REQUIRES: asserts` to more tests. llvm-svn: 363482	2019-06-15 03:51:08 +00:00
Huihui Zhang	dc2fd6a14e	[InstCombine] Add tests to show missing fold opportunity for "icmp and shift" (nfc). Summary: For icmp pred (and (sh X, Y), C), 0 When C is signbit, expect to fold (X << Y) & signbit ==/!= 0 into (X << Y) >=/< 0, rather than (X & (signbit >> Y)) != 0. When C+1 is power of 2, expect to fold (X << Y) & ~C ==/!= 0 into (X << Y) </>= C+1, rather than (X & (~C >> Y)) == 0. For icmp pred (and X, (sh signbit, Y)), 0 Expect to fold (X & (signbit l>> Y)) ==/!= 0 into (X << Y) >=/< 0 Expect to fold (X & (signbit << Y)) ==/!= 0 into (X l>> Y) >=/< 0 Reviewers: lebedev.ri, efriedma, spatel, craig.topper Reviewed By: lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63025 llvm-svn: 363479	2019-06-15 00:33:41 +00:00
Matt Arsenault	9487278010	Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect" This reapplies r363410, avoiding null dereference if there is no AltRegBank. llvm-svn: 363478	2019-06-15 00:33:26 +00:00
Mitch Phillips	0d44f129bb	Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect" This patch breaks UBSan build bots. See https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for a guide as to how to reproduce the error. This reverts commit `c2864c0de0`. This reverts rL363410. llvm-svn: 363476	2019-06-14 23:45:34 +00:00
Michael Berg	69394bedc5	adding more fmf propagation for selects plus tests llvm-svn: 363474	2019-06-14 23:30:52 +00:00
Guozhi Wei	d2210af332	[MBP] Move a latch block with conditional exit and multi predecessors to top of loop Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is: * a latch block * it has two successors, one is loop header, another is exit * it has more than one predecessors If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch. Differential Revision: https://reviews.llvm.org/D43256 llvm-svn: 363471	2019-06-14 23:08:59 +00:00
Akira Hatanaka	a704a8f28c	[ObjC][ARC] Delete ObjC runtime calls on global variables annotated with 'objc_arc_inert' Those calls are no-ops, so they can be safely deleted. rdar://problem/49839633 Differential Revision: https://reviews.llvm.org/D62433 llvm-svn: 363468	2019-06-14 22:06:32 +00:00
Matt Arsenault	aa41e92e17	AMDGPU: Avoid most waitcnts before calls Currently you get extra waits, because waits are inserted for the register dependencies of the call, and the function prolog waits on everything. Currently waits are still inserted on returns. It may make sense to not do this, and wait in the caller instead. llvm-svn: 363465	2019-06-14 21:52:26 +00:00
Francis Visoiu Mistrih	5501dda247	[Remarks][NFC] Improve testing and documentation of -foptimization-record-passes This adds: * documentation to the user manual * nicer error message * test for the error case * test for the gold plugin llvm-svn: 363463	2019-06-14 21:38:57 +00:00
Matt Arsenault	282dac717e	SROA: Allow eliminating addrspacecasted allocas There is a circular dependency between SROA and InferAddressSpaces today that requires running both multiple times in order to be able to eliminate all simple allocas and addrspacecasts. InferAddressSpaces can't remove addrspacecasts when written to memory, and SROA helps move pointers out of memory. This should avoid inserting new commuting addrspacecasts with GEPs, since there are unresolved questions about pointer wrapping between different address spaces. For now, don't replace volatile operations that don't match the alloca addrspace, as it would change the address space of the access. It may be still OK to insert an addrspacecast from the new alloca, but be more conservative for now. llvm-svn: 363462	2019-06-14 21:38:31 +00:00
Matt Arsenault	e6efb6433f	SROA: Add baseline test for addrspacecast changes llvm-svn: 363460	2019-06-14 21:22:26 +00:00
Matt Arsenault	bb0a610599	AMDGPU: Fix capitalized register names in asm constraints This was a workaround a long time ago, but the canonical lower case names work now. llvm-svn: 363459	2019-06-14 21:16:06 +00:00
Matt Arsenault	9e5fa33378	AMDGPU: Fix dropping memref for ds append/consume The way SelectionDAG treats memory operands is very frustrating, and by default drops them unless a property is set on the pattern. There is no pattern for manually selected instructions, so this requires manually setting them. llvm-svn: 363455	2019-06-14 21:01:24 +00:00
Matt Arsenault	1509fde891	AMDGPU: Add baseline test for call waitcnt insertion llvm-svn: 363453	2019-06-14 21:01:23 +00:00
Sanjay Patel	501bb982b9	[x86] add test for 256-bit blendv with AVX targets; NFC This is a reduction of the pattern seen in D63233. llvm-svn: 363448	2019-06-14 20:03:42 +00:00
Amara Emerson	f79d3bc724	[GlobalISel] Add a G_BRJT opcode. This is a branch opcode that takes a jump table pointer, jump table index and an index into the table to do an indirect branch. We pass both the table pointer and JTI to allow targets like ARM64 to more easily use the existing jump table compression optimization without having to walk up the block to find a paired G_JUMP_TABLE. Differential Revision: https://reviews.llvm.org/D63159 llvm-svn: 363434	2019-06-14 17:55:48 +00:00
Florian Hahn	dcdd12b68c	Revert Fix a bug w/inbounds invalidation in LFTR Reverting because it breaks a green dragon build: http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/18208 This reverts r363289 (git commit `eb88badff9`) llvm-svn: 363427	2019-06-14 17:23:09 +00:00
Valery Pykhtin	ffeb01c113	[AMDGPU] Don't constrain callees with inlinehint from inlining on MaxBB check Summary: Function bodies marked inline in an opencl source are eliminated but MaxBB check may prevent inlining them leaving undefined references. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63337 llvm-svn: 363418	2019-06-14 16:37:33 +00:00
Kevin P. Neal	fece7c6c83	[FPEnv] Lower STRICT_FP_EXTEND and STRICT_FP_ROUND nodes in preprocess phase of ISelLowering to mirror non-strict nodes on x86. I recently discovered a bug on the x86 platform: The fp80 type was not handled well by x86 for constrained floating point nodes, as their regular counterparts are replaced by extending loads and truncating stores during the preprocess phase. Normally, platforms don't have this issue, as they don't typically attempt to perform such legalizations during instruction selection preprocessing. Before this change, strict_fp nodes survived until they were mutated to normal nodes, which happened shortly after preprocessing on other platforms. This modification lowers these nodes at the same phase while properly utilizing the chain.5 Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Craig Topper, Kevin P. Neal Approved by: Craig Topper Differential Revision: https://reviews.llvm.org/D63271 llvm-svn: 363417	2019-06-14 16:28:55 +00:00
Sanjay Patel	75312aa805	[x86] move vector shift tests for PR37428; NFC As suggested in the post-commit thread for rL363392 - it's wasteful to have so many runs for larger tests. AVX1/AVX2 is what shows the diff and probably what matters most going forward. llvm-svn: 363411	2019-06-14 15:23:09 +00:00
Matt Arsenault	c2864c0de0	GlobalISel: Avoid producing Illegal copies in RegBankSelect Avoid producing illegal register bank copies for reg_sequence and phi. The default implementation assumes it is possible to pick any operand's bank and use that for the result, introducing a copy for operands with a different bank. This does not check for illegal copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR operand requires the result to be a VGPR. The changes in getInstrMappingImpl aren't strictly necessary, since AMDGPU now just bypasses this for reg_sequence/phi. This could be replaced with an assert in case other targets run into this. It is currently responsible for producing the error for unsatisfiable copies, but this will be better served with a verifier check. For phis, for now assume any undetermined operands must be VGPRs. Eventually, this needs to be able to defer mapping these operations. This also does not yet have a way to check for whether the block is in a divergent region. llvm-svn: 363410	2019-06-14 15:22:25 +00:00
Sanjay Patel	7ea378b940	[CodeGenPrepare] propagate debuginfo when copying a shuffle llvm-svn: 363409	2019-06-14 15:05:35 +00:00
Matt Arsenault	492d71cc99	AMDGPU: Fold readlane intrinsics of constants I'm not 100% sure about this, since I'm worried about IR transforms that might end up introducing divergence downstream once replaced with a constant, but I haven't come up with an example yet. llvm-svn: 363406	2019-06-14 14:51:26 +00:00
Mikhail Maltsev	d1cc2e1543	[ARM] Add MVE horizontal accumulation instructions This is the family of vector instructions that combine all the lanes in their input vector(s), and output a value in one or two GPRs. Differential Revision: https://reviews.llvm.org/D62670 llvm-svn: 363403	2019-06-14 14:31:13 +00:00
George Rimar	0aecabae14	Revert "Revert r363377: [yaml2obj] - Allow setting custom section types for implicit sections." LLD test case will be fixed in a following commit. Original commit message: [yaml2obj] - Allow setting custom section types for implicit sections. We were hardcoding the final section type for sections that are usually implicit. The patch fixes that. This also fixes a few issues in existent test cases and removes one precompiled object. Differential revision: https://reviews.llvm.org/D63267 llvm-svn: 363401	2019-06-14 14:25:34 +00:00
Rui Ueyama	9f4e21c69a	Revert r363377: [yaml2obj] - Allow setting custom section types for implicit sections. This reverts commit r363377 because lld's ELF/invalid/undefined-local-symbol-in-dso.test test started failing after this commit. llvm-svn: 363394	2019-06-14 13:57:25 +00:00
Sanjay Patel	e5a78cd90f	[x86] add test for original example in PR37428; NFC The reduced case may avoid complications seen in this larger function. llvm-svn: 363392	2019-06-14 13:44:01 +00:00
Matt Arsenault	74d67c2086	AMDGPU: Fix printing trailing whitespace after s_endpgm llvm-svn: 363384	2019-06-14 13:26:29 +00:00
George Rimar	3b523c0a2e	[yaml2obj] - Allow setting custom section types for implicit sections. We were hardcoding the final section type for sections that are usually implicit. The patch fixes that. This also fixes a few issues in existent test cases and removes one precompiled object. Differential revision: https://reviews.llvm.org/D63267 llvm-svn: 363377	2019-06-14 12:16:59 +00:00
James Henderson	f7cfabb45d	[llvm-readobj] Don't abort printing of dynamic table if string reference is invalid If dynamic table is missing, output "dynamic strtab not found'. If the index is out of range, output "Invalid Offset<..>". https://bugs.llvm.org/show_bug.cgi?id=40807 Reviewed by: jhenderson, grimar, MaskRay Differential Revision: https://reviews.llvm.org/D63084 Patch by Yuanfang Chen. llvm-svn: 363374	2019-06-14 12:02:01 +00:00
George Rimar	d6df7ded6e	[llvm-readobj] - Do not fail to dump the object which has wrong type of .shstrtab. Imagine we have object that has .shstrtab with type != SHT_STRTAB. In this case, we fail to dump the object, though GNU readelf dumps it without any issues and warnings. This patch fixes that. It adds a code to ELFDumper.cpp which is based on the implementation of getSectionName from the ELF.h: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Object/ELF.h#L608 https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Object/ELF.h#L431 https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Object/ELF.h#L539 The difference is that all non critical errors are ommitted what allows us to improve the dumping on a tool side. Also, this opens a road for a follow-up that should allow us to dump the section headers, but drop the section names in case if .shstrtab is completely absent and/or broken. Differential revision: https://reviews.llvm.org/D63266 llvm-svn: 363371	2019-06-14 11:56:10 +00:00
Sjoerd Meijer	3058a62b90	[ARM] MVE VPT Block Pass Initial commit of a new pass to create vector predication blocks, called VPT blocks, that are supported by the Armv8.1-M MVE architecture. This is a first naive implementation. I.e., for 2 consecutive predicated instructions I1 and I2, for example, it will generate 2 VPT blocks: VPST I1 VPST I2 A more optimal implementation would obviously put instructions in the same VPT block when they are predicated on the same condition and when it is allowed to do this: VPTT I1 I2 We will address this optimisation with follow up patches when the groundwork is in. Creating VPT Blocks is very similar to IT Blocks, which is the reason I added this to Thumb2ITBlocks.cpp. This allows reuse of the def use analysis that we need for the more optimal implementation. VPT blocks cannot be nested in IT blocks, and vice versa, and so these 2 passes cannot interact with each other. Instructions allowed in VPT blocks must be MVE instructions that are marked as VPT compatible. Differential Revision: https://reviews.llvm.org/D63247 llvm-svn: 363370	2019-06-14 11:46:05 +00:00
George Rimar	43f62ff17c	[yaml2obj] - Allow setting the custom Address for .strtab Despite the fact that .strtab is non-allocatable, there is no reason to disallow setting the custom address for it. The patch also adds a test case showing we can set any address we want for other implicit sections. Differential revision: https://reviews.llvm.org/D63137 llvm-svn: 363368	2019-06-14 11:13:32 +00:00
George Rimar	cfa1a62a4c	[yaml2obj] - Allow setting cutom Flags for implicit sections. With this patch we get ability to set any flags we want for implicit sections defined in YAML. Differential revision: https://reviews.llvm.org/D63136 llvm-svn: 363367	2019-06-14 11:01:14 +00:00
Sam Parker	0cf9639a9c	[SCEV] Pass NoWrapFlags when expanding an AddExpr InsertBinop now accepts NoWrapFlags, so pass them through when expanding a simple add expression. This is the first re-commit of the functional changes from rL362687, which was previously reverted. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 363364	2019-06-14 09:19:41 +00:00
Eugene Leviant	d46ebd207b	[llvm-objcopy][IHEX] Improve test case formatting. NFC Differential revision: https://reviews.llvm.org/D63258 llvm-svn: 363359	2019-06-14 08:09:10 +00:00
Alex Brachet	d54d4f9905	[llvm-objcopy] Changed command line parsing errors Summary: Tidied up errors during command line parsing to be more consistent with the rest of llvm-objcopy errors. Reviewers: jhenderson, rupprecht, espindola, alexshap Reviewed By: jhenderson, rupprecht Subscribers: emaste, arichardson, MaskRay, llvm-commits, jakehehrlich Tags: #llvm Differential Revision: https://reviews.llvm.org/D62973 llvm-svn: 363350	2019-06-14 02:04:02 +00:00
David Blaikie	4129e3e0f8	DebugInfo: Include enumerators in pubnames This is consistent with GCC's behavior (which is the defacto standard for pubnames). Though I find the presence of enumerators from enum classes to be a bit confusing, possibly a bug on GCC's end (since they can't be named unqualified, unlike the other names - and names nested in classes don't go in pubnames, for instance - presumably because one must name the class first & that's enough to limit the scope of the search) llvm-svn: 363349	2019-06-14 01:58:56 +00:00
Tim Shen	4121bdc3d4	[X86] Add target triple for live-debug-values-fragments.mir llvm-svn: 363348	2019-06-14 01:41:04 +00:00
Douglas Yung	5b188f8dac	Add REQUIRES: zlib to test added in r363325 as the profile uses zlib compression. llvm-svn: 363347	2019-06-14 01:08:50 +00:00
Stanislav Mekhanoshin	c43e67bfff	[AMDGPU] gfx1011/gfx1012 targets Differential Revision: https://reviews.llvm.org/D63307 llvm-svn: 363344	2019-06-14 00:33:31 +00:00
Stanislav Mekhanoshin	68a2fef9ae	[AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32 Differential Revision: https://reviews.llvm.org/D63301 llvm-svn: 363339	2019-06-13 23:47:36 +00:00
Seiya Nuta	b1027a480a	[llvm-objcopy] Fix sparc target endianness Summary: AFAIK, the "sparc" target is big endian and the target for 32-bit little-endian SPARC is denoted as "sparcel". This patch fixes the endianness of "sparc" target and adds "sparcel" target for 32-bit little-endian SPARC. Reviewers: espindola, alexshap, rupprecht, jhenderson Reviewed By: jhenderson Subscribers: jyknight, emaste, arichardson, fedor.sergeev, jakehehrlich, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63251 llvm-svn: 363336	2019-06-13 23:24:12 +00:00
Amy Huang	49275272e3	Use fully qualified name when printing S_CONSTANT records Summary: Before it was using the fully qualified name only for static data members. Now it does for all variable names to match MSVC. Reviewers: rnk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63012 llvm-svn: 363335	2019-06-13 22:53:43 +00:00
Amara Emerson	fb0a40f064	[GlobalISel][IRTranslator] Add debug loc with line 0 to constants emitted into the entry block. Constants, including G_GLOBAL_VALUE, are all emitted into the entry block which lets us use the vreg def assuming it dominates all other users. However, it can cause jumpy debug behaviour since the DebugLoc attached to these MIs are from a user instruction that could be in a different block. Fixes PR40887. Differential Revision: https://reviews.llvm.org/D63286 llvm-svn: 363331	2019-06-13 22:15:35 +00:00
Jinsong Ji	1c88445840	[MachinePiepliner] Don't check boundary node in checkValidNodeOrder This was exposed by PowerPC target enablement. In ScheduleDAG, if we haven't seen any uses in this scheduling region, we will create a dependence edge to ExitSU to model the live-out latency. This is required for vreg defs with no in-region use, and prefetches with no vreg def. When we build NodeOrder in Scheduler, we ignore these boundary nodes. However, when we check Succs in checkValidNodeOrder, we did not skip them, so we still assume all the nodes have been sorted and in order in Indices array. So when we call lower_bound() for ExitSU, it will return Indices.end(), causing memory issues in following Node access. Differential Revision: https://reviews.llvm.org/D63282 llvm-svn: 363329	2019-06-13 21:51:12 +00:00
Vedant Kumar	901d04fc6d	[Coverage] Load code coverage data from archives Support loading code coverage data from regular archives, thin archives, and from MachO universal binaries which contain archives. Testing: check-llvm, check-profile (with {A,UB}San enabled) rdar://51538999 Differential Revision: https://reviews.llvm.org/D63232 llvm-svn: 363325	2019-06-13 20:48:57 +00:00
Shawn Landden	24f4085811	[SimplifyCFG] NFC, update Switch tests as a baseline. Also add baseline tests to show effect of later patches. There were a couple of regressions here that were never caught, but my patch set that this is a preparation to will fix them. This is the third attempt to land this patch. Differential Revision: https://reviews.llvm.org/D61150 llvm-svn: 363319	2019-06-13 19:36:38 +00:00
Cameron McInally	79ec1a2957	Revert "[NFC][CodeGen] Add unary fneg tests to fp-fast.ll fp-fold.ll fp-in-intregs.ll fp-stack-compare-cmov.ll fp-stack-compare.ll fsxor-alignment.ll" This reverts commit `1d85a7518c`. llvm-svn: 363317	2019-06-13 19:25:16 +00:00
Cameron McInally	07514a1b16	Revert "[NFC][CodeGen] Add unary fneg tests to fmul-combines.ll fnabs.ll" This reverts commit `5c01140581`. llvm-svn: 363316	2019-06-13 19:25:12 +00:00
Cameron McInally	8984dbc27c	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma_patterns_wide.ll" This reverts commit `f1b8c6ac4f`. llvm-svn: 363315	2019-06-13 19:25:09 +00:00
Cameron McInally	5d9271802b	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma_patterns.ll" This reverts commit `06de52674d`. llvm-svn: 363314	2019-06-13 19:25:06 +00:00
Cameron McInally	d331e71bdb	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma4-fneg-combine.ll" This reverts commit `f288a0685f`. llvm-svn: 363313	2019-06-13 19:25:03 +00:00
Cameron McInally	31da4f80d5	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma-scalar-combine.ll" This reverts commit `3d2ee0053a`. llvm-svn: 363312	2019-06-13 19:25:00 +00:00
Cameron McInally	d3eaa332e4	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma-intrinsics-x86.ll" This reverts commit `169fc2b020`. llvm-svn: 363311	2019-06-13 19:24:57 +00:00
Cameron McInally	2aff82bfa6	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma4-intrinsics-x86.ll" This reverts commit `66f286845c`. llvm-svn: 363310	2019-06-13 19:24:54 +00:00
Cameron McInally	0a3fe05047	Revert "[NFC][CodeGen] Add unary FNeg tests to some X86/ and XCore/ tests." This reverts commit `4f3cf3853e`. llvm-svn: 363309	2019-06-13 19:24:51 +00:00
Cameron McInally	a0d06a626f	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/fma-intrinsics-canonical.ll" This reverts commit `ee5881a88c`. llvm-svn: 363308	2019-06-13 19:24:47 +00:00
Cameron McInally	a37d925d3d	Revert "[NFC][CodeGen] Forgot 2 unary FNeg tests in X86/fma-intrinsics-canonical.ll" This reverts commit `5f39a3096f`. llvm-svn: 363307	2019-06-13 19:24:44 +00:00
Cameron McInally	e00198f7a8	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma-fneg-combine.ll" This reverts commit `10c0855542`. llvm-svn: 363306	2019-06-13 19:24:41 +00:00
Cameron McInally	ea28a063fd	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/combine-fcopysign.ll X86/dag-fmf-cse.ll X86/fast-isel-fneg.ll X86/fdiv.ll" This reverts commit `e04c4b6af8`. llvm-svn: 363305	2019-06-13 19:24:38 +00:00
Cameron McInally	4890457196	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll X86/combine-fabs.ll" This reverts commit `6fe46ec25d`. llvm-svn: 363304	2019-06-13 19:24:34 +00:00
Cameron McInally	21a29a9e65	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll" This reverts commit `2aa5ada267`. llvm-svn: 363303	2019-06-13 19:24:31 +00:00
Cameron McInally	7d4e7efd2e	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll" This reverts commit `27a5db9de5`. llvm-svn: 363302	2019-06-13 19:24:28 +00:00
Cameron McInally	8608afa964	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll" This reverts commit `41e0b9f280`. llvm-svn: 363301	2019-06-13 19:24:24 +00:00
Cameron McInally	675be5db46	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll" This reverts commit `aeb89f8b33`. llvm-svn: 363300	2019-06-13 19:24:21 +00:00
Stanislav Mekhanoshin	335f9883f0	[AMDGPU] gfx1010: small test change for wave32. NFC llvm-svn: 363297	2019-06-13 19:05:04 +00:00
Sanjay Patel	5bf7f81aa8	[InstCombine] add test for failed libfunction prototype matching; NFC llvm-svn: 363291	2019-06-13 18:26:10 +00:00
Philip Reames	eb88badff9	Fix a bug w/inbounds invalidation in LFTR This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV. The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program. As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well. (Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.) Differential Revision: https://reviews.llvm.org/D62939 llvm-svn: 363289	2019-06-13 18:23:13 +00:00
Sanjay Patel	4d93fb528e	[InstCombine] auto-generate complete test checks; NFC llvm-svn: 363286	2019-06-13 18:14:49 +00:00
David Bolvansky	a9d8388e80	[NFC] Updated testcase for D54411/rL363284 llvm-svn: 363285	2019-06-13 18:13:03 +00:00
David Bolvansky	896ece41e4	[Codegen] Merge tail blocks with no successors after block placement Summary: I found the following case having tail blocks with no successors merging opportunities after block placement. Before block placement: bb0: ... bne a0, 0, bb2: bb1: mv a0, 1 ret bb2: ... bb3: mv a0, 1 ret bb4: mv a0, -1 ret The conditional branch bne in bb0 is opposite to beq. After block placement: bb0: ... beq a0, 0, bb1 bb2: ... bb4: mv a0, -1 ret bb1: mv a0, 1 ret bb3: mv a0, 1 ret After block placement, that appears new tail merging opportunity, bb1 and bb3 can be merged as one block. So the conditional constraint for merging tail blocks with no successors should be removed. In my experiment for RISC-V, it decreases code size. Author of original patch: Jim Lin Reviewers: haicheng, aheejin, craig.topper, rnk, RKSimon, Jim, dmgreen Reviewed By: Jim, dmgreen Subscribers: xbolva00, dschuff, javed.absar, sbc100, jgravelle-google, aheejin, kito-cheng, dmgreen, PkmX, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54411 llvm-svn: 363284	2019-06-13 18:11:32 +00:00
Stanislav Mekhanoshin	2bda177da0	[AMDGPU] ImmArg and SourceOfDivergence for permlane/dpp Added missing ImmArg and SourceOfDivergence to the crosslane intrinsics. Differential Revision: https://reviews.llvm.org/D63216 llvm-svn: 363276	2019-06-13 16:31:51 +00:00
Cameron McInally	aeb89f8b33	[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll Patch 2 of n. llvm-svn: 363275	2019-06-13 15:54:20 +00:00
Joseph Tremoulet	3bc6e2a7aa	[EarlyCSE] Ensure equal keys have the same hash value Summary: The logic in EarlyCSE that looks through 'not' operations in the predicate recognizes e.g. that `select (not (cmp sgt X, Y)), X, Y` is equivalent to `select (cmp sgt X, Y), Y, X`. Without this change, however, only the latter is recognized as a form of `smin X, Y`, so the two expressions receive different hash codes. This leads to missed optimization opportunities when the quadratic probing for the two hashes doesn't happen to collide, and assertion failures when probing doesn't collide on insertion but does collide on a subsequent table grow operation. This change inverts the order of some of the pattern matching, checking first for the optional `not` and then for the min/max/abs patterns, so that e.g. both expressions above are recognized as a form of `smin X, Y`. It also adds an assertion to isEqual verifying that it implies equal hash codes; this fires when there's a collision during insertion, not just grow, and so will make it easier to notice if these functions fall out of sync again. A new flag --earlycse-debug-hash is added which can be used when changing the hash function; it forces hash collisions so that any pair of values inserted which compare as equal but hash differently will be caught by the isEqual assertion. Reviewers: spatel, nikic Reviewed By: spatel, nikic Subscribers: lebedev.ri, arsenm, craig.topper, efriedma, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62644 llvm-svn: 363274	2019-06-13 15:24:11 +00:00
Diogo N. Sampaio	0be2d25ecc	[FIX] Forces shrink wrapping to consider any memory access as aliasing with the stack Summary: Relate bug: https://bugs.llvm.org/show_bug.cgi?id=37472 The shrink wrapping pass prematurally restores the stack, at a point where the stack might still be accessed. Taking an exception can cause the stack to be corrupted. As a first approach, this patch is overly conservative, assuming that any instruction that may load or store could access the stack. Reviewers: dmgreen, qcolombet Reviewed By: qcolombet Subscribers: simpal01, efriedma, eli.friedman, javed.absar, llvm-commits, eugenis, chill, carwil, thegameg Tags: #llvm Differential Revision: https://reviews.llvm.org/D63152 llvm-svn: 363265	2019-06-13 13:56:19 +00:00
Simon Tatham	286e1d2c2d	[ARM] Set up infrastructure for MVE vector instructions. This commit prepares the way to start adding the main collection of MVE instructions, which operate on the 128-bit vector registers. The most obvious thing that's needed, and the simplest, is to add the MQPR register class, which is like the existing QPR except that it has fewer registers in it. The more complicated part: MVE defines a system of vector predication, in which instructions operating on 128-bit vector registers can be constrained to operate on only a subset of the lanes, using a system of prefix instructions similar to the existing Thumb IT, in that you have one prefix instruction which designates up to 4 following instructions as subject to predication, and within that sequence, the predicate can be inverted by means of T/E suffixes ('Then' / 'Else'). To support instructions of this type, we've added two new Tablegen classes `vpred_n` and `vpred_r` for standard clusters of MC operands to add to a predicated instruction. Both include a flag indicating how the instruction is predicated at all (options are T, E and 'not predicated'), and an input register field for the register controlling the set of active lanes. They differ from each other in that `vpred_r` also includes an input operand for the previous value of the output register, for instructions that leave inactive lanes unchanged. `vpred_n` lacks that extra operand; it will be used for instructions that don't preserve inactive lanes in their output register (either because inactive lanes are zeroed, as the MVE load instructions do, or because the output register isn't a vector at all). This commit also adds the family of prefix instructions themselves (VPT / VPST), and all the machinery needed to work with them in assembly and disassembly (e.g. generating the 't' and 'e' mnemonic suffixes on disassembled instructions within a predicated block) I've added a couple of demo instructions that derive from the new Tablegen base classes and use those two operand clusters. The bulk of the vector instructions will come in followup commits small enough to be manageable. (One exception is that I've added the full version of `isMnemonicVPTPredicable` in the AsmParser, because it seemed pointless to carefully split it up.) Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62669 llvm-svn: 363258	2019-06-13 13:11:13 +00:00
Jeremy Morse	bf2b2f08b0	[DebugInfo] Honour variable fragments in LiveDebugValues This patch makes the LiveDebugValues pass consider fragments when propagating DBG_VALUE insts between blocks, fixing PR41979. Fragment info for a variable location is added to the open-ranges key, which allows distinct fragments to be tracked separately. To handle overlapping fragments things become slightly funkier. To avoid excessive searching for overlaps in the data-flow part of LiveDebugValues, this patch: * Pre-computes pairings of fragments that overlap, for each DILocalVariable * During data-flow, whenever something happens that causes an open range to be terminated (via erase), any fragments pre-determined to overlap are also terminated. The effect of which is that when encountering a DBG_VALUE fragment that overlaps others, the overlapped fragments do not get propagated to other blocks. We still rely on later location-list building to correctly handle overlapping fragments within blocks. It's unclear whether a mixture of DBG_VALUEs with and without fragmented expressions are legitimate. To avoid suprises, this patch interprets a DBG_VALUE with no fragment as overlapping any DBG_VALUE _with_ a fragment. Differential Revision: https://reviews.llvm.org/D62904 llvm-svn: 363256	2019-06-13 12:51:57 +00:00
Dmitry Preobrazhensky	1fca3b1972	[AMDGPU][MC] Enabled constant expressions as operands of s_getreg/s_setreg See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61125 llvm-svn: 363255	2019-06-13 12:46:37 +00:00
Simon Pilgrim	a284f4fa7c	[X86][AVX] Add broadcast(v4f64 hadd) test llvm-svn: 363252	2019-06-13 11:42:32 +00:00
Simon Pilgrim	0baf136a4d	[X86][SSE] Avoid assert for broadcast(horiz-op()) cases for non-f64 cases. Based on fuzz test from @craig.topper llvm-svn: 363251	2019-06-13 11:26:21 +00:00
Simon Pilgrim	a6b87aa7ee	[X86][SSE] Add tests for underaligned nt stores Test both 'unaligned' (which we should scalarize) and 'subvector aligned' (which we should split) llvm-svn: 363249	2019-06-13 10:41:56 +00:00
Chris Jackson	7b39513302	[llvm-nm] Additional lit tests for command line options Differential Revision: https://reviews.llvm.org/D62955 llvm-svn: 363248	2019-06-13 10:39:36 +00:00
Simon Pilgrim	e1aea85896	[X86][SSE] Add SSE4A nt store tests on X86 as well as X64 We should be able to use MOVNTSD (f64) instead of MOVNTI (i32) to reduce the number of ops 32-bit targets Pulled out of D63246 llvm-svn: 363247	2019-06-13 10:30:12 +00:00
Jeremy Morse	181bf0cefb	[DebugInfo] Use FrameDestroy to extend stack locations to end-of-function We aim to ignore changes in variable locations during the prologue and epilogue of functions, to avoid using space documenting location changes that aren't visible. However in D61940 / r362951 this got ripped out as the previous implementation was unsound. Instead, use the FrameDestroy flag to identify when we're in the epilogue of a function, and ignore variable location changes accordingly. This fits in with existing code that examines the FrameSetup flag. Some variable locations get shuffled in modified tests as they now cover greater ranges, which is what would be expected. Some additional single-location variables are generated too. Two tests are un-xfailed, they were only xfailed due to r362951 deleting functionality they depended on. Apparently some out-of-tree backends don't accurately maintain FrameDestroy flags -- if you're an out-of-tree maintainer and see changes in variable locations disappear due to a faulty FrameDestroy flag, it's safe to back this change out. The impact is just slightly more debug info than necessary. Differential Revision: https://reviews.llvm.org/D62314 llvm-svn: 363245	2019-06-13 10:03:17 +00:00
Eugene Leviant	86b7f865ac	[llvm-objcopy] Implement IHEX reader This is the final part of IHEX format support in llvm-objcopy Differential revision: https://reviews.llvm.org/D62583 llvm-svn: 363243	2019-06-13 09:56:14 +00:00
Sander de Smalen	51c2fa0e2a	Improve reduction intrinsics by overloading result value. This patch uses the mechanism from D62995 to strengthen the definitions of the reduction intrinsics by letting the scalar result/accumulator type be overloaded from the vector element type. For example: ; The LLVM LangRef specifies that the scalar result must equal the ; vector element type, but this is not checked/enforced by LLVM. declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a) This patch changes that into: declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a) Which has the type-constraint more explicit and causes LLVM to check the result type with the vector element type. Reviewers: RKSimon, arsenm, rnk, greened, aemerson Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D62996 llvm-svn: 363240	2019-06-13 09:37:38 +00:00
Owen Reynolds	8d59f5370d	Revert [llvm-ar][test] Add to MRI test coverage This reverts 363232 due to mru-utf8.test buildbot test failure Differential Revision: https://reviews.llvm.org/D63197 llvm-svn: 363239	2019-06-13 09:02:33 +00:00
Sam Parker	9d28473a35	[ARM][TTI] Scan for existing loop intrinsics TTI should report that it's not profitable to generate a hardware loop if it, or one of its child loops, has already been converted. Differential Revision: https://reviews.llvm.org/D63212 llvm-svn: 363234	2019-06-13 08:28:46 +00:00
Owen Reynolds	02eac87ba3	[llvm-ar][test] Add to MRI test coverage This change adds tests to cover existing MRI script functionality. Differential Revision: https://reviews.llvm.org/D63197 llvm-svn: 363232	2019-06-13 07:45:12 +00:00
Craig Topper	b1daec0eae	[X86] Correct instruction operands in evex-to-vex-compress.mir to be closer to real instructions. $noreg was being used way more than it should have. We also had xmm registers in addressing modes. Mostly found by hacking the machine verifier to do some stricter checking that happened to work for this test, but not sure if generally applicable for other tests or other targets. llvm-svn: 363231	2019-06-13 07:11:02 +00:00
Shawn Landden	8b142bcc3f	[SimplifyCFG] reverting preliminary Switch patches again This reverts 363226 and 363227, both NFC intended I swear I fixed the test case that is failing, and ran the tests, but I will look into it again. llvm-svn: 363229	2019-06-13 05:26:17 +00:00
Shawn Landden	c54b2011bd	[SimplifyCFG] NFC, update Switch tests to better examine successive patches Also add baseline tests to show effect of later patches. There were a couple of regressions here that were never caught, but my patch set that this is a preparation to will fix them. Differential Revision: https://reviews.llvm.org/D61150 llvm-svn: 363226	2019-06-13 04:51:35 +00:00
Craig Topper	387acd64f3	[X86] Add tests for some the special cases in EVEX to VEX to the evex-to-vex-compress.mir test. llvm-svn: 363224	2019-06-13 04:10:08 +00:00
Shawn Landden	c6cba2957d	[SimplifyCFG] revert the last commit. I ran ALL the test suite locally, so I will look into this... llvm-svn: 363223	2019-06-13 02:47:47 +00:00
Shawn Landden	f93b99b2b6	[SimplifyCFG] NFC, update Switch tests to HEAD so I can see if my changes change anything Also add baseline tests to show effect of later patches. Differential Revision: https://reviews.llvm.org/D61150 llvm-svn: 363222	2019-06-13 02:24:24 +00:00
David L. Jones	c73fadaa84	Revert r361811: 'Re-commit r357452 (take 2): "SimplifyCFG SinkCommonCodeFromPredecessors ...' We have observed some failures with internal builds with this revision. - Performance regressions: - llvm's SingleSource/Misc evalloop shows performance regressions (although these may be red herrings). - Benchmarks for Abseil's SwissTable. - Correctness: - Failures for particular libicu tests when building the Google AppEngine SDK (for PHP). hwennborg has already been notified, and is aware of reproducer failures. llvm-svn: 363220	2019-06-13 02:04:45 +00:00
Dinar Temirbulatov	b2f45ba1e8	[SLP] Update propagate_ir_flags.ll test to check that we do retain the common subset, NFC. llvm-svn: 363218	2019-06-13 00:19:50 +00:00
Philip Reames	0bded8442f	[Tests] Highlight impact of multiple exit LFTR (D62625) as requested by reviewer llvm-svn: 363217	2019-06-12 23:39:49 +00:00
Cameron McInally	41e0b9f280	[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll Patch 1 of n. llvm-svn: 363215	2019-06-12 22:50:44 +00:00
Sanjay Patel	a1421e8347	[x86] add tests for vector shifts; NFC llvm-svn: 363203	2019-06-12 21:30:06 +00:00
Cameron McInally	27a5db9de5	[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll Patch 3 of 3 for X86/avx512vl-intrinsics-fast-isel.ll llvm-svn: 363200	2019-06-12 20:56:59 +00:00
Jordan Rupprecht	565f1e2298	[llvm-readobj] Fix output interleaving issue caused by using multiple streams at the same time. Summary: Use llvm::fouts() as the default stream for outputing. No new stream should be constructed to output at the same time. https://bugs.llvm.org/show_bug.cgi?id=42140 Reviewers: jhenderson, grimar, MaskRay, phosek, rupprecht Reviewed By: rupprecht Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63115 Patch by Yuanfang Chen! llvm-svn: 363198	2019-06-12 20:16:22 +00:00
Cameron McInally	2aa5ada267	[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll Patch 2 of 3 for X86/avx512vl-intrinsics-fast-isel.ll llvm-svn: 363194	2019-06-12 19:39:42 +00:00
Philip Reames	00e481b75d	[Tests] Autogen RLEV test and add tests for a future enhancement llvm-svn: 363193	2019-06-12 19:23:10 +00:00
Philip Reames	851adc000c	[Tests] Add tests to highlight sibling loop optimization order issue for exit rewriting The issue addressed in r363180 is more broadly relevant. For the moment, we don't actually get any of these cases because we a) restrict SCEV formation due to SCEExpander needing to preserve LCSSA, and b) don't iterate between loops. llvm-svn: 363192	2019-06-12 19:04:51 +00:00
Stanislav Mekhanoshin	000f9cc62a	[AMDGPU] more gfx1010 tests. NFC. llvm-svn: 363190	2019-06-12 18:44:11 +00:00
Jordan Rupprecht	146a154e61	[llvm-ar][test] Relax lit directory assumptions in thin-archive.test Summary: thin-archive.test assumes the Output/<testname> structure that lit creates. Rewrite the test in a way that still tests the same thing (creating via relative path and adding via absolute path) but doesn't assume this specific lit structure, making it possible to run in a lit emulator. Reviewers: gbreynoo Reviewed By: gbreynoo Subscribers: llvm-commits, bkramer Tags: #llvm Differential Revision: https://reviews.llvm.org/D62930 llvm-svn: 363189	2019-06-12 18:41:27 +00:00
Stanislav Mekhanoshin	245b5ba344	[AMDGPU] gfx1010 dpp16 and dpp8 Differential Revision: https://reviews.llvm.org/D63203 llvm-svn: 363186	2019-06-12 18:02:41 +00:00
Stanislav Mekhanoshin	5f581c9f08	[AMDGPU] gfx1010 premlane instructions Differential Revision: https://reviews.llvm.org/D63202 llvm-svn: 363185	2019-06-12 17:52:51 +00:00
Simon Atanasyan	efc0d1a298	[Mips] Add s.d instruction alias for Mips1 Add support for s.d instruction for Mips1 which expands into two swc1 instructions. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D63199 llvm-svn: 363184	2019-06-12 17:52:05 +00:00
Simon Pilgrim	ef7d4fbe80	[X86][SSE] Avoid unnecessary stack codegen in NT merge-consecutive-stores codegen tests. llvm-svn: 363181	2019-06-12 17:28:48 +00:00
Philip Reames	e51c3d8b82	[SCEV] Teach computeSCEVAtScope benefit from one-input Phi. PR39673 SCEV does not propagate arguments through one-input Phis so as to make it easy for the SCEV expander (and related code) to preserve LCSSA. It's not entirely clear this restriction is neccessary, but for the moment it exists. For this reason, we don't analyze single-entry phi inputs. However it is possible that when an this input leaves the loop through LCSSA Phi, it is a provable constant. Missing that results in an order of optimization issue in loop exit value rewriting where we miss some oppurtunities based on order in which we visit sibling loops. This patch teaches computeSCEVAtScope about this case. We can generalize it later, but so far we can only replace LCSSA Phis with their constant loop-exiting values. We should probably also add similiar logic directly in the SCEV construction path itself. Patch by: mkazantsev (with revised commit message by me) Differential Revision: https://reviews.llvm.org/D58113 llvm-svn: 363180	2019-06-12 17:21:47 +00:00
Simon Pilgrim	5b0e0dd709	[X86][AVX] Fold concat(vpermilps(x,c),vpermilps(y,c)) -> vpermilps(concat(x,y),c) Handles PSHUFD/PSHUFLW/PSHUFHW (AVX2) + VPERMILPS (AVX1). An extra AVX1 PSHUFD->VPERMILPS combine will be added in a future commit. llvm-svn: 363178	2019-06-12 16:38:20 +00:00
Sanjay Patel	64006896ac	[InstCombine] add tests for fmin/fmax libcalls; NFC llvm-svn: 363175	2019-06-12 15:29:40 +00:00
Sam Parker	3d42959dd8	Revert rL363156. The patch was to fix buildbots, but rL363157 should now be fixing it in a cleaner way. llvm-svn: 363174	2019-06-12 15:28:00 +00:00
David Bolvansky	48365ec3e1	[NFC[ Updated tests for D54411 llvm-svn: 363173	2019-06-12 15:01:36 +00:00
Matt Arsenault	f29366b1f5	StackProtector: Use PointerMayBeCaptured This was using its own, outdated list of possible captures. This was at minimum not catching cmpxchg and addrspacecast captures. One change is now any volatile access is treated as capturing. The test coverage for this pass is quite inadequate, but this required removing volatile in the lifetime capture test. Also fixes some infrastructure issues to allow running just the IR pass. Fixes bug 42238. llvm-svn: 363169	2019-06-12 14:23:33 +00:00
Matt Arsenault	61f6395fd0	AMDGPU/GlobalISel: Fix using illegal situations in tests These were using illegal copies as the side effecting use, so make them legal. llvm-svn: 363168	2019-06-12 14:23:28 +00:00
Matt Arsenault	aa6bdf9dcd	LoopVersioning: Respect convergent This changes the standalone pass only. Arguably the utility class itself should assert there are no convergent calls. However, a target pass with additional context may still be able to version a loop if all of the dynamic conditions are sufficiently uniform. llvm-svn: 363165	2019-06-12 14:05:58 +00:00
Anton Afanasyev	339b39b773	[MIR] Skip hoisting to basic block which may throw exception or return Summary: Fix hoisting to basic block which are not legal for hoisting cause it can be terminated by exception or it is return block. Reviewers: john.brawn, RKSimon, MatzeB Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63148 llvm-svn: 363164	2019-06-12 13:51:44 +00:00
Sanjay Patel	082a41994a	[InstCombine] add tests for fcmp+select with FMF (minnum/maxnum); NFC llvm-svn: 363163	2019-06-12 13:51:33 +00:00
Matt Arsenault	86325be3d7	LoopLoadElim: Respect convergent llvm-svn: 363162	2019-06-12 13:50:47 +00:00
Jeremy Morse	e2f94974df	[DebugInfo] Add a test that fell out of an earlier commit r362951 was supposed to contain this test, however it didn't get committed due to operator error. This was originally part of D59431. llvm-svn: 363161	2019-06-12 13:41:56 +00:00
Matt Arsenault	2466ba97bc	LoopDistribute/LAA: Respect convergent This case is slightly tricky, because loop distribution should be allowed in some cases, and not others. As long as runtime dependency checks don't need to be introduced, this should be OK. This is further complicated by the fact that LoopDistribute partially ignores if LAA says that vectorization is safe, and then does its own runtime pointer legality checks. Note this pass still does not handle noduplicate correctly, as this should always be forbidden with it. I'm not going to bother trying to fix it, as it would require more effort and I think noduplicate should be removed. https://reviews.llvm.org/D62607 llvm-svn: 363160	2019-06-12 13:34:19 +00:00
Matt Arsenault	1e21181aee	LoopDistribute/LAA: Add tests to catch regressions I broke 2 of these with a patch, but were not covered by existing tests. https://reviews.llvm.org/D63035 llvm-svn: 363158	2019-06-12 13:15:59 +00:00
Sam Parker	52d7326f32	[NFC] Add HardwareLoops lit.local.cfg file Set Transforms/HardwareLoops/ARM/ tests as unsupported if there isn't an arm target. llvm-svn: 363157	2019-06-12 12:54:19 +00:00
Sam Parker	ece316b56a	Attempt to fix non-Arm buildbots Adding REQUIRES: arm to failing tests llvm-svn: 363156	2019-06-12 12:47:35 +00:00
Simon Pilgrim	a4db4bb023	[X86][AVX] Tests showing missing concat(shuffle,shuffle) -> shuffle(concat) folds. NFCI. llvm-svn: 363153	2019-06-12 12:40:03 +00:00
Sam Parker	757ac02dc8	[ARM] Implement TTI::isHardwareLoopProfitable Implement the backend target hook to drive the HardwareLoops pass. The low-overhead branch extension for Arm M-class cores is flexible enough that we don't have to ensure correctness at this point, except checking that the loop counter variable can be stored in LR - a 32-bit register. For it to be profitable, we want to avoid loops that contain function calls, or any other instruction that alters the PC. This implementation uses TargetLoweringInfo, to query type and operation actions, looks at intrinsic calls and also performs some manual checks for remainder/division and FP operations. I think this should be a good base to start and extra details can be filled out later. Differential Revision: https://reviews.llvm.org/D62907 llvm-svn: 363149	2019-06-12 12:00:42 +00:00
Ben Dunbobbin	564d248ec2	[ThinLTO]LTO]Legacy] Fix dependent libraries support by adding querying of the IRSymtab Dependent libraries support for the legacy api was committed in a broken state (see: https://reviews.llvm.org/D60274). This was missed due to the painful nature of having to integrate the changes into a linker in order to test. This change implements support for dependent libraries in the legacy LTO api: - I have removed the current api function, which returns a single string, and added functions to access each dependent library specifier individually. - To reduce the testing pain, I have made the api functions as thin as possible to maximize coverage from llvm-lto. - When doing ThinLTO the system linker will load the modules lazily when scanning the input files. Unfortunately, when modules are lazily loaded there is no access to module level named metadata. To fix this I have added api functions that allow querying the IRSymtab for the dependent libraries. I hope to expand the api in the future so that, eventually, all the information needed by a client linker during scan can be retrieved from the IRSymtab. Differential Revision: https://reviews.llvm.org/D62935 llvm-svn: 363140	2019-06-12 11:07:56 +00:00
Orlando Cazalet-Hyams	a947156396	Revert "[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion" This reverts commit `1a0f7a2077`. See phabricator thread for D60831. llvm-svn: 363132	2019-06-12 08:34:51 +00:00
Dylan McKay	f8b4e60c7f	[AVR] Fix the 'avr-tiny.ll' and 'avr25.ll' subtarget feature tests When these tests were originally written, the middle end would introduce an unnecessary copy from r24:r23->GPR16->r24:r23, and these tests mistakenly relied on it. The most optimal codegen for the functions in the test cases before this patch would be NOPs. This is because the first i16 argument always gets the same register allocation as an i16 return value in the AVR calling convention. These tests broke in r362963 when the codegen was improved and the redundant copy was eliminated. After this, the test functions were lowered to their optimal form - a 'ret' and nothing else. This patch prepends an extra i16 operand to each of the test functions so that a 16-bit copy must be inserted for the program to be correct. llvm-svn: 363131	2019-06-12 08:31:07 +00:00
Sjoerd Meijer	de73404b8c	[AArch64] Merge globals when optimising for size Extern global merging is good for code-size. There's definitely potential for performance too, but there's one regression in a benchmark that needs investigating, so that's why we enable it only when we optimise for size for now. Patch by Ramakota Reddy and Sjoerd Meijer. Differential Revision: https://reviews.llvm.org/D61947 llvm-svn: 363130	2019-06-12 08:28:35 +00:00
Alex Bradbury	aa6f2af4e6	[RISCV] Fix inline-asm.ll test by adding nounwind attribute This test failed since CFI directive support was added in r361320. llvm-svn: 363123	2019-06-12 05:32:30 +00:00
Hsiangkai Wang	04ddf39b44	[RISCV] Add CFI directives for RISCV prologue/epilog. In order to generate correct debug frame information, it needs to generate CFI information in prologue and epilog. Differential Revision: https://reviews.llvm.org/D61773 llvm-svn: 363120	2019-06-12 03:04:22 +00:00
Kai Luo	8faff5606e	[PowerPC][NFC] Added test for sext/shl combination after isel. llvm-svn: 363118	2019-06-12 02:45:27 +00:00
Cameron McInally	6fe46ec25d	[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll X86/combine-fabs.ll X86/avx512vl-intrinsics-fast-isel.ll is only partially complete. llvm-svn: 363114	2019-06-12 00:18:54 +00:00
Philip Reames	082cd30327	Generalize icmp matching in IndVars' eliminateTrunc We were only matching RHS being a loop invariant value, not the inverse. Since there's nothing which appears to canonicalize loop invariant values to RHS, this means we missed cases. Differential Revision: https://reviews.llvm.org/D63112 llvm-svn: 363108	2019-06-11 22:43:25 +00:00
Jinsong Ji	898d481174	[PowerPC][NFC]Remove sms-simple.ll test temporarily. Looks like a MachinePipeliner algorithm problem found by sanitizer-x86_64-linux-fast. I will backout this test first while investigating the problem to unblock buildbot. ==49637==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x614000002e08 at pc 0x000004364350 bp 0x7ffe228a3bd0 sp 0x7ffe228a3bc8 READ of size 4 at 0x614000002e08 thread T0 #0 0x436434f in llvm::SwingSchedulerDAG::checkValidNodeOrder(llvm::SmallVector<llvm::NodeSet, 8u> const&) const /b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:3736:11 #1 0x4342cd0 in llvm::SwingSchedulerDAG::schedule() /b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:486:3 #2 0x434042d in llvm::MachinePipeliner::swingModuloScheduler(llvm::MachineLoop&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:385:7 #3 0x433eb90 in llvm::MachinePipeliner::runOnMachineFunction(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:207:5 #4 0x428b7ea in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:13 #5 0x4d1a913 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1648:27 #6 0x4d1b192 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1685:16 #7 0x4d1c06d in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1752:27 #8 0x4d1c06d in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1865 #9 0xa48ca3 in compileModule(char**, llvm::LLVMContext&) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/llc/llc.cpp:611:8 #10 0xa4270f in main /b/sanitizer-x86_64-linux-fast/build/llvm/tools/llc/llc.cpp:365:22 #11 0x7fec902572e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) #12 0x971b69 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/llc+0x971b69) llvm-svn: 363105	2019-06-11 22:09:33 +00:00
Amara Emerson	d133c15925	[GlobalISel] Add a G_JUMP_TABLE opcode. This opcode generates a pointer to the address of the jump table specified by the source operand, which is a jump table index. It will be used in conjunction with an upcoming G_BRJT opcode to support jump table codegen with GlobalISel. Differential Revision: https://reviews.llvm.org/D63111 llvm-svn: 363096	2019-06-11 19:58:06 +00:00
Alina Sbirlea	cb4ed8a7bc	[MemorySSA] When applying updates, clean unnecessary Phis. Summary: After applying a set of insert updates, there may be trivial Phis left over. Clean them up. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63033 llvm-svn: 363094	2019-06-11 19:09:34 +00:00
Cameron McInally	e04c4b6af8	[NFC][CodeGen] Add unary FNeg tests to X86/combine-fcopysign.ll X86/dag-fmf-cse.ll X86/fast-isel-fneg.ll X86/fdiv.ll llvm-svn: 363093	2019-06-11 18:55:13 +00:00
Jinsong Ji	ef2d6d99c0	[PowerPC] Enable MachinePipeliner for P9 with -ppc-enable-pipeliner Implement necessary target hooks to enable MachinePipeliner for P9 only. The pass is off by default, can be enabled with -ppc-enable-pipeliner for P9. Differential Revision: https://reviews.llvm.org/D62164 llvm-svn: 363085	2019-06-11 17:40:39 +00:00
Cameron McInally	10c0855542	[NFC][CodeGen] Add unary fneg tests to X86/fma-fneg-combine.ll llvm-svn: 363084	2019-06-11 17:05:36 +00:00
Cameron McInally	08200d6d26	[InstCombine] Handle -(X-Y) --> (Y-X) for unary fneg when NSZ Differential Revision: https://reviews.llvm.org/D62612 llvm-svn: 363082	2019-06-11 16:21:21 +00:00
Cameron McInally	796de11331	[InstCombine] Update fptrunc (fneg x)) -> (fneg (fptrunc x) for unary FNeg Differential Revision: https://reviews.llvm.org/D62629 llvm-svn: 363080	2019-06-11 15:45:41 +00:00
Simon Pilgrim	f370831885	[X86] Regenerate CmpISel test for future patch llvm-svn: 363077	2019-06-11 15:13:11 +00:00
Ilya Biryukov	b37ccc5fec	[ARM] Fix a typo in the test from r363039 llvm-svn: 363063	2019-06-11 13:36:06 +00:00
Lewis Revill	a5240361dd	[RISCV] Add lowering of addressing sequences for PIC This patch allows lowering of PIC addresses by using PC-relative addressing for DSO-local symbols and accessing the address through the global offset table for non-DSO-local symbols. Differential Revision: https://reviews.llvm.org/D55303 llvm-svn: 363058	2019-06-11 12:57:47 +00:00
Lewis Revill	6970755c58	[RISCV][NFC] Add missing test file for D54093 llvm-svn: 363057	2019-06-11 12:52:05 +00:00
Lewis Revill	28a5cadb3a	[RISCV] Lower inline asm constraints I, J & K for RISC-V This validates and lowers arguments to inline asm nodes which have the constraints I, J & K, with the following semantics (equivalent to GCC): I: Any 12-bit signed immediate. J: Immediate integer zero only. K: Any 5-bit unsigned immediate. Differential Revision: https://reviews.llvm.org/D54093 llvm-svn: 363054	2019-06-11 12:42:13 +00:00
Mikhail Maltsev	7bd5c55cad	[ARM] First MVE instructions: scalar shifts. This introduces a new decoding table for MVE instructions, and starts by adding the family of scalar shift instructions that are part of the MVE architecture extension: saturating shifts within a single GPR, and long shifts across a pair of GPRs (both saturating and normal). Some of these shift instructions have only 3-bit register fields in the encoding, with the low bit fixed. So they can only address an odd or even numbered GPR (depending on the operand), and therefore I add two new register classes, GPREven and GPROdd. Differential Revision: https://reviews.llvm.org/D62668 Change-Id: Iad95d5f83d26aef70c674027a184a6b1e0098d33 llvm-svn: 363051	2019-06-11 12:04:32 +00:00
Nico Weber	dd6019526d	Let writeWindowsResourceCOFF() take a TimeStamp parameter For lld, pass in Config->Timestamp (which is set based on lld's /timestamp: and /Brepro flags). Since the writeWindowsResourceCOFF() data is only used in-memory by LLD and the obj's timestamp isn't used for anything in the output, this doesn't change behavior. For llvm-cvtres, add an optional /timestamp: parameter, and use the current behavior of calling time() if the parameter is not passed in. This doesn't really change observable behavior (unless someone passes /timestamp: to llvm-cvtres, which wasn't possible before), but it removes the last unqualified call to time() from llvm/lib, which seems like a good thing. Differential Revision: https://reviews.llvm.org/D63116 llvm-svn: 363050	2019-06-11 11:26:50 +00:00
David Bolvansky	bc888f059d	[NFC] Fixed arm/aarch64 test llvm-svn: 363049	2019-06-11 11:09:25 +00:00
Orlando Cazalet-Hyams	1a0f7a2077	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 llvm-svn: 363046	2019-06-11 10:37:20 +00:00
George Rimar	fc7b911313	[llvm-readobj] - Do not use precompiled binary in elf-broken-dynsym-link.test Now we can remove the "TODO" since https://bugs.llvm.org/show_bug.cgi?id=42216 was fixed. llvm-svn: 363045	2019-06-11 10:28:15 +00:00
James Henderson	d5f38dae59	[llvm-dwarfdump] Add -o to help text and remove --out-file from doc -o is in the documentation, but not in the llvm-dwarfdump help text. This patch adds it by inverting the -o and --out-file aliasing. It also removes --out-file from the documentation, since we don't really want people to be using this switch in practice. Reviewed by: aprantl, JDevlieghere, dblaikie Differential Revision: https://reviews.llvm.org/D63013 llvm-svn: 363044	2019-06-11 10:20:07 +00:00
George Rimar	ffb3c72a74	[yaml2elf] - Check we are able to set custom sh_link for .symtab/.dynsym Allow using both custom numeric and string values for Link field of the dynamic and regular symbol tables. Differential revision: https://reviews.llvm.org/D63077 llvm-svn: 363042	2019-06-11 10:00:51 +00:00
Simon Pilgrim	287e78c82b	[DAGCombine] GetNegatedExpression - constant float vector support (PR42105) Add support for negation of constant build vectors. Differential Revision: https://reviews.llvm.org/D62963 llvm-svn: 363040	2019-06-11 09:44:33 +00:00
Simon Tatham	8c865cacda	[ARM] Add the non-MVE instructions in Arm v8.1-M. This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need a new addressing mode. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 363039	2019-06-11 09:29:18 +00:00
Sander de Smalen	cbeb563cfb	Change semantics of fadd/fmul vector reductions. This patch changes how LLVM handles the accumulator/start value in the reduction, by never ignoring it regardless of the presence of fast-math flags on callsites. This change introduces the following new intrinsics to replace the existing ones: llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul and adds functionality to auto-upgrade existing LLVM IR and bitcode. Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D60261 llvm-svn: 363035	2019-06-11 08:22:10 +00:00
Craig Topper	627d8168e7	[X86] Add load folding isel patterns to scalar_math_patterns and AVX512_scalar_math_fp_patterns. Also add a FIXME for the peephole pass not being able to handle this. llvm-svn: 363032	2019-06-11 04:30:53 +00:00
Peter Collingbourne	e5bdedac9d	Symbolize: Make DWPName a symbolizer option instead of an argument to symbolize{,Inlined}Code. This makes the interface simpler and more consistent with the interface for .dSYM files and fixes a bug where llvm-symbolizer would not read the dwp if it was asked to symbolize data before symbolizing code. Differential Revision: https://reviews.llvm.org/D63114 llvm-svn: 363025	2019-06-11 02:32:27 +00:00
Matt Arsenault	c5830f5f05	AtomicExpand: Don't crash on non-0 alloca This now produces garbage on AMDGPU with a call to an nonexistent, anonymous libcall but won't assert. llvm-svn: 363022	2019-06-11 01:35:07 +00:00
Matt Arsenault	383e72fcfe	AMDGPU: Expand < 32-bit atomics Also fix AtomicExpand asserting on atomicrmw fadd/fsub. llvm-svn: 363021	2019-06-11 01:35:00 +00:00
Nico Weber	b941fa8821	llvm-lib: Implement /machine: argument And share some code with lld-link. While here, also add a FIXME about PR42180 and merge r360150 to llvm-lib. Differential Revision: https://reviews.llvm.org/D63021 llvm-svn: 363016	2019-06-11 01:13:41 +00:00
Philip Reames	efb14f9005	[Tests] Adjust LFTR dead-iv tests to bypass undef cases As pointed out by Nikita in review, undef and poison need to be handled separately. Since we're no longer expecting any test improvements - just fixes for miscompiles - update the tests to bypass the existing undef check. llvm-svn: 363002	2019-06-10 23:17:10 +00:00
Cameron McInally	5f39a3096f	[NFC][CodeGen] Forgot 2 unary FNeg tests in X86/fma-intrinsics-canonical.ll Follow-up to r362999. llvm-svn: 363001	2019-06-10 23:02:36 +00:00
Cameron McInally	ee5881a88c	[NFC][CodeGen] Add unary FNeg tests to X86/fma-intrinsics-canonical.ll llvm-svn: 362999	2019-06-10 22:45:54 +00:00
Rong Xu	e44fa83c37	[PGO] Handle cases of non-instrument BBs As shown in PR41279, some basic blocks (such as catchswitch) cannot be instrumented. This patch filters out these BBs in PGO instrumentation. It also sets the profile count to the fail-to-instrument edge, so that we can propagate the counts in the CFG. Differential Revision: https://reviews.llvm.org/D62700 llvm-svn: 362995	2019-06-10 22:36:27 +00:00
Philip Reames	1d322ccaac	[Tests] Split an LFTR dead-iv case There are two interesting sub-cases here. 1) Switching IVs is legal, but only in pre-increment form. and 2) Switching IVs is legal, and so is post-increment form. llvm-svn: 362993	2019-06-10 22:33:20 +00:00
Jessica Paquette	b22954384e	[GlobalISel] Translate memset/memmove/memcpy from undef ptrs into nops If the source is undef, then just don't do anything. This matches SelectionDAG's behaviour in SelectionDAG.cpp. Also add a test showing that we do the right thing here. (irtranslator-memfunc-undef.ll) Differential Revision: https://reviews.llvm.org/D63095 llvm-svn: 362989	2019-06-10 21:53:56 +00:00
Cameron McInally	4f3cf3853e	[NFC][CodeGen] Add unary FNeg tests to some X86/ and XCore/ tests. llvm-svn: 362987	2019-06-10 21:31:59 +00:00
Philip Reames	78c0d75697	[Tests] Add tests for D62939 (miscompiles around dead pointer IVs) Flesh out a collection of tests for switching to a dead IV within LFTR, both for the current miscompile, and for some cases which we should be able to handle via simple reasoning. llvm-svn: 362976	2019-06-10 19:45:59 +00:00
Philip Reames	a9633d5f0b	[LFTR] Use recomputed BE count This was discussed as part of D62880. The basic thought is that computing BE taken count after widening should produce (on average) an equally good backedge taken count as the one before widening. Since there's only one test in the suite which is impacted by this change, and it's essentially equivelent codegen, that seems to be a reasonable assertion. This change was separated from r362971 so that if this turns out to be problematic, the triggering piece is obvious and easily revertable. For the nestedIV example from elim-extend.ll, we end up with the following BE counts: BEFORE: (-2 + (-1 * %innercount) + %limit) AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>) Note that before is an i32 type, and the after is an i64. Truncating the i64 produces the i32. llvm-svn: 362975	2019-06-10 19:18:53 +00:00
Jinsong Ji	9c7f93e914	[PowerPC][HTM]Fix $zero is not a GPRC register for builtin_ttest This was found during HTM cleanup. Adding a test for builtin_ttest would expose following issue. * Bad machine code: Illegal physical register for instruction * - function: test10 - basic block: %bb.0 entry (0xf0e57497b58) - instruction: %5:crrc0 = TABORTWCI 0, $zero, 0 - operand 2: $zero $zero is not a GPRC register. LLVM ERROR: Found 1 machine code errors. Differential Revision: https://reviews.llvm.org/D63079 llvm-svn: 362974	2019-06-10 19:04:14 +00:00
Jordan Rupprecht	f8f9d65f85	[llvm-objcopy] Fix SHT_GROUP ordering. Summary: When llvm-objcopy sorts sections during finalization, it only sorts based on the offset, which can cause the group section to come after the sections it contains. This causes link failures when using gold to link objects created by llvm-objcopy. Fix this for now by copying GNU objcopy's behavior of placing SHT_GROUP sections first. In the future, we may want to remove this sorting entirely to more closely preserve the input file layout. This fixes https://bugs.llvm.org/show_bug.cgi?id=42052. Reviewers: jakehehrlich, jhenderson, MaskRay, espindola, alexshap Reviewed By: MaskRay Subscribers: phuongtrang148993, emaste, arichardson, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62620 llvm-svn: 362973	2019-06-10 18:35:01 +00:00
Wolfgang Pieb	54cbae1e8d	[ELF][llvm-objdump] Treat dynamic tag values as virtual addresses instead of offsets The ELF gABI requires the tag values of DT_REL, DT_RELA and DT_JMPREL to be treated as virtual addresses. They were treated as offsets. Fixes PR41832. Differential Revision: https://reviews.llvm.org/D62972 llvm-svn: 362969	2019-06-10 17:50:24 +00:00
Andrea Di Biagio	c650a9084f	[llvm-mca] Enable bottleneck analysis when flag -all-views is specified. Bottleneck Analysis is one of the many views available in llvm-mca. Therefore, it should be enabled when flag -all-views is passed in input to the tool. llvm-svn: 362964	2019-06-10 16:56:25 +00:00
Francis Visoiu Mistrih	a438432acc	[FastISel] Skip creating unnecessary vregs for arguments This behavior was added in r130928 for both FastISel and SD, and then disabled in r131156 for FastISel. This re-enables it for FastISel with the corresponding fix. This is triggered only when FastISel can't lower the arguments and falls back to SelectionDAG for it. FastISel contains a map of "register fixups" where at the end of the selection phase it replaces all uses of a register with another register that FastISel sometimes pre-assigned. Code at the end of SelectionDAGISel::runOnMachineFunction is doing the replacement at the very end of the function, while other pieces that come in before that look through the MachineFunction and assume everything is done. In this case, the real issue is that the code emitting COPY instructions for the liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg assigned to the physreg is used, and if it's not, it will skip the COPY. If a register wasn't replaced with its assigned fixup yet, the copy will be skipped and we'll end up with uses of undefined registers. This fix moves the replacement of registers before the emission of copies for the live-ins. The initial motivation for this fix is to enable tail calls for swiftself functions, which were blocked because we couldn't prove that the swiftself argument (which is callee-save) comes from a function argument (live-in), because there was an extra copy (vreg to vreg). A few tests are affected by this: * llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21 (callee-save) but never reload it because it's attached to the return. We now don't even spill it anymore. * llvm/test/CodeGen//swiftself.ll: we tail-call now. llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this test was not really testing the right thing, but it worked because the same registers were re-used. * llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes * llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy * llvm/test/CodeGen/Mips/: get rid of spills and copies llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack * llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack * llvm/test/CodeGen/X86/swifterror.ll: same as AArch64 * llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed Differential Revision: https://reviews.llvm.org/D62361 llvm-svn: 362963	2019-06-10 16:53:37 +00:00
Piotr Sobczak	9b11e93d90	[AMDGPU] Optimize image_[load\|store]_mip Summary: Replace image_load_mip/image_store_mip with image_load/image_store if lod is 0. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63073 llvm-svn: 362957	2019-06-10 15:58:51 +00:00
Simon Tatham	67065c5c70	Revert rL362953 and its followup rL362955. These caused a build failure because I managed not to notice they depended on a later unpushed commit in my current stack. Sorry about that. llvm-svn: 362956	2019-06-10 15:58:19 +00:00
Simon Tatham	42078d41d5	[ARM] Add the non-MVE instructions in Arm v8.1-M. This should have been part of r362953, but I had a finger-trouble incident and committed the old rather than new version of the patch. Sorry. llvm-svn: 362955	2019-06-10 15:41:58 +00:00
Sanjay Patel	9650c95b7e	[InstCombine] allow unordered preds when canonicalizing to fabs() We have a known-never-nan value via 'nnan', so an unordered predicate is the same as its ordered sibling. Similar to: rL362937 llvm-svn: 362954	2019-06-10 15:39:00 +00:00
Simon Tatham	baeea91933	[ARM] Add the non-MVE instructions in Arm v8.1-M. This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need some new addressing modes. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 362953	2019-06-10 15:36:34 +00:00
Whitney Tsang	05bf5f9328	[DA] Add an option to control delinearization validity checks Summary: Dependence Analysis performs static checks to confirm validity of delinearization. These checks often fail for 64-bit targets due to type conversions and integer wrapping that prevent simplification of the SCEV expressions. These checks would also fail at compile-time if the lower bound of the loops are compile-time unknown. Author: bmahjour Reviewer: Meinersbur, jdoerfert, kbarton, dmgreen, fhahn Reviewed By: Meinersbur, jdoerfert, dmgreen Subscribers: fhahn, hiraditya, javed.absar, llvm-commits, Whitney, etiotto Tag: LLVM Differential Revision: https://reviews.llvm.org/D62610 llvm-svn: 362952	2019-06-10 15:29:07 +00:00
Jeremy Morse	bcff417292	[DebugInfo] Terminate all location-lists at end of block This commit reapplies r359426 (which was reverted in r360301 due to performance problems) and rolls in D61940 to address the performance problem. I've combined the two to avoid creating a span of slow-performance, and to ease reverting if more problems crop up. The summary of D61940: This patch removes the "ChangingRegs" facility in DbgEntityHistoryCalculator, as its overapproximate nature can produce incorrect variable locations. An unchanging register doesn't mean a variable doesn't change its location. The patch kills off everything that calculates the ChangingRegs vector. Previously ChangingRegs spotted epilogues and marked registers as unchanging if they weren't modified outside the epilogue, increasing the chance that we can emit a single-location variable record. Without this feature, debug-loc-offset.mir and pr19307.mir become temporarily XFAIL. They'll be re-enabled by D62314, using the FrameDestroy flag to identify epilogues, I've split this into two steps as FrameDestroy isn't necessarily supported by all backends. The logic for terminating variable locations at the end of a basic block now becomes much more enjoyably simple: we just terminate them all. Other test changes: inlined-argument.ll becomes XFAIL, but for a longer term. The current algorithm for detecting that a variable has a single-location doesn't work in this scenario (inlined function in multiple blocks), only other bugs were making this test work. fission-ranges.ll gets slightly refreshed too, as the location of "p" is now correctly determined to be a single location. Differential Revision: https://reviews.llvm.org/D61940 llvm-svn: 362951	2019-06-10 15:23:46 +00:00
Sanjay Patel	07bba68889	[InstCombine] add tests for fabs() with unordered preds; NFC llvm-svn: 362949	2019-06-10 15:08:22 +00:00
Sanjay Patel	85de9634e6	[InstCombine] fix bug in canonicalization to fabs() Forgot to translate the predicate clauses in rL362943. llvm-svn: 362945	2019-06-10 14:57:45 +00:00
Sanjay Patel	8b6d9f60ed	[InstCombine] change canonicalization to fabs() to use FMF on fsub Similar to rL362909: This isn't the ideal fix (use FMF on the select), but it's still an improvement until we have better FMF propagation to selects and other FP math operators. I don't think there's much risk of regression from this change by not including the FMF on the fcmp any more. The nsz/nnan FMF should be the same on the fcmp and the fsub because they have the same operand. llvm-svn: 362943	2019-06-10 14:46:36 +00:00
Simon Tatham	b87669f166	[ARM] Disallow PC, and optionally SP, in VMOVRH and VMOVHR. Arm v8.1-M supports the VMOV instructions that move a half-precision value to and from a GPR, but not if the GPR is SP or PC. To fix this, I've changed those instructions to use the rGPR register class instead of GPR. rGPR always excludes PC, and it excludes SP except in the presence of the HasV8Ops target feature (i.e. Arm v8-A). So the effect is that VMOV.F16 to and from PC is now illegal everywhere, but VMOV.F16 to and from SP is illegal only on non-v8-A cores (which I believe is all as it should be). Reviewers: dmgreen, samparker, SjoerdMeijer, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60704 llvm-svn: 362942	2019-06-10 14:43:55 +00:00
Cameron McInally	ce49e2231b	[ExecutionEngine] Add UnaryOperator visitor to the interpreter This is to support the unary FNeg instruction. Differential Revision: https://reviews.llvm.org/D62881 llvm-svn: 362941	2019-06-10 14:38:48 +00:00
George Rimar	286a47116a	[yaml2obj] - Remove TODOs from dynsymtab-implicit-sections-size-content.yaml. NFCI. Now when https://bugs.llvm.org/show_bug.cgi?id=42215 is fixed, we can remove these TODOs. llvm-svn: 362940	2019-06-10 14:33:24 +00:00
George Rimar	dd4f253c4d	[llvm-readobj/llvm-readelf] - Don't fail to dump the object if .dynsym has broken sh_link field. This is https://bugs.llvm.org/show_bug.cgi?id=42215. GNU readelf allows to dump the objects in that case, but llvm-readobj/llvm-readelf reports an error and stops. The patch fixes that. Differential revision: https://reviews.llvm.org/D63074 llvm-svn: 362938	2019-06-10 14:23:46 +00:00
Sanjay Patel	8cd8c5784b	[InstCombine] allow unordered preds when canonicalizing to fabs() PR42179: https://bugs.llvm.org/show_bug.cgi?id=42179 llvm-svn: 362937	2019-06-10 14:14:51 +00:00
Sanjay Patel	4cdd3ceb57	[InstCombine] add tests for fcmp unordered pred -> fabs (PR42179); NFC llvm-svn: 362936	2019-06-10 14:04:10 +00:00
George Rimar	1e41007aeb	[yaml2obj/obj2yaml] - Make RawContentSection::Content and RawContentSection::Size optional This is a follow-up for D62809. Content and Size fields should be optional as was discussed in comments of the D62809's thread. With that, we can describe a specific string table and symbol table sections in a more correct way and also show appropriate errors. The patch adds lots of test cases where the behavior is described in details. Differential revision: https://reviews.llvm.org/D62957 llvm-svn: 362931	2019-06-10 12:43:18 +00:00
George Rimar	379aa18a39	[yaml2obj] - Do not assert when .dynsym is specified explicitly, but .dynstr is not present. We have a code in buildSectionIndex() that adds implicit sections: // Add special sections after input sections, if necessary. for (StringRef Name : implicitSectionNames()) if (SN2I.addName(Name, SecNo)) { // Account for this section, since it wasn't in the Doc ++SecNo; DotShStrtab.add(Name); } The problem arises when .dynsym is specified explicitly and no DynamicSymbols is used. In that case, we do not add .dynstr implicitly and will assert later when will try to set Link for .dynsym. Seems, in this case, reasonable behavior is to allow Link field to be zero. This is what this patch does. Differential revision: https://reviews.llvm.org/D63001 llvm-svn: 362929	2019-06-10 11:38:06 +00:00
David Green	d847aa573b	[ARM] Enable Unroll UpperBound This option allows loops with small max trip counts to be fully unrolled. This can help with code like the remainder loops from manually unrolled loops like those that appear in the cmsis dsp library. We would apparently previously runtime unroll them with the default unroll count (4). Differential Revision: https://reviews.llvm.org/D63064 llvm-svn: 362928	2019-06-10 10:22:14 +00:00
Nikola Prica	abc1dff7e4	[DebugInfo] More strict debug range for stack variables Variable's stack location can stretch longer than it should. If a variable is placed at the stack in a some nested basic block its range can be calculated to be up to the next occurrence of the variable's DBG_VALUE, or up to the end of the function, thus covering a basic blocks that should not be included in the variable’s location range. This happens because the DbgEntityHistoryCalculator ends register locations at the end of a basic block only if the variable’s location register has been changed throughout the function, which is not the case for the register used to reference stack objects. This patch also tries to produce a single value location if the location list builder managed to merge all the locations into one. Reviewers: aprantl, dstenb, jmorse Reviewed By: aprantl, dstenb, jmorse Subscribers: djtodoro, ivanbaev, asowda Tags: #debug-info Differential Revision: https://reviews.llvm.org/D61600 llvm-svn: 362923	2019-06-10 08:41:06 +00:00
QingShan Zhang	ab846da7e8	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D62897 llvm-svn: 362921	2019-06-10 05:40:21 +00:00
Craig Topper	9000a72a4b	[X86] When promoting i16 compare with immediate to i32, try to use sign_extend for eq/ne if the input is truncated from a type with enough sign its. Summary: Our default behavior is to use sign_extend for signed comparisons and zero_extend for everything else. But for equality we have the freedom to use either extension. If we can prove the input has been truncated from something with enough sign bits, we can use sign_extend instead and let DAG combine optimize it out. A similar rule is used by type legalization in LegalizeIntegerTypes. This gets rid of the movzx in PR42189. The immediate will still take 4 bytes instead of the 2 bytes plus 0x66 prefix a cmp di, 32767 would get, but it avoids a length changing prefix. Reviewers: RKSimon, spatel, xbolva00 Reviewed By: xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63032 llvm-svn: 362920	2019-06-10 04:50:12 +00:00
Vivek Pandya	11cb15f8ed	Do not derive no-recurse attribute if function does not have exact definition. This is fix for https://bugs.llvm.org/show_bug.cgi?id=41336 Reviewers: jdoerfert Reviewed by: jdoerfert Differential Revision: https://reviews.llvm.org/D63045 llvm-svn: 362918	2019-06-10 04:16:04 +00:00
Kai Luo	3f3bae33a2	[NFC] Test if commit access granted. llvm-svn: 362917	2019-06-10 03:20:33 +00:00
Nico Weber	c5d67b5207	Make test not write to source directory llvm-svn: 362916	2019-06-10 01:47:04 +00:00
Craig Topper	f7ba8b808a	[X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and scalar_to_vector/extract_vector_elts to reduce isel patterns. Previously we did the equivalent operation in isel patterns with COPY_TO_REGCLASS operations to transition. By inserting scalar_to_vetors and extract_vector_elts before isel we can allow each piece to be selected individually and accomplish the same final result. I ideally we'd use vector operations earlier in lowering/combine, but that looks to be more difficult. The scalar-fp-to-i64.ll changes are because we have a pattern for using movlpd for store+extract_vector_elt. While an f64 store uses movsd. The encoding sizes are the same. llvm-svn: 362914	2019-06-10 00:41:07 +00:00
Nico Weber	80fee25776	Revert r361953 "[SVE][IR] Scalable Vector IR Type" This reverts commit `f4fc01f8dd`. It caused a 3-4x slowdown when doing thinlto links, PR42210. llvm-svn: 362913	2019-06-09 19:27:50 +00:00
David Bolvansky	dcf5e6abdf	[TargetLowering] Simplify (ctpop x) == 1 Reviewers: craig.topper, spatel, RKSimon, bkramer Reviewed By: spatel Subscribers: javed.absar, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63004 llvm-svn: 362912	2019-06-09 18:18:57 +00:00
Roman Lebedev	d669758d84	[InstCombine] foldICmpWithLowBitMaskedVal(): 'icmp sgt/sle': avoid miscompiles A precondition 'x != 0' was forgotten by me: https://rise4fun.com/Alive/JFNP https://rise4fun.com/Alive/jHvL These 4 folds with non-constants could be re-enabled, but for now let's go for the simplest solution. https://bugs.llvm.org/show_bug.cgi?id=42198 llvm-svn: 362911	2019-06-09 16:30:42 +00:00
Roman Lebedev	ff0c99b017	[NFC][InstCombine] Revisit canonicalize-constant-low-bit-mask-and-icmp-s* tests in preparatio for PR42198. The `icmp sgt`/`icmp sle` variants are, too, miscompiles: https://rise4fun.com/Alive/JFNP https://rise4fun.com/Alive/jHvL A precondition 'x != 0' was forgotten by me. While ensuring test coverage for `-1`, also add test coverage for `0` mask. Mask `0` is allowed for all the folds, mask `-1` is allowed for all the folds with unsigned `icmp` pred. Constant mask `0` is missed though. https://bugs.llvm.org/show_bug.cgi?id=42198 llvm-svn: 362910	2019-06-09 16:30:14 +00:00
Sanjay Patel	87cd16a86e	[InstCombine] change canonicalization to fabs() to use FMF on fneg This isn't the ideal fix (use FMF on the select), but it's still an improvement until we have better FMF propagation to selects and other FP math operators. I don't think there's much risk of regression from this change by not including the FMF on the fcmp any more. The nsz/nnan FMF should be the same on the fcmp and the fneg (fsub) because they have the same operand. This works around the most glaring FMF logical inconsistency cited in PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 llvm-svn: 362909	2019-06-09 16:22:01 +00:00
David Bolvansky	96ccd690f8	[NFC] Adjust test for D63004 llvm-svn: 362908	2019-06-09 16:15:08 +00:00
David Bolvansky	16ca1fee5e	[NFC] Added test from PR19758 llvm-svn: 362907	2019-06-09 15:12:46 +00:00
David Bolvansky	4e95b36b6d	[NFC] Added test from PR42084 for D63058 llvm-svn: 362906	2019-06-09 14:56:46 +00:00
Nikita Popov	06beb48229	[InstCombine] Add tests for usub.sat(x,y)+y etc; NFC For PR42178. llvm-svn: 362905	2019-06-09 14:39:47 +00:00
Sanjay Patel	73f5a855b3	[InstSimplify] enhance fcmp fold with never-nan operand This is another step towards correcting our usage of fast-math-flags when applied on an fcmp. In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of the fcmp. But I'm leaving that clause in until we're more confident that we can stop relying on fcmp's FMF. By using the more general "isKnownNeverNaN()", we gain a simplification shown on the tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN). On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction in addition to the FMF on the fcmp. This is a continuation of D62979 / rL362879. llvm-svn: 362903	2019-06-09 13:48:59 +00:00
Sanjay Patel	de4d4d5049	[InstSimplify] add tests for fcmp with known-never-nan operands; NFC Opposite predicate for rL362742 / rL362879 / D62979 llvm-svn: 362902	2019-06-09 13:30:14 +00:00
Anton Afanasyev	623d9ba068	[MIR] Add simple PRE pass to MachineCSE This is the second part of the commit fixing PR38917 (hoisting partitially redundant machine instruction). Most of PRE (partitial redundancy elimination) and CSE work is done on LLVM IR, but some of redundancy arises during DAG legalization. Machine CSE is not enough to deal with it. This simple PRE implementation works a little bit intricately: it passes before CSE, looking for partitial redundancy and transforming it to fully redundancy, anticipating that the next CSE step will eliminate this created redundancy. If CSE doesn't eliminate this, than created instruction will remain dead and eliminated later by Remove Dead Machine Instructions pass. The third part of the commit is supposed to refactor MachineCSE, to make it more clear and to merge MachinePRE with MachineCSE, so one need no rely on further Remove Dead pass to clear instrs not eliminated by CSE. First step: https://reviews.llvm.org/D54839 Fixes llvm.org/PR38917 This is fixed recommit of r361356 after PowerPC64 multistage build failure. llvm-svn: 362901	2019-06-09 12:15:47 +00:00
Ayke van Laethem	f18cf230e4	[CaptureTracking] Don't let comparisons against null escape inbounds pointers Pointers that are in-bounds (either through dereferenceable_or_null or thorough a getelementptr inbounds) cannot be captured with a comparison against null. There is no way to construct a pointer that is still in bounds but also NULL. This helps safe languages that insert null checks before load/store instructions. Without this patch, almost all pointers would be considered captured even for simple loads. With this patch, an icmp with null will not be seen as escaping as long as certain conditions are met. There was a lot of discussion about this patch. See the Phabricator thread for detals. Differential Revision: https://reviews.llvm.org/D60047 llvm-svn: 362900	2019-06-09 10:20:33 +00:00
Amara Emerson	0d20969dea	[AArch64][GlobalISel] Select immediate forms of cmp instructions. A simple re-use of the immediate operand matcher and renderer functions. rdar://43795178 llvm-svn: 362896	2019-06-09 07:31:25 +00:00
Roman Lebedev	2aa0c5515f	[X86][Codegen] Add missed pattern that may be a lea+neg llvm-svn: 362886	2019-06-08 19:38:14 +00:00
Sanjay Patel	4329c15f11	[InstSimplify] enhance fcmp fold with never-nan operand This is 1 step towards correcting our usage of fast-math-flags when applied on an fcmp. In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of the fcmp. But I'm leaving that clause in until we're more confident that we can stop relying on fcmp's FMF. By using the more general "isKnownNeverNaN()", we gain a simplification shown on the tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN). On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction in addition to the FMF on the fcmp. I'll update the 'ult' case below here as a follow-up assuming no problems here. Differential Revision: https://reviews.llvm.org/D62979 llvm-svn: 362879	2019-06-08 15:12:33 +00:00
David Bolvansky	54b1044983	[NFC] Added tests for D63038 llvm-svn: 362875	2019-06-08 12:07:59 +00:00
David Green	c5471c2a57	[ARM] Adjust isLegalT1AddressImmediate for non-legal types Types such as float and i64's do not have legal loads in Thumb1, but will still be loaded with a LDR (or potentially multiple LDR's). As such we can treat the cost of addressing mode calculations the same as an i32 and get some optimisation benefits. Differential Revision: https://reviews.llvm.org/D62968 llvm-svn: 362874	2019-06-08 10:32:53 +00:00
David Green	342d1b81a3	[ARM] Add MVE addressing to isLegalT2AddressImmediate Now with MVE being added, we can add the vector addressing mode costs for it. These are generally imm7 multiplied by the size of the type being loaded / stored. Differential Revision: https://reviews.llvm.org/D62967 llvm-svn: 362873	2019-06-08 10:18:23 +00:00
David Green	4ecce205d5	[ARM] Add fp16 addressing to isLegalT2AddressImmediate The fp16 version of VLDR takes a imm8 multiplied by 2. This updates the costs to account for those, and adds extra testing. It is dependant upon hasFPRegs16 as this is what the load/store instructions require. Differential Revision: https://reviews.llvm.org/D62966 llvm-svn: 362872	2019-06-08 10:09:02 +00:00
David Green	990eb2d1e8	[ARM] Add extra gep costmodel tests for MVE and half float. NFC llvm-svn: 362871	2019-06-08 09:58:05 +00:00
Jonas Paulsson	fdc4ea34e3	[SystemZ, RegAlloc] Favor 3-address instructions during instruction selection. This patch aims to reduce spilling and register moves by using the 3-address versions of instructions per default instead of the 2-address equivalent ones. It seems that both spilling and register moves are improved noticeably generally. Regalloc hints are passed to increase conversions to 2-address instructions which are done in SystemZShortenInst.cpp (after regalloc). Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are the same), foldMemoryOperandImpl() can no longer trivially fold a spilled source register since the reg/reg instruction is now 3-address. In order to remedy this, new 3-address pseudo memory instructions are used to perform the folding only when the dst and lhs virtual registers are known to be allocated to the same physreg. In order to not let MachineCopyPropagation run and change registers on these transformed instructions (making it 3-address), a new target pass called SystemZPostRewrite.cpp is run just after VirtRegRewriter, that immediately lowers the pseudo to a target instruction. If it would have been possibe to insert a COPY instruction and change a register operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that the caller (e.g. InlineSpiller) would update/repair the involved LiveIntervals, the solution involving pseudo instructions would not have been needed. This is perhaps a potential improvement (see Phabricator post). Common code changes: * A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a target pass immediately before MachineCopyPropagation. * VirtRegMap is passed as an argument to foldMemoryOperand(). Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D60888 llvm-svn: 362868	2019-06-08 06:19:15 +00:00
Seiya Nuta	b728e53b95	[llvm-objcopy][MachO] Recompute and update offset/size fields in the writer Summary: Recompute and update offset/size fields so that we can implement llvm-objcopy options like --only-section. This patch is the first step and focuses on supporting load commands that covered by existing tests: executable files and dynamic libraries are not supported. Reviewers: alexshap, rupprecht, jhenderson Reviewed By: alexshap, rupprecht Subscribers: compnerd, jakehehrlich, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62652 llvm-svn: 362863	2019-06-08 01:22:54 +00:00
Matt Arsenault	a59aeb3f29	LoopDistribute: Add testcase where SCEV wants to insert a runtime check. Only the memory based checks were being tested. Prepare for fix in convergent handling. llvm-svn: 362854	2019-06-07 23:17:38 +00:00
Matt Arsenault	ddd2c9ac86	AMDGPU: Force skips around traps llvm-svn: 362852	2019-06-07 23:02:52 +00:00
Jordan Rupprecht	7dd813fea1	[llvm-objdump] Fix Bugzilla ID 41862 to support checking addresses of disassembled object Summary: This fixes the bugzilla id,41862 to support dealing with checking stop address against start address to support this not being a proper object to check the disasembly against like gnu objdump currently does. Reviewers: jakehehrlich, rupprecht, echristo, jhenderson, grimar Reviewed By: jhenderson Subscribers: MaskRay, smeenai, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61969 Patch by Nicholas Krause! llvm-svn: 362847	2019-06-07 21:49:26 +00:00
Shoaib Meenai	61f7df54e3	[llvm-lipo] Implement -archs Displays the architecture names of an input file. Unknown architectures are represented by unknown(cputype,cpusubtype). Patch by Anusha Basana <anusha.basana@gmail.com> Differential Revision: https://reviews.llvm.org/D62753 llvm-svn: 362840	2019-06-07 20:47:58 +00:00
Michael Pozulp	c3c18f4a0d	[llvm-objdump] Add warning if --disassemble-functions specifies an unknown symbol Summary: Fixes Bug 41904 https://bugs.llvm.org/show_bug.cgi?id=41904 Re-land r362768 after it was reverted in r362826. Reviewers: jhenderson, rupprecht, grimar, MaskRay Reviewed By: jhenderson, rupprecht, MaskRay Subscribers: dexonsmith, rupprecht, kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62275 llvm-svn: 362838	2019-06-07 20:34:31 +00:00
Volkan Keles	97204a6788	[GlobalISel] IRTranslator: Translate the intrinsics ignored by CodeGen Summary: Translate `llvm.assume`, `llvm.var.annotation` and `llvm.sideeffect` to nothing as they have no effect on CodeGen. Reviewers: qcolombet, aditya_nandakumar, dsanders, paquette, aemerson, arsenm Reviewed By: arsenm Subscribers: hiraditya, wdng, rovka, kristof.beyls, javed.absar, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63022 llvm-svn: 362834	2019-06-07 20:19:27 +00:00
Vlad Tsyrklevich	e67f6206ac	Revert "[llvm-objdump] Add warning if --disassemble-functions specifies an unknown symbol" This reverts commit `50f61af3f3`, it used the function introduced in the previous revert of `0bddef7901`. llvm-svn: 362826	2019-06-07 18:55:12 +00:00
Peter Collingbourne	8d58a98c59	llvm-objcopy: Implement --extract-partition and --extract-main-partition. This implements the functionality described in https://lld.llvm.org/Partitions.html. It works as follows: - Reads the section headers using the ELF header at file offset 0; - If extracting a loadable partition: - Finds the section containing the required partition ELF header by looking it up in the section table; - Reads the ELF and program headers from the section. - If extracting the main partition: - Reads the ELF and program headers from file offset 0. - Filters the section table according to which sections are in the program headers that it read: - If ParentSegment != nullptr or section is not SHF_ALLOC, then it goes in. - Sections containing partition ELF headers or program headers are excluded as there are no headers for these in ordinary ELF files. Differential Revision: https://reviews.llvm.org/D62364 llvm-svn: 362818	2019-06-07 17:57:48 +00:00
Matt Arsenault	076ad57f8d	AMDGPU: Fix MIR test verifier error llvm-svn: 362817	2019-06-07 17:55:07 +00:00
Nico Weber	ad6a9f81ae	Attempt to fix nm-archive.test after r362798 llvm-lib now needs a `target triple` for bitcode, so add a new file that's like trivial.ll but has one, and use that in the test. (trivial.ll had a comment that looked like it wasn't supposed to be used in tests directly, so I don't want to change that file.) llvm-svn: 362809	2019-06-07 16:06:27 +00:00
Jinsong Ji	7aafdef627	[MachineScheduler] checkResourceLimit boundary condition update When we call checkResourceLimit in bumpCycle or bumpNode, and we know the resource count has just reached the limit (the equations are equal). We should return true to mark that we are resource limited for next schedule, or else we might continue to schedule in favor of latency for 1 more schedule and create a schedule that actually overbook the resource. When we call checkResourceLimit to estimate the resource limite before scheduling, we don't need to return true even if the equations are equal, as it shouldn't limit the schedule for it . Differential Revision: https://reviews.llvm.org/D62345 llvm-svn: 362805	2019-06-07 14:54:47 +00:00
David Bolvansky	43f8ce44b7	[NFC] Added tests for D63004 llvm-svn: 362801	2019-06-07 14:05:42 +00:00
Nico Weber	d546b5052b	llvm-lib: Disallow mixing object files with different machine types lib.exe doesn't allow creating .lib files with object files that have differing machine types. Update llvm-lib to match. The motivation is to make it possible to infer the machine type of a .lib file in lld, so that it can warn when e.g. a 32-bit .lib file is passed to a 64-bit link (PR38965). Fixes PR38782. Differential Revision: https://reviews.llvm.org/D62913 llvm-svn: 362798	2019-06-07 13:24:34 +00:00
Sanjay Patel	6880bceda2	[x86] narrow extract subvector of vector select This is a potentially large perf win for AVX1 targets because of the way we auto-vectorize to 256-bit but then expect the backend to legalize/optimize for the half-implemented AVX1 ISA. On the motivating example from PR37428 (even though this patch doesn't solve the vector shift issue): https://bugs.llvm.org/show_bug.cgi?id=37428 ...there's a 16% speedup when compiling with "-mavx" (perf tested on Haswell) because we eliminate the remaining 256-bit vblendv ops. I added comments on a couple of tests that require further work. If we have 256-bit logic ops separating the vselect and extract, we should probably narrow everything to 128-bit, but that requires a larger pattern match. Differential Revision: https://reviews.llvm.org/D62969 llvm-svn: 362797	2019-06-07 13:17:46 +00:00
Sam Elliott	f720647ddd	[RISCV] Support Bit-Preserving FP in F/D Extensions Summary: This allows some integer bitwise operations to instead be performed by hardware fp instructions. This is correct because the RISC-V spec requires the F and D extensions to use the IEEE-754 standard representation, and fp register loads and stores to be bit-preserving. This is tested against the soft-float ABI, but with hardware float extensions enabled, so that the tests also ensure the optimisation also fires in this case. Reviewers: asb, luismarques Reviewed By: asb Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62900 llvm-svn: 362790	2019-06-07 12:20:14 +00:00
Valery Pykhtin	cb8de55f47	[AMDGPU] Constrain the AMDGPU inliner on maximum number of basic blocks in a caller function (compile time performance) Differential revision: https://reviews.llvm.org/D62917 llvm-svn: 362789	2019-06-07 12:16:46 +00:00
Cullen Rhodes	1f0d251244	[AArch64][AsmParser] error on unexpected SVE predicate type suffix Summary: This patch fixes a bug in the assembler that permitted a type suffix on predicate registers when not expected. For instance, the following was previously valid: faddv h0, p0.q, z1.h This bug was present in all SVE instructions containing predicates with no type suffix and no predication form qualifier, i.e. /z or /m. The latter instructions are already caught with an appropiate error message by the assembler, e.g.: .text <stdin>:1:13: error: not expecting size suffix cmpne p1.s, p0.b/z, z2.s, 0 ^ A similar issue for SVE vector registers was fixed in: https://reviews.llvm.org/D59636 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62942 llvm-svn: 362780	2019-06-07 08:46:56 +00:00
Cullen Rhodes	f730548484	[AArch64][AsmParser] Provide better diagnostics for SVE predicates Patch by Sander de Smalen (sdesmalen) Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62941 llvm-svn: 362779	2019-06-07 08:37:00 +00:00
George Rimar	33044a7ae2	[llvm-objcopy] - Emit error and don't crash if program header reaches past end of file. This is https://bugs.llvm.org/show_bug.cgi?id=42122. If an object file has a size less than program header's file [offset + size] (i.e. if we have overflow), llvm-objcopy crashes instead of reporting a error. The patch fixes this issue. Differential revision: https://reviews.llvm.org/D62898 llvm-svn: 362778	2019-06-07 08:34:18 +00:00
Pengfei Wang	f8b28931a7	[X86] -march=cooperlake (llvm) Support intel -march=cooperlake in llvm Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62836 llvm-svn: 362776	2019-06-07 08:31:35 +00:00
Sam Parker	c5ef502ee8	[CodeGen] Generic Hardware Loop Support Patch which introduces a target-independent framework for generating hardware loops at the IR level. Most of the code has been taken from PowerPC CTRLoops and PowerPC has been ported over to use this generic pass. The target dependent parts have been moved into TargetTransformInfo, via isHardwareLoopProfitable, with HardwareLoopInfo introduced to transfer information from the backend. Three generic intrinsics have been introduced: - void @llvm.set_loop_iterations Takes as a single operand, the number of iterations to be executed. - i1 @llvm.loop_decrement(anyint) Takes the maximum number of elements processed in an iteration of the loop body and subtracts this from the total count. Returns false when the loop should exit. - anyint @llvm.loop_decrement_reg(anyint, anyint) Takes the number of elements remaining to be processed as well as the maximum numbe of elements processed in an iteration of the loop body. Returns the updated number of elements remaining. llvm-svn: 362774	2019-06-07 07:35:30 +00:00
Michael Pozulp	65d1ff8e7e	[NFC] Delete trailing whitespace character. llvm-svn: 362772	2019-06-07 06:28:43 +00:00
Michael Pozulp	767bdd55e1	[llvm-objdump] Print source when subsequent lines in the translation unit come from the same line in two different headers. Reviewers: grimar, rupprecht, jhenderson Reviewed By: grimar, jhenderson Subscribers: llvm-commits, jhenderson Tags: #llvm Differential Revision: https://reviews.llvm.org/D62461 llvm-svn: 362771	2019-06-07 06:23:54 +00:00
Michael Pozulp	50f61af3f3	[llvm-objdump] Add warning if --disassemble-functions specifies an unknown symbol Summary: Fixes Bug 41904 https://bugs.llvm.org/show_bug.cgi?id=41904 Reviewers: jhenderson, rupprecht, grimar, MaskRay Reviewed By: jhenderson, rupprecht, MaskRay Subscribers: dexonsmith, rupprecht, kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62275 llvm-svn: 362768	2019-06-07 05:11:13 +00:00
Fangrui Song	c841b9abf0	[MC][ELF] Don't create relocations with section symbols for STB_LOCAL ifunc We should keep the symbol type (STT_GNU_IFUNC) for a local ifunc because it may result in an IRELATIVE reloc that the dynamic loader will use to resolve the address at startup time. There is another problem that is not fixed by this patch: a PC relative relocation should also create a relocation with the ifunc symbol. llvm-svn: 362767	2019-06-07 03:47:22 +00:00
Michael Pozulp	c7029e4ef4	[NFC] Test commit. llvm-svn: 362763	2019-06-07 01:55:59 +00:00
Matt Arsenault	c0edb8f5cf	AMDGPU: Don't count mask branch pseudo towards skip threshold llvm-svn: 362761	2019-06-07 00:14:55 +00:00
Matt Arsenault	99ee81b183	AMDGPU: Insert skips for blocks with FLAT This already forced a skip for VMEM, so it should also be done for flat. I'm somewhat skeptical about the benefit of this though. llvm-svn: 362760	2019-06-07 00:14:45 +00:00
Nemanja Ivanovic	ef4a3aa549	[PowerPC] Exploit the vector min/max instructions Use the PPC vector min/max instructions for computing the corresponding operation as these should be faster than the compare/select sequences we currently emit. Differential revision: https://reviews.llvm.org/D47332 llvm-svn: 362759	2019-06-06 23:49:01 +00:00
Matt Arsenault	b6cfa129cc	AMDGPU: Insert skip branches over return blocks SIInsertSkips really doesn't understand the control flow, and makes very stupid assumptions about the block layout. This was able to get away with not skipping return blocks, since usually after structurization there is only one placed at the end of the function. Tail duplication can break this assumption. llvm-svn: 362754	2019-06-06 22:51:51 +00:00
Cameron McInally	66f286845c	[NFC][CodeGen] Add unary fneg tests to X86/fma4-intrinsics-x86.ll llvm-svn: 362752	2019-06-06 21:49:59 +00:00
Alexey Lapshin	b9f1e7b16e	[DebugInfo] Incorrect debug info record generated for loop counter. Incorrect Debug Variable Range was calculated while "COMPUTING LIVE DEBUG VARIABLES" stage. Range for Debug Variable("i") computed according to current state of instructions inside of basic block. But Register Allocator creates new instructions which were not taken into account when Live Debug Variables computed. In the result DBG_VALUE instruction for the "i" variable was put after these newly inserted instructions. This is incorrect. Debug Value for the loop counter should be inserted before any loop instruction. Differential Revision: https://reviews.llvm.org/D62650 llvm-svn: 362750	2019-06-06 21:19:39 +00:00
Alexander Timofeev	37bd9bd137	[AMDGPU] Partial revert for the `ba447bae74` "Divergence driven ISel. Assign register class for cross block values according to the divergence." that discovered the design flaw leading to several issues that required to be solved before. This change reverts AMDGPU specific changes and keeps common part unaffected. llvm-svn: 362749	2019-06-06 21:13:02 +00:00
Cameron McInally	169fc2b020	[NFC][CodeGen] Add unary fneg tests to X86/fma-intrinsics-x86.ll llvm-svn: 362748	2019-06-06 21:12:22 +00:00
Craig Topper	f320f26716	[X86] Make a bunch of merge masked binops commutable for loading folding. This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg. We already commuted the unmasked and zero masked versions. I've added 512-bit stack folding tests for most of the instructions affected. I've tested needing commuting and not commuting across unmasked, merged masked, and zero masked. The 128/256 bit instructions should behave similarly. llvm-svn: 362746	2019-06-06 21:00:04 +00:00
Sanjay Patel	38c5ee1802	[InstSimplify] add tests for fcmp with known-never-nan operands; NFC llvm-svn: 362742	2019-06-06 20:14:06 +00:00
Cameron McInally	3d2ee0053a	[NFC][CodeGen] Add unary fneg tests to X86/fma-scalar-combine.ll llvm-svn: 362741	2019-06-06 20:11:30 +00:00
Craig Topper	ca541b20d0	[CFLGraph] Add support for unary fneg instruction. Differential Revision: https://reviews.llvm.org/D62791 llvm-svn: 362737	2019-06-06 19:21:23 +00:00
Jason Liu	60ec248148	[AIX] Implement function descriptor on SDAG Summary: (1) Function descriptor on AIX On AIX, a called routine may have 2 distinct symbols associated with it: * A function descriptor (Name) * A function entry point (.Name) The descriptor structure on AIX is the same as those in the ELF V1 ABI: * The address of the entry point of the function. * The TOC base address for the function. * The environment pointer. The descriptor symbol uses the same name as the source level function in C. The function entry point is analogous to the symbol we would generate for a function in a non-descriptor-based ABI, except that it is renamed by prepending a ".". Which symbol gets referenced depends on the context: * Taking the address of the function references the descriptor symbol. * Calling the function references the entry point symbol. (2) Speaking of implementation on AIX, for direct function call target, we create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to replace original TargetGlobalAddress SDNode. Then down the path, we can take advantage of this MCSymbol. Patch by: Xiangling_L Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara Differential Revision: https://reviews.llvm.org/D62532 llvm-svn: 362735	2019-06-06 19:13:36 +00:00
Cameron McInally	f288a0685f	[NFC][CodeGen] Add unary fneg tests to X86/fma4-fneg-combine.ll llvm-svn: 362733	2019-06-06 19:02:46 +00:00
Craig Topper	6cda33ba36	[InlineCost] Add support for unary fneg. This adds support for unary fneg based on the implementation of BinaryOperator without the soft float FP cost. Previously we would just delegate to visitUnaryInstruction. I think the only real change is that we will pass the FastMath flags to SimplifyFNeg now. Differential Revision: https://reviews.llvm.org/D62699 llvm-svn: 362732	2019-06-06 19:02:18 +00:00
Cameron McInally	06de52674d	[NFC][CodeGen] Add unary fneg tests to X86/fma_patterns.ll llvm-svn: 362730	2019-06-06 18:41:18 +00:00
Philip Reames	101915cfda	[LoopPred] Fix a bug in unconditional latch bailout introduced in r362284 This is a really silly bug that even a simple test w/an unconditional latch would have caught. I tried to guard against the case, but put it in the wrong if check. Oops. llvm-svn: 362727	2019-06-06 18:02:36 +00:00
Simon Pilgrim	842c7792aa	[DAGCombine] MergeConsecutiveStores - improve non-temporal load\store handling (PR42123) This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores: 1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another. 2 - The merged load\store node must retain the non-temporal flag. Differential Revision: https://reviews.llvm.org/D62910 llvm-svn: 362723	2019-06-06 17:04:13 +00:00
Cameron McInally	f1b8c6ac4f	[NFC][CodeGen] Add unary fneg tests to X86/fma_patterns_wide.ll llvm-svn: 362720	2019-06-06 16:55:51 +00:00
Craig Topper	6b67dfa54c	[X86] Make masked floating point equality/ordered compares commutable for load folding purposes. Same as what is supported for the unmasked form. llvm-svn: 362717	2019-06-06 16:39:04 +00:00
Cameron McInally	5c01140581	[NFC][CodeGen] Add unary fneg tests to fmul-combines.ll fnabs.ll llvm-svn: 362715	2019-06-06 16:13:23 +00:00
Cameron McInally	1d85a7518c	[NFC][CodeGen] Add unary fneg tests to fp-fast.ll fp-fold.ll fp-in-intregs.ll fp-stack-compare-cmov.ll fp-stack-compare.ll fsxor-alignment.ll llvm-svn: 362712	2019-06-06 15:29:11 +00:00
Cameron McInally	0924f44859	[NFC][CodeGen] Remove duplicate test in fp-fast.ll @test10 is the same as @test11. llvm-svn: 362710	2019-06-06 14:52:16 +00:00
Jason Liu	0338b88861	[AIX] Implement call lowering with parameters could pass onto GPRs Summary: This patch implements SDAG call lowering on AIX for functions which only have parameters that could fit into GPRs. Reviewers: hubert.reinterpretcast, syzaara Differential Revision: https://reviews.llvm.org/D62823 llvm-svn: 362708	2019-06-06 14:36:43 +00:00
Thomas Preud'homme	71d3f227a7	FileCheck [6/12]: Introduce numeric variable definition Summary: This patch is part of a patch series to add support for FileCheck numeric expressions. This specific patch introduces support for defining numeric variable in a CHECK directive. This commit introduces support for defining numeric variable from a litteral value in the input text. Numeric expressions can then use the variable provided it is on a later line. Copyright: - Linaro (changes up to diff 183612 of revision D55940) - GraphCore (changes in later versions of revision D55940 and in new revision created off D55940) Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield Tags: #llvm Differential Revision: https://reviews.llvm.org/D60386 llvm-svn: 362705	2019-06-06 13:21:06 +00:00
Owen Reynolds	bf5bca5bea	[llvm-ar] Create thin archives with MRI scripts This patch implements the "CREATE_THIN" MRI script command, allowing thin archives to be created via MRI scripts. Differential Revision: https://reviews.llvm.org/D62919 llvm-svn: 362704	2019-06-06 13:19:50 +00:00
Sanjay Patel	dd2d1a168f	[InstCombine] add tests for loads of bitcasted vector pointer; NFC llvm-svn: 362703	2019-06-06 13:18:20 +00:00
Adhemerval Zanella	559e69a821	AArch64] Handle ISD::LRINT and ISD::LLRINT for float16 This patch is a follow up for D62018 to add lrint/llrint support for float16. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62863 llvm-svn: 362700	2019-06-06 12:38:11 +00:00
Benjamin Kramer	f1249442cf	Revert "[SCEV] Use wrap flags in InsertBinop" This reverts commit r362687. Miscompiles llvm-profdata during selfhost. llvm-svn: 362699	2019-06-06 12:35:46 +00:00
Adhemerval Zanella	bce9e11a7b	[AArch64] Handle ISD::LROUND and ISD::LLROUND for float16 This patch is a follow up for D61391 to add lround/llround support for float16. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62861 llvm-svn: 362698	2019-06-06 11:53:26 +00:00
Simon Pilgrim	dc8affe607	[X86][SSE] Add nonuniform constant vector test for PR42105 llvm-svn: 362697	2019-06-06 11:15:36 +00:00
Luis Marques	711f361596	[RISCV] Disable test/Analysis/CostModel/RISCV tests if RISCV backend not built Adds missing lit.local.cfg. Fixes rL362691. llvm-svn: 362693	2019-06-06 10:12:28 +00:00
Petar Avramovic	81132ce0e9	[MIPS GlobalISel] Select sqrt Select G_FSQRT for MIPS32. Differential Revision: https://reviews.llvm.org/D62905 llvm-svn: 362692	2019-06-06 10:00:41 +00:00
Luis Marques	cff7d2fdc9	[RISCV] Add CostModel GEP tests Differential Revision: https://reviews.llvm.org/D61185 llvm-svn: 362691	2019-06-06 09:47:53 +00:00
Petar Avramovic	0a1fd355b2	[MIPS GlobalISel] Select fabs Select G_FABS for MIPS32. Differential Revision: https://reviews.llvm.org/D62903 llvm-svn: 362690	2019-06-06 09:22:37 +00:00
Petar Avramovic	a7d0006447	[MIPS GlobalISel] Select fpext and fptrunc Select G_FPEXT and G_FPTRUNC for MIPS32. Differential Revision: https://reviews.llvm.org/D62902 llvm-svn: 362689	2019-06-06 09:16:58 +00:00
Petar Avramovic	faaa2b5d21	[MIPS GlobalISel] Select floor and ceil Select G_FFLOOR and G_FCEIL for MIPS32. Differential Revision: https://reviews.llvm.org/D62901 llvm-svn: 362688	2019-06-06 09:02:24 +00:00
Sam Parker	7cc580f5e9	[SCEV] Use wrap flags in InsertBinop If the given SCEVExpr has no (un)signed flags attached to it, transfer these to the resulting instruction or use them to find an existing instruction. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 362687	2019-06-06 08:56:26 +00:00
Dylan McKay	3c82c57d2b	[AVR] Fix the 'load.ll' test after r362351 In that commit, the 'load.ll' test was modified, but still failed. This commit updates the test so that it now passes. llvm-svn: 362684	2019-06-06 08:06:50 +00:00
Amara Emerson	d3144a4abc	[AArch64][GlobalISel] Add manual selection support for G_ZEXTLOADs to s64. We already get support for G_ZEXTLOAD to s32 from the importer, but it can't deal with the SUBREG_TO_REG in the pattern. Tweaking the existing manual selection code for G_LOAD to handle an additional SUBREG_TO_REG when dealing with G_ZEXTLOAD isn't much work. Also add tests to check the imported pattern selections to s32 work. llvm-svn: 362681	2019-06-06 07:58:37 +00:00
Amara Emerson	d940e20051	[AArch64][GlobalISel] Add the new changes to fix PR42129 that were supposed to go into r362666. The changes weren't staged so ended up just re-commiting the unmodified reverted change. llvm-svn: 362677	2019-06-06 07:33:47 +00:00
Craig Topper	9226ba6b37	[X86] Don't turn avx masked.load with constant mask into masked.load+vselect when passthru value is all zeroes. This is intended to enable the use of an immediate blend or more optimal instruction. But if the passthru is zero we don't need any additional instructions. llvm-svn: 362675	2019-06-06 05:41:27 +00:00
Craig Topper	cf44372137	[X86] Add test case for masked load with constant mask and all zeros passthru. avx/avx2 masked loads only support all zeros for passthru in hardware. So we have to emit a blend for all other values. We have an optimization that tries to optimize this blend if the mask is constant. But we don't need to perform this optimization if the passthru value is zero which doesn't need the blend at all. llvm-svn: 362674	2019-06-06 05:41:22 +00:00
Amara Emerson	c37ff0d138	Revert "Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp"" When looking through copies, make sure to not try to find the vreg def of a physreg. Normally getVRegDef will return nullptr in this case, but if there happens to be multiple defs then it will assert. This fixes PR42129. llvm-svn: 362666	2019-06-05 23:46:16 +00:00
Matt Arsenault	34c8b835b1	AMDGPU: Don't fix emergency stack slot at offset 0 This forced the caller to be aware of this, which is an ugly ABI feature. Partially reverts r295877. The original reasons for doing this are mostly fixed. Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels. Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset. Restrict to callable functions for now to split this into 2 steps to limit thte number of test updates and in case anything breaks. llvm-svn: 362665	2019-06-05 22:37:50 +00:00
Cameron McInally	c72fbe5dc1	[MSAN] Add unary FNeg visitor to the MemorySanitizer Differential Revision: https://reviews.llvm.org/D62909 llvm-svn: 362664	2019-06-05 22:37:05 +00:00
Ulrich Weigand	6c5d5ce551	Allow target to handle STRICT floating-point nodes The ISD::STRICT_ nodes used to implement the constrained floating-point intrinsics are currently never passed to the target back-end, which makes it impossible to handle them correctly (e.g. mark instructions are depending on a floating-point status and control register, or mark instructions as possibly trapping). This patch allows the target to use setOperationAction to switch the action on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code will stop converting the STRICT nodes to regular floating-point nodes, but instead pass the STRICT nodes to the target using normal SelectionDAG matching rules. To avoid having the back-end duplicate all the floating-point instruction patterns to handle both strict and non-strict variants, we make the MI codegen explicitly aware of the floating-point exceptions by introducing two new concepts: - A new MCID flag "mayRaiseFPException" that the target should set on any instruction that possibly can raise FP exception according to the architecture definition. - A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI instruction resulting from expansion of any constrained FP intrinsic. Any MI instruction that is both marked as mayRaiseFPException and FPExcept then needs to be considered as raising exceptions by MI-level codegen (e.g. scheduling). Setting those two new flags is straightforward. The mayRaiseFPException flag is simply set via TableGen by marking all relevant instruction patterns in the .td files. The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes in the SelectionDAG, and gets inherited in the MachineSDNode nodes created from it during instruction selection. The flag is then transfered to an MIFlag when creating the MI from the MachineSDNode. This is handled just like fast-math flags like no-nans are handled today. This patch includes both common code changes required to implement the new features, and the SystemZ implementation. Reviewed By: andrew.w.kaylor Differential Revision: https://reviews.llvm.org/D55506 llvm-svn: 362663	2019-06-05 22:33:10 +00:00
Petr Hosek	2f94203e23	Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp" This reverts commit r362435 as this triggers ICE, see PR42129 for details. llvm-svn: 362662	2019-06-05 22:27:31 +00:00
Matt Arsenault	b812b7a45e	AMDGPU: Invert frame index offset interpretation Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP. Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions. The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination. Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide. llvm-svn: 362661	2019-06-05 22:20:47 +00:00
Joseph Tremoulet	acb5609063	[EarlyCSE] Add tests for negated min/max/abs [NFC] Summary: I'm planning to update the hashing logic to recognize their equivalence in a subsequent change (D62644). Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62918 llvm-svn: 362657	2019-06-05 21:30:10 +00:00
Matt Arsenault	663d762c9a	NewGVN: Handle addrspacecast The AllConstant check needs to be moved out of the if/else if chain to avoid a test regression. The "there is no SimplifyZExt" comment puzzles me, since there is SimplifyCastInst. Additionally, the Simplify* calls seem to not see the operand as constant, so this needs to be tried if the simplify failed. llvm-svn: 362653	2019-06-05 21:15:52 +00:00
Tim Northover	8d7f118ab2	InstCombine: correctly change byval type attribute alongside call args. When the byval attribute has a type, it must match the pointee type of any parameter; but InstCombine was not updating the attribute when folding casts of various kinds away. llvm-svn: 362643	2019-06-05 20:38:17 +00:00
Matt Arsenault	4fb580c314	AMDGPU: Remove amdgpu-max-work-group-size attribute This has been deprecated for a long time, and mesa recently switched to amdgpu-flat-work-group-size. llvm-svn: 362641	2019-06-05 20:32:32 +00:00
Dan Gohman	53572d0470	[WebAssembly] Limit PIC support to the Emscripten target The current PIC support currently only works with Emscripten, so disable it for other targets. This is the PIC portion of https://reviews.llvm.org/D62542. Reviewed By: dschuff, sbc100 llvm-svn: 362638	2019-06-05 20:01:01 +00:00
Simon Pilgrim	036fa5346f	[X86][SSE] Add vector tests to cover more isNegatibleForFree/GetNegatedExpression cases (PR42105) Some already combine correctly, but vector constant analysis is weak. llvm-svn: 362633	2019-06-05 18:55:54 +00:00
Cameron McInally	8b83a9c6b1	[NFC][Reassociate] Fix mistake in 468b2ad Missed 2 'fast fsub(0.0,X) -> fneg(X)' changes. llvm-svn: 362631	2019-06-05 18:50:07 +00:00
Cameron McInally	5162266515	[NFC][Reassociate] Add unary fneg tests to fast-basictest.ll llvm-svn: 362630	2019-06-05 18:35:54 +00:00
Craig Topper	d0fff89b81	[X86] Add the vector integer min/max instructions to isAssociativeAndCommutative. As far as I know these should be freely reassociatable just like the floating point MAXC/MINC instructions. The reduce test changes are largely regressions and caused by the "generic" CPU we default to not having a scheduler model. The machine-combiner-int-vec.ll test shows the positive benefits of this change. Differential Revision: https://reviews.llvm.org/D62787 llvm-svn: 362629	2019-06-05 18:25:09 +00:00
Philip Reames	13dd125043	[Tests] Add poison inference tests for indvars showing both existing transforms, and some room for improvement llvm-svn: 362628	2019-06-05 18:00:59 +00:00
Cameron McInally	0a31726d20	[NFC][Reassociate] Regenerate CHECKs for fast-basictest.ll llvm-svn: 362627	2019-06-05 18:00:27 +00:00
Sanjay Patel	2bf82879bd	[x86] split more 256-bit stores of concatenated vectors As suggested in D62498 - collectConcatOps() matches both concat_vectors and insert_subvector patterns, and we see more test improvements by using the more general match. llvm-svn: 362620	2019-06-05 16:40:57 +00:00
Simon Pilgrim	a0e350e640	[X86][SSE] Add additional nt-load test cases as discussed on D62910 llvm-svn: 362616	2019-06-05 16:11:57 +00:00

... 7 8 9 10 11 ...

62877 Commits