llvm-project

Commit Graph

Author	SHA1	Message	Date
Reid Kleckner	592a193285	Revert [SLP] Look-ahead operand reordering heuristic. This reverts r364084 (git commit `5698921be2`) It caused crashes while compiling a file in Chrome. Reduction forthcoming. llvm-svn: 364111	2019-06-21 23:10:25 +00:00
Shoaib Meenai	6442317219	[llvm-lipo] Implement -thin Creates thin output file of specified arch_type from the fat input file. Patch by Anusha Basana <anushabasana@fb.com> Differential Revision: https://reviews.llvm.org/D63341 llvm-svn: 364107	2019-06-21 21:59:01 +00:00
Julian Lettner	19c4d660f4	[ASan] Use dynamic shadow on 32-bit iOS and simulators The VM layout on iOS is not stable between releases. On 64-bit iOS and its derivatives we use a dynamic shadow offset that enables ASan to search for a valid location for the shadow heap on process launch rather than hardcode it. This commit extends that approach for 32-bit iOS plus derivatives and their simulators. rdar://50645192 rdar://51200372 rdar://51767702 Reviewed By: delcypher Differential Revision: https://reviews.llvm.org/D63586 llvm-svn: 364105	2019-06-21 21:01:39 +00:00
Craig Topper	f5a5785632	[X86] Add test cases for incorrect shrinking of volatile vector loads from 128-bits to 32 or 64 bits. NFC This is caused by isel patterns that look for vzmovl+load and treat it the same as vzload. llvm-svn: 364101	2019-06-21 20:16:26 +00:00
Matt Arsenault	22e3dc60a0	AMDGPU: Fix not using s33 for scratch wave offset in kernels Fixes missing piece from r363990. llvm-svn: 364099	2019-06-21 20:04:02 +00:00
Craig Topper	4649a051bf	[X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0) 128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg. This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insertsubvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns. Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining. I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload. Differential Revision: https://reviews.llvm.org/D63512 llvm-svn: 364095	2019-06-21 19:10:21 +00:00
Craig Topper	4569cdbcf5	[X86] Don't mark v64i8/v32i16 ISD::SELECT as custom unless they are legal types. We don't have any Custom handling during type legalization. Only operation legalization. Fixes PR42355 llvm-svn: 364093	2019-06-21 18:50:00 +00:00
Craig Topper	91ea99295c	[X86] Add avx512bw command lines to avx512-select.ll Prep for fixing PR42355 and ensuring we have coverage of ISD::SELECT for v64i8/v32i16 on KNL and SKX configs. llvm-svn: 364092	2019-06-21 18:49:42 +00:00
Simon Pilgrim	5dba4ed208	[X86][AVX] Combine INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) as shuffle Subvector shuffling often ends up as insert/extract subvector. llvm-svn: 364090	2019-06-21 18:35:04 +00:00
Amara Emerson	6e71b34fe6	[AArch64][GlobalISel] Implement selection support for the new G_JUMP_TABLE and G_BRJT ops. With this we can now fully code generate jump tables, which is important for code size. Differential Revision: https://reviews.llvm.org/D63223 llvm-svn: 364086	2019-06-21 18:10:41 +00:00
Amara Emerson	fe4625fb24	[GlobalISel][IRTranslator] Change switch table translation to generate jump tables and range checks. This change makes use of the newly refactored SwitchLoweringUtils code from SelectionDAG in order to generate jump tables and range checks where appropriate. Much of this code is ported from SDAG with some modifications. We generate G_JUMP_TABLE and G_BRJT instructions when JT opportunities are found. This means that targets which previously relied on the naive one MBB per case stmt translation will now start falling back until they add support for the new opcodes. For range checks, we don't generate any previously unused operations. This just recognizes contiguous ranges of case values and generates a single block per range. Single case value blocks are just a special case of ranges so we get that support almost for free. There are still some optimizations missing that I haven't ported over, and bit-tests are also unimplemented. This patch series is already complex enough. Actual arm64 support for selection of jump tables is coming in a later patch. Differential Revision: https://reviews.llvm.org/D63169 llvm-svn: 364085	2019-06-21 18:10:38 +00:00
Simon Pilgrim	5698921be2	[SLP] Look-ahead operand reordering heuristic. This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example). Committed on behalf of @vporpo (Vasileios Porpodas) Differential Revision: https://reviews.llvm.org/D60897 llvm-svn: 364084	2019-06-21 17:57:01 +00:00
David Bolvansky	2441a4074c	[NFC] Update shl-sub tests llvm-svn: 364083	2019-06-21 17:51:18 +00:00
Sanjay Patel	f483617256	[InstCombine] add tests for ctpop folds; NFC llvm-svn: 364082	2019-06-21 17:44:09 +00:00
Craig Topper	6af1be9664	[X86] Use vmovq for v4i64/v4f64/v8i64/v8f64 vzmovl. We already use vmovq for v2i64/v2f64 vzmovl. But we were using a blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or movsd+xorpd under optsize. I think the blend with 0 or movss/d is only needed for vXi32 where we don't have an instruction that can move 32 bits from one xmm to another while zeroing upper bits. movq is no worse than blendpd on any known CPUs. llvm-svn: 364079	2019-06-21 17:24:21 +00:00
Amara Emerson	8f25a021dd	[AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal. We sometimes get poor code size because constants of types < 32b are legalized as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the localizer can no longer sink them (although it's possible to extend it to do so). On AArch64 however s8 and s16 constants can be selected in the same way as s32 constants, with a mov pseudo into a W register. If we make s8 and s16 constants legal then we can avoid unnecessary truncates, they can be CSE'd, and the localizer can sink them as normal. There is a caveat: if the user of a smaller constant has to widen the sources, we end up with an anyext of the smaller typed G_CONSTANT. This can cause regressions because of the additional extend and missed pattern matching. To remedy this, there's a new artifact combiner to generate the wider G_CONSTANT if it's legal for the target. Differential Revision: https://reviews.llvm.org/D63587 llvm-svn: 364075	2019-06-21 16:43:50 +00:00
Stanislav Mekhanoshin	bdf7f81b89	[AMDGPU] hazard recognizer for fp atomic to s_denorm_mode This requires 3 wait states unless there is a wait or VALU in between. Differential Revision: https://reviews.llvm.org/D63619 llvm-svn: 364074	2019-06-21 16:30:14 +00:00
David Bolvansky	dbcdad51ff	[InstCombine] (1 << (C - x)) -> ((1 << C) >> x) if C is bitwidth - 1 Summary: ``` %a = sub i32 31, %x %r = shl i32 1, %a => %d = shl i32 1, 31 %r = lshr i32 %d, %x Done: 1 Optimization is correct! ``` https://rise4fun.com/Alive/btZm Reviewers: spatel, lebedev.ri, nikic Reviewed By: lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63652 llvm-svn: 364073	2019-06-21 16:25:32 +00:00
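The i32 fold in the commit above can be checked by brute force. The following is a minimal Python sketch, not part of the patch: the helper names are hypothetical, and it assumes shift amounts stay in the range 0..31, since a shift by the bit width or more yields poison in LLVM IR and falls outside what the fold has to preserve.

```python
BITS = 32
MASK = (1 << BITS) - 1


def shl_of_sub(x):
    """Original form: 1 << (31 - x), truncated to 32 bits."""
    return (1 << ((BITS - 1) - x)) & MASK


def lshr_of_const(x):
    """Folded form: (1 << 31) >> x, a logical right shift of the top bit."""
    return ((1 << (BITS - 1)) & MASK) >> x


def fold_holds_for_all_shifts():
    """Compare both forms for every in-range shift amount (0..31)."""
    return all(shl_of_sub(x) == lshr_of_const(x) for x in range(BITS))
```

Calling `fold_holds_for_all_shifts()` compares the two forms for all 32 valid shift amounts; it is a sanity check alongside the Alive proof linked in the commit, not a replacement for it.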
David Bolvansky	045b0f60b6	[NFC] Added more tests for D63652 llvm-svn: 364069	2019-06-21 16:14:13 +00:00
David Bolvansky	4b28478389	[InstCombine] cttz(abs(x)) -> cttz(x) Summary: Signedness does not change number of trailing zeros. Reviewers: spatel, lebedev.ri, nikic Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D63546 llvm-svn: 364064	2019-06-21 15:26:22 +00:00
Sanjay Patel	ddb9093684	[GVNSink] prevent crashing on mismatched instructions (PR42346) Patch based on suggestion by James Molloy (@jmolloy) in: https://bugs.llvm.org/show_bug.cgi?id=42346 llvm-svn: 364062	2019-06-21 15:17:24 +00:00
David Bolvansky	b0ba049f58	[NFC] Added tests for (1 << (C - x)) -> ((1 << C) >> x) llvm-svn: 364060	2019-06-21 15:00:31 +00:00
George Rimar	fa1c7d9bdf	[llvm-objcopy] - Get rid of dynrel.elf precompiled binary from inputs. We do not have to keep using precompiled binaries in the tests when we can use YAML. This patch removes the dynrel.elf binary and adds a few comments to the test cases. Differential revision: https://reviews.llvm.org/D63641 llvm-svn: 364052	2019-06-21 14:15:15 +00:00
Jay Foad	d9d3c91b48	[Scalarizer] Propagate IR flags Summary: The motivation for this was to propagate fast-math flags like nnan and ninf on vector floating point operations to the corresponding scalar operations to take advantage of follow-on optimizations. But I think the same argument applies to all of our IR flags: if they apply to the vector operation then they also apply to all the individual scalar operations, and they might enable follow-on optimizations. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63593 llvm-svn: 364051	2019-06-21 14:10:18 +00:00
George Rimar	0a32c07cd7	[llvm-readobj] - Inline a few yaml inputs into test cases. There are some tests that are split into a main part + input yaml for no visible reason. This patch inlines the yaml part for the 3 test cases I found. Differential revision: https://reviews.llvm.org/D63644 llvm-svn: 364049	2019-06-21 14:07:35 +00:00
Andrea Di Biagio	dd0dc19b1c	Set an explicit x86 triple for test bottleneck-analysis.s added by my r364045. NFC This should unbreak the ppc64 buildbots. llvm-svn: 364048	2019-06-21 14:05:58 +00:00
Sam Elliott	96c8bc7956	[RISCV] Add RISCV-specific TargetTransformInfo Summary: LLVM Allows Targets to provide information that guides optimisations made to LLVM IR. This is done with callbacks on a TargetTransformInfo object. This patch adds a TargetTransformInfo class for RISC-V. This will allow us to implement RISC-V specific callbacks as they become necessary. This commit also adds the getIntImmCost callbacks, and tests them with a simple constant hoisting test. Our immediate costs are on the conservative side, for the moment, but we prevent hoisting in most circumstances anyway. Previous review was on D63007 Reviewers: asb, luismarques Reviewed By: asb Subscribers: ributzka, MaskRay, llvm-commits, Jim, benna, psnobl, jocewei, PkmX, rkruppe, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, apazos, simoncook, johnrusso, rbar, hiraditya, mgorny Tags: #llvm Differential Revision: https://reviews.llvm.org/D63433 llvm-svn: 364046	2019-06-21 13:36:09 +00:00
Andrea Di Biagio	aa9b6468bd	[MCA][Bottleneck Analysis] Teach how to compute a critical sequence of instructions based on the simulation. This patch teaches the bottleneck analysis how to identify and print the most expensive sequence of instructions according to the simulation. Fixes PR37494. The goal is to help users identify the sequence of instructions that is most critical for performance. A dependency graph is internally used by the bottleneck analysis to describe data dependencies and processor resource interferences between instructions. There is one node in the graph for every instruction in the input assembly sequence. The number of nodes in the graph is independent from the number of iterations simulated by the tool. It means that a single node of the graph represents all the possible instances of the same instruction contributed by the simulated iterations. Edges are dynamically "discovered" by the bottleneck analysis by observing instruction state transitions and "backend pressure increase" events generated by the Execute stage. Information from the events is used to identify critical dependencies, and materialize edges in the graph. A dependency edge is uniquely identified by a pair of node identifiers plus an instance of struct DependencyEdge::Dependency (which provides more details about the actual dependency kind). The bottleneck analysis internally ranks dependency edges based on their impact on the runtime (see field DependencyEdge::Dependency::Cost). To this end, each edge of the graph has an associated cost. By default, the cost of an edge is a function of its latency (in cycles). In practice, the cost of an edge is also a function of the number of cycles where the dependency has been seen as 'contributing to backend pressure increases'. The idea is that the higher the cost of an edge, the higher the impact of the dependency on performance. To put it another way, the cost of an edge is a measure of criticality for performance.
Note that the same edge may be found in multiple iterations of the simulated loop. The logic that adds new edges to the graph checks if an equivalent dependency already exists (duplicate edges are not allowed). If an equivalent dependency edge is found, field DependencyEdge::Frequency of that edge is incremented by one, and the new cost is cumulatively added to the existing edge cost. At the end of simulation, costs are propagated to nodes through the edges of the graph. The goal is to identify a critical sequence from a node of the root-set (composed of the nodes of the graph with no predecessors) to a 'sink node' with no successors. Note that the graph is intentionally kept acyclic to minimize the complexity of the critical sequence computation algorithm (complexity is currently linear in the number of nodes in the graph). The critical path is finally computed as a sequence of dependency edges. For edges describing processor resource interferences, the view also prints a so-called "interference probability" value (by dividing field DependencyEdge::Frequency by the total number of iterations). Examples of critical sequence computations can be found in tests added/modified by this patch. On output streams that support colored output, instructions from the critical sequence are rendered with a different color. Strictly speaking the analysis conducted by the bottleneck analysis view is not a critical path analysis. The cost of an edge doesn't only depend on the dependency latency. More importantly, the cost of the same edge may be computed differently by different iterations. The number of dependencies is discovered dynamically based on the events generated by the simulator. However, their number is not fixed. This is especially true for edges that model processor resource interferences; an interference may not occur in every iteration. For that reason, it makes sense to also print out a "probability of interference".
By construction, the accuracy of this analysis (as always) is strongly dependent on the simulation (and therefore the quality of the information available in the scheduling model). That being said, the critical sequence effectively identifies a performance criticality. Instructions from that sequence are expected to have a very big impact on performance. So, users can take advantage of this information to focus their attention on specific interactions between instructions. In my experience, it works quite well in practice, and produces useful output (in a reasonable amount of time). Differential Revision: https://reviews.llvm.org/D63543 llvm-svn: 364045	2019-06-21 13:32:54 +00:00
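The graph bookkeeping described in the commit message above (duplicate edges merged by bumping a frequency and accumulating cost, then costs propagated through an acyclic graph to find the most expensive root-to-sink sequence) can be sketched in a few lines. This is an illustrative Python sketch only, not the llvm-mca implementation: the `DepGraph` class, its method names, and the assumption that every edge has a positive cost and points from an earlier instruction index to a later one are all hypothetical.

```python
from collections import defaultdict


class DepGraph:
    """Hypothetical acyclic dependency graph; nodes are instruction indices."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # (src, dst, kind) -> [frequency, cumulative cost]
        self.edges = {}

    def add_edge(self, src, dst, kind, cost):
        # Assumption: edges only go from earlier to later instructions,
        # which keeps the graph acyclic.
        assert src < dst and cost > 0
        entry = self.edges.setdefault((src, dst, kind), [0, 0])
        entry[0] += 1      # an equivalent edge was seen again: bump frequency
        entry[1] += cost   # and accumulate its cost

    def critical_sequence(self):
        best = [0] * self.num_nodes     # highest accumulated cost reaching a node
        pred = [None] * self.num_nodes  # predecessor on that highest-cost chain
        incoming = defaultdict(list)
        for (src, dst, _kind), (_freq, cost) in self.edges.items():
            incoming[dst].append((src, cost))
        # Index order is a topological order because src < dst for every edge.
        for node in range(self.num_nodes):
            for src, cost in incoming[node]:
                if best[src] + cost > best[node]:
                    best[node] = best[src] + cost
                    pred[node] = src
        # With positive costs the highest-cost node has no successors (a sink).
        end = max(range(self.num_nodes), key=lambda n: best[n])
        path = [end]
        while pred[path[-1]] is not None:
            path.append(pred[path[-1]])
        return path[::-1], best[end]
```

For example, adding the register dependency `0 -> 1` twice with cost 3, then `1 -> 3` with cost 2 and a cheaper resource chain `0 -> 2 -> 3` with cost 1 per edge, gives the sequence `[0, 1, 3]` with total cost 8. Each node and edge is visited once, so the work in this sketch is linear in the size of the graph.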
Simon Tatham	0c7af66450	[ARM] Add MVE 64-bit GPR <-> vector move instructions. These instructions let you load half a vector register at once from two general-purpose registers, or vice versa. The assembly syntax for these instructions mentions the vector register name twice. For the move _into_ a vector register, the MC operand list also has to mention the register name twice (once as the output, and once as an input to represent where the unchanged half of the output register comes from). So we can conveniently assign one of the two asm operands to be the output $Qd, and the other $QdSrc, which avoids confusing the auto-generated AsmMatcher too much. For the move _from_ a vector register, there's no way to get round the fact that both instances of that register name have to be inputs, so we need a custom AsmMatchConverter to avoid generating two separate output MC operands. (And even that wouldn't have worked if it hadn't been for D60695.) Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62679 llvm-svn: 364041	2019-06-21 13:17:23 +00:00
Simon Tatham	bafb105e96	[ARM] Add MVE vector instructions that take a scalar input. This adds the `MVE_qDest_rSrc` superclass and all its instances, plus a few other instructions that also take a scalar input register or two. I've also belatedly added custom diagnostic messages to the operand classes for odd- and even-numbered GPRs, which required matching changes in two of the existing MVE assembly test files. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62678 llvm-svn: 364040	2019-06-21 13:17:08 +00:00
Paul Robinson	26cc5bcb1a	Fix a crash with assembler source and -g. llvm-mc or clang with -g normally produces debug info describing the assembler source itself; however, if that source already contains some .file/.loc directives, we should instead emit the debug info described by those directives. For certain assembler sources seen in the wild (particularly in the Chrome build) this was causing a crash due to incorrect assumptions about legal sequences of assembler source text. Fixes PR38994. Differential Revision: https://reviews.llvm.org/D63573 llvm-svn: 364039	2019-06-21 13:10:19 +00:00
Simon Pilgrim	36a999ffb8	[X86] X86ISD::ANDNP is a (non-commutative) binop The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0)), but it's getting better. llvm-svn: 364038	2019-06-21 12:42:39 +00:00
Simon Tatham	a6b6a15701	[ARM] Add a batch of similarly encoded MVE instructions. Summary: This adds the `MVE_qDest_qSrc` superclass and all instructions that inherit from it. It's not the complete class of _everything_ with a q-register as both destination and source; it's a subset of them that all have similar encodings (but it would have been hopelessly unwieldy to call it anything like MVE_111x11100). This category includes add/sub with carry; long multiplies; halving multiplies; multiply and accumulate, and some more complex instructions. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62677 llvm-svn: 364037	2019-06-21 12:13:59 +00:00
James Henderson	9485b265e8	[binutils] Add response file option to help and docs Many LLVM-based tools already support response files (i.e. files containing a list of options, specified with '@'). This change simply updates the documentation and help text for some of these tools to include it. I haven't attempted to fix all tools, just a selection that I am interested in. I've taken the opportunity to add some tests for --help behaviour, where they were missing. We could expand these tests, but I don't think that's within scope of this patch. This fixes https://bugs.llvm.org/show_bug.cgi?id=42233 and https://bugs.llvm.org/show_bug.cgi?id=42236. Reviewed by: grimar, MaskRay, jkorous Differential Revision: https://reviews.llvm.org/D63597 llvm-svn: 364036	2019-06-21 11:49:20 +00:00
James Henderson	beb2493fb7	[llvm-dwarfdump] Remove unnecessary explicit -h behaviour --help and -h are automatically supported by the command-line parser, unless overridden by the tool. The behaviour of the PrintHelpMessage being used for -h prior to this patch is subtly different to that provided by --help automatically (it omits certain elements of help text and options, such as --help-list), so overriding the default is not desirable, without good reason. This patch removes the explicit specification of -h and its behaviour, so that the default behaviour is used. Reviewed by: hintonda Differential Revision: https://reviews.llvm.org/D63565 llvm-svn: 364029	2019-06-21 11:22:20 +00:00
Simon Tatham	7d76f8acf0	[ARM] Add MVE vector compare instructions. Summary: These take a pair of vector register to compare, and a comparison type (written in the form of an Arm condition suffix); they output a vector of booleans in the VPR register, where predication can conveniently use them. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62676 llvm-svn: 364027	2019-06-21 11:14:51 +00:00
Simon Pilgrim	771c33e375	[X86][AVX] isNOT - handle concat_vectors(xor X, -1, xor Y, -1) pattern llvm-svn: 364022	2019-06-21 10:44:15 +00:00
Simon Tatham	c9b2cd4674	[ARM] Add a batch of MVE floating-point instructions. Summary: This includes floating-point basic arithmetic (add/sub/multiply), complex add/multiply, unary negation and absolute value, rounding to integer value, and conversion to/from integer formats. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62675 llvm-svn: 364013	2019-06-21 09:35:07 +00:00
Yevgeny Rouban	d5e1ce3f44	[LICM & MSSA] Fixed test to run only with assertions enabled as it uses -debug-only llvm-svn: 364005	2019-06-21 04:49:40 +00:00
Amara Emerson	bc0d08e0ee	[GlobalISel][Localizer] Allow localization of G_INTTOPTR and chains of instructions. G_INTTOPTR can prevent the localizer from moving G_CONSTANTs, but since it's essentially a side effect free cast instruction we can remat both instructions. This patch changes the localizer to enable localization of the chains by iterating over the entry block instructions in reverse order. That way, uses will be localized first, and then the defs are free to be localized as well. This also changes the previous SmallPtrSet of localized instructions to use a SetVector instead. We're dealing with pointers and need deterministic iteration order. Overall, this change improves ARM64 -O0 CTMark code size by around 0.7% geomean. Differential Revision: https://reviews.llvm.org/D63630 llvm-svn: 364001	2019-06-21 00:36:19 +00:00
Cameron McInally	1c0bd6dd2c	[Reassociate] Remove bogus assert reported in PR42349. Also, add a FIXME for the unsafe transform on a unary FNeg. A unary FNeg can only be transformed to a FMul by -1.0 when the nnan flag is present. The unary FNeg project is a WIP, so the unsafe transformation is acceptable until that work is complete. The bogus assert was introduced in D63445. llvm-svn: 363998	2019-06-20 23:03:55 +00:00
Sanjay Patel	b342f026a4	[InstSimplify] simplify power-of-2 (single bit set) sequences As discussed in PR42314: https://bugs.llvm.org/show_bug.cgi?id=42314 Improving the canonicalization for these patterns: rL363956 ...means we should adjust/enhance the related simplification. https://rise4fun.com/Alive/w1cp Name: isPow2 or zero %x = and i32 %xx, 2048 %a = add i32 %x, -1 %r = and i32 %a, %x => %r = i32 0 llvm-svn: 363997	2019-06-20 22:55:28 +00:00
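The simplification in the commit above relies on `%x` having at most one bit set: after masking with 2048, `%x` is either 0 or 2048, and in both cases `(%x - 1) & %x` is 0. A small Python sketch of that reasoning, with hypothetical helper names and i32 wraparound modeled by an explicit mask:

```python
MASK32 = 0xFFFFFFFF


def pattern(xx):
    """Evaluate the unsimplified sequence for one 32-bit input."""
    x = xx & 2048          # %x = and i32 %xx, 2048  (0 or 2048)
    a = (x - 1) & MASK32   # %a = add i32 %x, -1     (wraps to all-ones at 0)
    return a & x           # %r = and i32 %a, %x


def always_zero(samples):
    """True if the sequence evaluates to 0 for every sample input."""
    return all(pattern(v) == 0 for v in samples)
```

Only bit 11 of the input matters, so checking inputs that cover both values of that bit, for example `always_zero(range(1 << 13))`, exercises every case.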
Eli Friedman	45270054bc	[ARM GlobalISel] Tests for s64 G_ADD and G_SUB. Forgot to commit these in r363989 (https://reviews.llvm.org/D63585) llvm-svn: 363991	2019-06-20 22:00:07 +00:00
Matt Arsenault	d88db6d7fc	AMDGPU: Always use s33 for global scratch wave offset Every called function could possibly need this to calculate the absolute address of stack objects, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee saved SGPR. llvm-svn: 363990	2019-06-20 21:58:24 +00:00
Rainer Orth	6fde832b82	[profile] Solaris ld supports __start___llvm_prof_data etc. labels Currently, many profiling tests on Solaris FAIL like Command Output (stderr): -- Undefined first referenced symbol in file __llvm_profile_register_names_function /tmp/lit_tmp_Nqu4eh/infinite_loop-9dc638.o __llvm_profile_register_function /tmp/lit_tmp_Nqu4eh/infinite_loop-9dc638.o Solaris 11.4 ld supports the non-standard GNU ld extension of adding __start_SECNAME and __stop_SECNAME labels to sections whose names are valid as C identifiers. Given that we already use Solaris 11.4-only features like ld -z gnu-version-script-compat and fully working .preinit_array support in compiler-rt, we don't need to worry about older versions of Solaris ld. The patch documents that support (although the comment in lib/Transforms/Instrumentation/InstrProfiling.cpp (needsRuntimeRegistrationOfSectionRange) is quite cryptic about what it's actually about), and adapts the affected testcase not to expect the alternative __llvm_profile_register_functions and __llvm_profile_init. It fixes all affected tests. Tested on amd64-pc-solaris2.11. Differential Revision: https://reviews.llvm.org/D41111 llvm-svn: 363984	2019-06-20 21:27:06 +00:00
Matt Arsenault	740322f1eb	AMDGPU: Add intrinsics for DS GWS semaphore instructions llvm-svn: 363983	2019-06-20 21:11:42 +00:00
Alina Sbirlea	d0b11698cd	[LICM & MSSA] Limit unsafe sinking and hoisting. Summary: The getClobberingMemoryAccess API checks for clobbering accesses in a loop by walking the backedge. This may check if a memory access is being clobbered by the loop in a previous iteration, depending how smart AA got over the course of the updates in MemorySSA (it does not occur when built from scratch). If no clobbering access is found inside the loop, it will optimize to an access outside the loop. This however does not mean that access is safe to sink. Given: ``` for i load a[i] store a[i] ``` The access corresponding to the load can be optimized to outside the loop, and the load can be hoisted. But it is incorrect to sink it. In order to sink the load, we'd need to check no Def clobbers the Use in the same iteration. With this patch we currently restrict sinking to either Defs not existing in the loop, or Defs preceding the load in the same block. An easy extension is to ensure the load (Use) post-dominates all Defs. Caught by PR42294. This issue also shed light on the converse problem: hoisting stores in this same scenario would be illegal. With this patch we restrict hoisting of stores to the case when their corresponding Defs are dominating all Uses in the loop. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63582 llvm-svn: 363982	2019-06-20 21:09:09 +00:00
Sanjay Patel	3207566dd6	[InstSimplify] add tests for known-not-a-power-of-2; NFC I added a canonicalization to create this general pattern in: rL363956 But as noted in PR42314: https://bugs.llvm.org/show_bug.cgi?id=42314#c11 ...we have a (potentially expensive) simplification for the version of the code that we just canonicalized away from, so we should add/adjust that code to match. llvm-svn: 363981	2019-06-20 21:04:14 +00:00
Matt Arsenault	8ad1decf45	AMDGPU: Insert mem_viol check loop around GWS pre-GFX9 It is necessary to emit this loop around GWS operations in case the wave is preempted pre-GFX9. llvm-svn: 363979	2019-06-20 20:54:32 +00:00
Cameron McInally	9589db7a98	[NFC][SLP] Pre-commit unary FNeg test to X86/propagate_ir_flags.ll llvm-svn: 363978	2019-06-20 20:53:51 +00:00
Leonard Chan	108a946319	Update LLVM test to not check for the EliminateAvailableExternallyPass for lto-pre-link O2 pipeline runs. llvm-svn: 363977	2019-06-20 20:51:58 +00:00
Leonard Chan	97dc622ab3	[clang][NewPM] Do not eliminate available_externally during `-O2 -flto` runs This fixes CodeGen/available-externally-suppress.c when the new pass manager is turned on by default. available_externally was not emitted during -O2 -flto runs when it should still be retained for link time inlining purposes. This can be fixed by checking that we aren't LTOPrelinking when adding the EliminateAvailableExternallyPass. Differential Revision: https://reviews.llvm.org/D63580 llvm-svn: 363971	2019-06-20 19:44:51 +00:00
David Bolvansky	642ed40e57	[NFC] Add more tests for D46262 llvm-svn: 363970	2019-06-20 19:39:15 +00:00
David Bolvansky	e0c1c3baf9	[NFC] Updated tests for D63546 llvm-svn: 363967	2019-06-20 19:30:56 +00:00
Craig Topper	9e1665f2d6	[X86] Add BLSI to isUseDefConvertible. Summary: BLSI sets the C flag if the input is not zero. So if it's followed by a TEST of the input where only the Z flag is consumed, we can replace it with the opposite check of the C flag. We should be able to do the same for BLSMSK and BLSR, but the naive test case for those is being optimized to a subo by CodeGenPrepare. Reviewers: spatel, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63589 llvm-svn: 363957	2019-06-20 17:52:53 +00:00
Sanjay Patel	63311bfb83	[InstCombine] canonicalize check for power-of-2 The form that compares against 0 is better because: 1. It removes a use of the input value. 2. It's the more standard form for this pattern: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2 3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS). This is a root cause for PR42314, but probably doesn't completely answer the codegen request: https://bugs.llvm.org/show_bug.cgi?id=42314 Alive proof: https://rise4fun.com/Alive/9kG Name: is power-of-2 %neg = sub i32 0, %x %a = and i32 %neg, %x %r = icmp eq i32 %a, %x => %dec = add i32 %x, -1 %a2 = and i32 %dec, %x %r = icmp eq i32 %a2, 0 Name: is not power-of-2 %neg = sub i32 0, %x %a = and i32 %neg, %x %r = icmp ne i32 %a, %x => %dec = add i32 %x, -1 %a2 = and i32 %dec, %x %r = icmp ne i32 %a2, 0 llvm-svn: 363956	2019-06-20 17:41:15 +00:00
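The two forms in the canonicalization above can be compared exhaustively at a small bit width. This Python sketch uses 8-bit values as a stand-in for i32 (an assumption made only to keep the search small) and hypothetical function names. Both forms also accept zero, so the predicate is really "power of 2 or zero".

```python
BITS = 8
MASK = (1 << BITS) - 1


def old_form(x):
    """(x & -x) == x, with the negation wrapped to BITS bits."""
    neg = (-x) & MASK
    return (neg & x) == x


def new_form(x):
    """(x & (x - 1)) == 0, with the decrement wrapped to BITS bits."""
    dec = (x - 1) & MASK
    return (dec & x) == 0


def forms_agree():
    """Compare the two forms for every BITS-wide value."""
    return all(old_form(x) == new_form(x) for x in range(1 << BITS))
```

`forms_agree()` confirms the equivalence for all 256 inputs; the Alive proof linked in the commit covers the general i32 case.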
Philip Reames	8c80d08052	[Tests] Add a tricky LFTR case for documentation purposes Thought of this case while working on something else. We appear to get it right in all of the variations I tried, but that's by accident. So, add a test which would catch the potential bug. llvm-svn: 363953	2019-06-20 17:16:53 +00:00
Amy Huang	7fac5c8d94	Store a pointer to the return value in a static alloca and let the debugger use that as the variable address for NRVO variables. Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D63361 llvm-svn: 363952	2019-06-20 17:15:21 +00:00
David Bolvansky	01511192b2	[InstCombine] cttz(-x) -> cttz(x) Summary: Signedness does not change number of trailing zeros. Reviewers: spatel, lebedev.ri, nikic Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63534 llvm-svn: 363951	2019-06-20 17:04:14 +00:00
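The reasoning behind this fold is that two's-complement negation leaves the lowest set bit in place, so the trailing-zero count is unchanged; the later `cttz(abs(x))` fold (D63546) follows because `abs(x)` is either `x` or `-x`. A brute-force Python sketch over 8-bit values, with hypothetical helper names and the assumption that `cttz(0)` returns the bit width:

```python
BITS = 8
MASK = (1 << BITS) - 1


def cttz(v):
    """Count trailing zeros; returns BITS for zero (assumed convention)."""
    v &= MASK
    if v == 0:
        return BITS
    return (v & -v).bit_length() - 1  # v & -v isolates the lowest set bit


def neg(v):
    """Two's-complement negation wrapped to BITS bits."""
    return (-v) & MASK


def abs_wrapping(v):
    """abs() on the signed interpretation; the minimum value maps to itself."""
    v &= MASK
    return neg(v) if v & (1 << (BITS - 1)) else v


def folds_hold():
    """Check cttz(-x) == cttz(x) and cttz(abs(x)) == cttz(x) for all values."""
    return all(
        cttz(neg(v)) == cttz(v) and cttz(abs_wrapping(v)) == cttz(v)
        for v in range(1 << BITS)
    )
```

`folds_hold()` checks both identities for every 8-bit value, including zero and the minimum signed value.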
Matt Arsenault	5dbe4a9926	AMDGPU: Eliminate test usage of legacy FP elim attributes llvm-svn: 363950	2019-06-20 17:03:27 +00:00
Matt Arsenault	5dc457cbe4	AMDGPU: Fix ignoring DisableFramePointerElim in leaf functions The attribute can specify elimination for leaf or non-leaf, so it should always be considered. I copied this bug from AArch64, which probably should also be fixed. llvm-svn: 363949	2019-06-20 17:03:23 +00:00
Evandro Menezes	aa10f05044	[CodeGen] Fix formatting and comments (NFC) llvm-svn: 363947	2019-06-20 16:34:00 +00:00
Stanislav Mekhanoshin	e917b3b4b8	[AMDGPU] gfx10 tests. NFC. llvm-svn: 363946	2019-06-20 16:29:40 +00:00
Sanjay Patel	d729ed8d44	[InstCombine] add commuted variants for power-of-2 checks; NFC llvm-svn: 363945	2019-06-20 16:27:23 +00:00
Matt Arsenault	b7f87c0ecf	AMDGPU: Treat undef as an inline immediate This should only matter in vectors with an undef component, since a full undef vector would have been folded out. llvm-svn: 363941	2019-06-20 16:01:09 +00:00
Matt Arsenault	fcce531752	AMDGPU: Make test functions hidden Reduces amount of code in the function from eliminating the GOT load. llvm-svn: 363940	2019-06-20 15:38:30 +00:00
Sanjay Patel	345473c791	[InstCombine] add tests for checking power-of-2; NFC llvm-svn: 363938	2019-06-20 15:25:18 +00:00
Cameron McInally	4452c3b490	[NFC][SLP] Pre-commit unary FNeg test to X86/phi3.ll llvm-svn: 363937	2019-06-20 15:17:17 +00:00
Simon Tatham	232db11020	[ARM] Add a batch of MVE integer instructions. This includes integer arithmetic of various kinds (add/sub/multiply, saturating and not), and the immediate forms of VMOV and VMVN that load an immediate into all lanes of a vector. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62674 llvm-svn: 363936	2019-06-20 15:16:56 +00:00
Stanislav Mekhanoshin	0846c125f9	[AMDGPU] gfx1010 core wave32 changes Differential Revision: https://reviews.llvm.org/D63204 llvm-svn: 363934	2019-06-20 15:08:34 +00:00
Simon Pilgrim	1d8093249f	[DAGCombiner] Support (shl (zext (srl x, C)), C) -> (zext (shl (srl x, C), C)) non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. llvm-svn: 363929	2019-06-20 14:42:27 +00:00
Simon Pilgrim	72186a2494	[SLP][X86] Add lookahead reordering tests from D60897 llvm-svn: 363925	2019-06-20 12:52:58 +00:00
Fangrui Song	7064a437f8	[llvm-nm] Generalize ELF symbol types 'N' and 'n' Reviewed By: grimar, jhenderson Differential Revision: https://reviews.llvm.org/D63588 llvm-svn: 363918	2019-06-20 10:15:11 +00:00
Petar Avramovic	153bd24eda	[MIPS GlobalISel] Select integer to floating point conversions Select G_SITOFP and G_UITOFP for MIPS32. Differential Revision: https://reviews.llvm.org/D63542 llvm-svn: 363912	2019-06-20 09:05:02 +00:00
Petar Avramovic	4b4dae1c76	[MIPS GlobalISel] Select floating point to integer conversions Select G_FPTOSI and G_FPTOUI for MIPS32. Differential Revision: https://reviews.llvm.org/D63541 llvm-svn: 363911	2019-06-20 08:52:53 +00:00
Craig Topper	3ba20e943e	[X86] Add test cases showing missed opportunities to use the C flag from the BLSI instruction to avoid a TEST instruction llvm-svn: 363909	2019-06-20 06:45:01 +00:00
Matt Arsenault	c67c484f36	AMDGPU: Don't clobber VCC in MUBUF addr64 emulation Introducing VCC defs during SIFixSGPRCopies is generally problematic. Avoid it by starting with the VOP3 form with the general condition register. This is the easiest to fix instance, but doesn't solve any specific problems I'm looking at. llvm-svn: 363904	2019-06-20 00:51:28 +00:00
Eli Friedman	d88e28d13e	[llvm-objdump] Switch between ARM/Thumb based on mapping symbols. The ARMDisassembler changes allow changing between ARM and Thumb mode based on the MCSubtargetInfo, rather than the Target, which simplifies the other changes a bit. I'm not really happy with adding more target-specific logic to tools/llvm-objdump/, but there isn't any easy way around it: the logic in question specifically applies to disassembling an object file, and that code simply isn't located in lib/Target, at least at the moment. Differential Revision: https://reviews.llvm.org/D60927 llvm-svn: 363903	2019-06-20 00:29:40 +00:00
Thomas Preud'homme	a2ef1ba32f	[FileCheck] Stop qualifying expressions as numeric Summary: Stop referring to "numeric expression", using simply the term "expression" instead. Likewise for numeric operation since operations are only used in numeric expressions. Reviewers: jhenderson, jdenny, probinson, arichardson Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63500 llvm-svn: 363901	2019-06-19 23:47:24 +00:00
Thomas Preud'homme	baae41ff76	FileCheck: Return parse error w/ Error & Expected Summary: Make use of Error and Expected to bubble up diagnostics and force checking of errors in the callers. Reviewers: jhenderson, jdenny, probinson, arichardson Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63125 llvm-svn: 363900	2019-06-19 23:47:10 +00:00
Matt Arsenault	e24b34e9c9	AMDGPU: Undo sub x, c canonicalization for v2i16 Should avoid regression from D62341 llvm-svn: 363899	2019-06-19 23:37:43 +00:00
Matt Arsenault	532be255a5	AMDGPU: Add baseline test for vector sub x, c canonicalization This will catch regressions from D62341, and show improvements from a future patch to fix them. llvm-svn: 363888	2019-06-19 22:37:08 +00:00
Simon Atanasyan	f61c43c636	[mips] Mark the `lwupc` instruction as MIPS64 R6 only The "The MIPS64 Instruction Set Reference Manual" [1] states that the `lwupc` is MIPS64 Release 6 only. It should not be supported for 32-bit CPUs. [1] https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00087-2B-MIPS64BIS-AFP-6.06.pdf llvm-svn: 363886	2019-06-19 22:08:06 +00:00
Philip Reames	eda1ba65ca	LFTR for multiple exit loops Teach IndVarSimplify's LinearFunctionTestReplace transform to handle multiple exit loops. LFTR does two key things 1) it rewrites (all) exit tests in terms of a common IV potentially eliminating one in the process and 2) it moves any offset/indexing/f(i) style logic out of the loop. This turns out to actually be pretty easy to implement. SCEV already has all the information we need to know what the backedge taken count is for each individual exit. (We use that when computing the BE taken count for the loop as a whole.) We basically just need to iterate through the exiting blocks and apply the existing logic with the exit specific BE taken count. (The previously landed NFC makes this super obvious.) I chose to go ahead and apply this to all loop exits instead of only latch exits as originally proposed. After reviewing other passes, the only case I could find where LFTR form was harmful was LoopPredication. I've fixed the latch case, and guards aren't LFTRed anyways. We'll have some more work to do on the way towards widenable_conditions, but that's easily deferred. I do want to note that I added one bit after the review. When running tests, I saw a new failure (no idea why I didn't see it previously) which pointed out LFTR can rewrite a constant condition back to a loop varying one. This was theoretically possible with a single exit, but the zero case covered it in practice. With multiple exits, we saw this happening in practice for the eliminate-comparison.ll test case because we'd compute an ExitCount for one of the exits which was guaranteed to never actually be reached. Since LFTR ran after simplifyAndExtend, we'd immediately turn around and undo the simplification work we'd just done. The solution seemed obvious, so I didn't bother with another round of review. Differential Revision: https://reviews.llvm.org/D62625 llvm-svn: 363883	2019-06-19 21:58:25 +00:00
Philip Reames	80eb1ce7a0	[Tests] Autogen a test so that future changes are understandable llvm-svn: 363882	2019-06-19 21:39:07 +00:00
Alina Sbirlea	109d2ea153	[MemorySSA] Cleanup trivial phis. Summary: This is unfortunately needed for correctness, if we are to extend the tolerance of the update API to the way simple loop unswitch is doing cloning. In simple loop unswitch (as opposed to loop unswitch), not all blocks are cloned. This can create unreachable cloned blocks (no predecessor), which are later cleaned up. In MemorySSA, the APIs for supporting these kinds of updates (clone + update exit blocks) make certain assumptions on the integrity of the CFG. When cloning, if something was not cloned, its values in MemorySSA default to LiveOnEntry. When updating exit blocks, it is safe to assume that we can first insert phis in the blocks merging two clones, then add additional phis in the IDF of the blocks that received phis. This no longer holds true if one of the clones being merged comes from an unreachable block. We'd conservatively need to add all phis before filling in their incoming definitions. In practice this restriction can be relaxed if we clean up trivial phis after the first round of insertion. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63354 llvm-svn: 363880	2019-06-19 21:33:09 +00:00
Matt Arsenault	4d000d2488	AMDGPU: Fix folding immediate into readfirstlane through reg_sequence The def instruction for the vreg may not match, because it may be folding through a reg_sequence. The assert was overly conservative and not necessary. It's not actually important if DefMI really defined the register, because the fold that will be done cares about the def of the value that will be folded. For some reason copies aren't making it through the reg_sequence, although they should. llvm-svn: 363876	2019-06-19 20:44:15 +00:00
Peter Collingbourne	2742eeb78e	hwasan: Shrink outlined checks by 1 instruction. Turns out that we can save an instruction by folding the right shift into the compare. Differential Revision: https://reviews.llvm.org/D63568 llvm-svn: 363874	2019-06-19 20:40:03 +00:00
Matt Arsenault	4d55d024be	Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics" This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node. llvm-svn: 363870	2019-06-19 19:55:27 +00:00
Yuanfang Chen	40a156b791	[llvm-readobj] Match GNU output for DT_RPATH and DT_RUNPATH when dumping dynamic symbol table. Reviewers: jhenderson, grimar, MaskRay, rupprecht, espindola Subscribers: emaste, nemanjai, arichardson, kbarton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63347 llvm-svn: 363868	2019-06-19 19:31:07 +00:00
Yuanfang Chen	fee7365b07	[llvm-objdump] Remove unnecessary indentation when dumping ELF data. Reviewers: MaskRay, jhenderson, rupprecht Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63393 llvm-svn: 363858	2019-06-19 18:44:29 +00:00
Volkan Keles	61d7e35b22	Fix GlobalISel MachineVerifier tests. NFC. These tests were failing when building llvm with `-DLLVM_DEFAULT_TARGET_TRIPLE=''`. Add `-march` to the run line to fix the issue. llvm-svn: 363854	2019-06-19 18:15:45 +00:00
Sanjay Patel	b5640b6fe8	[x86] avoid vector load narrowing with extracted store uses (PR42305) This is an exception to the rule that we should prefer xmm ops to ymm ops. As shown in PR42305: https://bugs.llvm.org/show_bug.cgi?id=42305 ...the store folding opportunity with vextractf128 may result in better perf by reducing the instruction count. Differential Revision: https://reviews.llvm.org/D63517 llvm-svn: 363853	2019-06-19 18:13:47 +00:00
Sanjay Patel	33ef687d94	[x86] add test for unaligned 32-byte load/store splitting; NFC llvm-svn: 363852	2019-06-19 18:06:59 +00:00
Simon Pilgrim	6016fb726c	[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND. llvm-svn: 363850	2019-06-19 18:00:24 +00:00
Huihui Zhang	670778c762	[InstCombine] Fold icmp eq/ne (and %x, signbit), 0 -> %x s>=/s< 0 earlier Summary: To generate simplified IR, make sure fold ``` (X & signbit) ==/!= 0) -> X s>=/s< 0; ``` is scheduled before fold ``` ((X << Y) & C) == 0 -> (X & (C >> Y)) == 0. ``` https://rise4fun.com/Alive/fbdh Reviewers: lebedev.ri, efriedma, spatel, craig.topper Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63026 llvm-svn: 363845	2019-06-19 17:31:39 +00:00
Sanjay Patel	3e03bf6921	[InstSimplify] add a phi test with 1 incoming value; NFC D63489 proposes to change this behavior, but there's no direct -instsimplify test to verify that the transform exists. llvm-svn: 363842	2019-06-19 17:23:29 +00:00
Simon Pilgrim	34279db355	[X86][SSE] Combine shuffles to ANY_EXTEND/ANY_EXTEND_VECTOR_INREG. We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension. llvm-svn: 363841	2019-06-19 17:21:15 +00:00
Evandro Menezes	a7ed3a627b	[AArch64] Improve jump tables testing (NFC) Improve testing of the minimum and maximum sizes of jump tables. llvm-svn: 363839	2019-06-19 16:59:34 +00:00
Simon Tatham	2f5188fd58	[ARM] Add MVE vector bit-operations (register inputs). This includes all the obvious bitwise operations (AND, OR, BIC, ORN, MVN) in register-to-register forms, and the immediate forms of AND/OR/BIC/ORN; byte-order reverse instructions; and the VMOVs that access a single lane of a vector. Some of those VMOVs (specifically, the ones that access a 32-bit lane) share an encoding with existing instructions that were disassembled as accessing half of a d-register (e.g. `vmov.32 r0, d1[0]`), but in 8.1-M they're now written as accessing a quarter of a q-register (e.g. `vmov.32 r0, q0[2]`). The older syntax is still accepted by the assembler. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62673 llvm-svn: 363838	2019-06-19 16:43:53 +00:00
Evandro Menezes	54252b8243	[AArch64] Improve jump tables testing (NFC) Improve testing of the minimum and maximum sizes of jump tables. llvm-svn: 363837	2019-06-19 16:35:30 +00:00
James Henderson	e20326ed33	[test][llvm-dwarfdump] Remove pointless CHECK-NOT lines The original line was there from when this test was added, but it is checking for a switch that doesn't exist, so really has no purpose, at least any more. llvm-svn: 363833	2019-06-19 16:31:59 +00:00
Hubert Tong	1f6ddfb6a3	[NFC][llvm-objcopy] Fix overly restrictive od output check The checks against the output of `od` in the affected tests expect a specific input offset format. They also expect a specific offset value, not consistent with the EXAMPLE section for `od` in POSIX.1-2017 Chapter 4, while using the `-j` option. In particular, the example shows that the input offset begins at 0 following the bytes skipped. This patch adjusts the matching of the input offset to be more generic. In order to avoid false matches, it restricts the number of bytes to be formatted. llvm-svn: 363829	2019-06-19 16:04:24 +00:00
Hubert Tong	e9983eed5a	[NFC][LSR] Avoid undefined grep in pr2570.ll greater-than-sign is not a BRE special character. POSIX.1-2017 XBD Section 9.3.2 indicates that the interpretation of `\>` is undefined. This patch replaces the pattern. llvm-svn: 363828	2019-06-19 16:02:54 +00:00
Cameron McInally	7aa898e61e	[DFSan] Add UnaryOperator visitor to DataFlowSanitizer Differential Revision: https://reviews.llvm.org/D62815 llvm-svn: 363814	2019-06-19 15:11:41 +00:00
Cameron McInally	a027cf4764	[Reassociate] Handle unary FNeg in the Reassociate pass Differential Revision: https://reviews.llvm.org/D63445 llvm-svn: 363813	2019-06-19 14:59:14 +00:00
Bjorn Pettersson	16ff5fea87	[ConstantFolding] Add constant folding for smul.fix and smul.fix.sat Summary: This patch teaches ConstantFolding to constant fold both scalar and vector variants of llvm.smul.fix and llvm.smul.fix.sat. As described in the LangRef, rounding is unspecified for these intrinsics. If the result cannot be represented exactly, the default behavior in ConstantFolding is to round down towards negative infinity. If a target has a preferred rounding that is different, some kind of target hook would be needed (same strategy as used by the SelectionDAG legalizer). Reviewers: nikic, leonardchan, RKSimon Reviewed By: leonardchan Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63385 llvm-svn: 363811	2019-06-19 14:28:03 +00:00
Ulrich Weigand	3641b10f3d	[SystemZ] Support vector load/store alignment hints Vector load/store instructions support an optional alignment field that the compiler can use to provide known alignment info to the hardware. If the field is used (and the information is correct), the hardware may be able (on some models) to perform faster memory accesses than otherwise. This patch adds support for alignment hints in the assembler and disassembler, and fills in known alignment during codegen. llvm-svn: 363806	2019-06-19 14:20:00 +00:00
Simon Pilgrim	c3994f77cb	[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero. Matches what we already do for SIGN_EXTEND. llvm-svn: 363802	2019-06-19 13:58:02 +00:00
Fangrui Song	102b1efd53	[llvm-dwarfdump] --gdb-index: fix uninitialized TuListOffset The test only checks the existence of the `Types CU list` line. Unfortunately I can't make a better test because {gcc,clang} -fuse-ld={lld,gold} --gdb-index do not give me a non-empty types CU list. Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D63537 llvm-svn: 363800	2019-06-19 13:51:29 +00:00
Simon Pilgrim	128ce93c60	Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. ........ Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/ llvm-svn: 363797	2019-06-19 13:00:54 +00:00
David Bolvansky	e3cd19d330	[NFC] Added tests for D63534 llvm-svn: 363796	2019-06-19 12:59:37 +00:00
David Bolvansky	21fd232385	[NFC] Added tests for cttz(abs(x)) -> cttz(x) fold llvm-svn: 363795	2019-06-19 12:55:39 +00:00
Simon Pilgrim	9eed5d2f78	[DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. llvm-svn: 363793	2019-06-19 12:41:37 +00:00
Simon Pilgrim	8c49366c9b	[DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> 0 non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths. llvm-svn: 363792	2019-06-19 12:25:29 +00:00
Simon Pilgrim	85f70baa23	[X86] Add non-uniform (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) test llvm-svn: 363791	2019-06-19 11:36:01 +00:00
Simon Pilgrim	d954a53633	[DAGCombine] Fix (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) comment. NFCI. We pre-extend, not post. llvm-svn: 363787	2019-06-19 11:17:48 +00:00
Orlando Cazalet-Hyams	1251cac62a	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 > llvm-svn: 363046 llvm-svn: 363786	2019-06-19 10:50:47 +00:00
Jay Foad	45d19fb470	[ConstantFolding] Fix assertion failure on non-power-of-two vector load. Summary: The test case does an (out of bounds) load from a global constant with type <3 x float>. InstSimplify tried to turn this into an integer load of the whole alloc size of the vector, which is 128 bits due to alignment padding, and then bitcast this to <3 x float> which failed an assertion due to the type size mismatch. The fix is to do an integer load of the normal size of the vector, with no alignment padding. Reviewers: tpr, arsenm, majnemer, dstuttard Reviewed By: arsenm Subscribers: hfinkel, wdng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63375 llvm-svn: 363784	2019-06-19 10:28:48 +00:00
Lewis Revill	18737e81eb	[RISCV] Allow parsing immediates that use tilde & exclaim This patch allows immediates (and CSR alias immediates) which start with a tilde token or an exclaim (!) token to be parsed as intended. Differential Revision: https://reviews.llvm.org/D57320 llvm-svn: 363783	2019-06-19 10:27:24 +00:00
Lewis Revill	218aa0edb1	[RISCV] Fix failure to parse parenthesized immediates Since the parser attempts to parse an operand as a register with parentheses before parsing it as an immediate, immediates in parentheses should not be parsed by parseRegister. However in the case where the immediate does not start with an identifier, the LParen is not unlexed and so the RParen causes an unexpected token error. This patch adds the missing UnLex, and modifies the existing UnLex to not use a buffered token, as it should always be unlexing an LParen. Differential Revision: https://reviews.llvm.org/D57319 llvm-svn: 363782	2019-06-19 10:11:13 +00:00
Clement Courbet	f7a6fb9f2c	Fix r363773: Update Barcelona MCA tests. llvm-svn: 363781	2019-06-19 10:00:36 +00:00
George Rimar	b6e20937b3	[yaml2obj/obj2yaml] - Make RawContentSection::Info Optional<> This allows customizing this field for "implicit" sections properly. Differential revision: https://reviews.llvm.org/D63487 llvm-svn: 363777	2019-06-19 08:57:38 +00:00
Roman Lebedev	9f9691c032	[NFC][X86][MCA] Barcelona: add load/store/load-store-throughput tests llvm-svn: 363775	2019-06-19 08:53:34 +00:00
Roman Lebedev	4358016b03	[NFC][X86][MCA] BdVer2: add load-store-throughput test llvm-svn: 363774	2019-06-19 08:53:28 +00:00
Clement Courbet	4ef7c2868a	[X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr Summary: llvm.x86.sse.stmxcsr only writes to memory. llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE. Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62896 llvm-svn: 363773	2019-06-19 08:44:31 +00:00
Lewis Revill	39263ac5d1	[RISCV] Add lowering of global TLS addresses This patch adds lowering for global TLS addresses for the TLS models of InitialExec, GlobalDynamic, LocalExec and LocalDynamic. LocalExec support required using a 4-operand add instruction, which uses the fourth operand to express a relocation on the symbol. The necessary fixup is emitted when the instruction is emitted. Differential Revision: https://reviews.llvm.org/D55305 llvm-svn: 363771	2019-06-19 08:40:59 +00:00
Alex Bradbury	ec4e0809df	[RISCV] Fix test after r363757 r363757 renamed ExpandISelPseudo to FinalizeISel, so the RUN line in select-optimize-multiple.mir needed updating to refer to finalize-isel. llvm-svn: 363762	2019-06-19 03:18:48 +00:00
Matt Arsenault	9cac4e6d14	Rename ExpandISelPseudo->FinalizeISel, delay register reservation This allows targets to make more decisions about reserved registers after isel. For example, now it should be certain there are calls or stack objects in the frame or not, which could have been introduced by legalization. Patch by Matthias Braun llvm-svn: 363757	2019-06-19 00:25:39 +00:00
Thomas Lively	1885747498	[WebAssembly] Optimize ISel for SIMD Boolean reductions Summary: Converting the result *.{all,any}_true to a bool at the source level generates LLVM IR that compares the result to 0. This check is redundant since these instructions already return either 0 or 1 and therefore conform to the BooleanContents setting for WebAssembly. This CL adds patterns to detect and remove such redundant operations on the result of Boolean reductions. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63529 llvm-svn: 363756	2019-06-19 00:02:13 +00:00
Evandro Menezes	1933cbe866	[test] Change comment wording (NFC) llvm-svn: 363751	2019-06-18 23:31:10 +00:00
Michael Trent	c2885ded2b	Print dylib load kind (weak, reexport, etc) in llvm-objdump -m -dylibs-used Summary: Historically llvm-objdump prints the path to a dylib as well as the dylib's compatibility version and current version number. This change extends this information by adding the kind of dylib load: weak, reexport, etc. rdar://51383512 Reviewers: pete, lhames Reviewed By: pete Subscribers: rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62866 llvm-svn: 363746	2019-06-18 22:20:10 +00:00
Michael Liao	4f7f70e262	Recommit [SROA] Enhance SROA to handle `addrspacecast`ed allocas [SROA] Enhance SROA to handle `addrspacecast`ed allocas - Fix typo in original change - Add additional handling to ensure all return pointers are properly cast. Summary: - After `addrspacecast` is allowed to be eliminated in SROA, the adjusting of storage pointer (from `alloca`) needs to handle the potential different address spaces between the storage pointer (from alloca) and the pointer being used. Reviewers: arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63501 llvm-svn: 363743	2019-06-18 21:41:13 +00:00
Matt Arsenault	e8d8bb5170	InstCombine: Pre-commit test for reassociating nuw D39417 llvm-svn: 363741	2019-06-18 21:32:51 +00:00
Huihui Zhang	d16779a732	[ARM] Comply with rules on ARMv8-A thumb mode partial deprecation of IT. Summary: When identifying instructions that can be folded into a MOVCC instruction, checking for a predicate operand is not enough; for a thumb2 function with restrict-IT, we also need to check whether the machine instruction is eligible for ARMv8 IT. Notes in ARMv8-A Architecture Reference Manual, section "Partial deprecation of IT" https://usermanual.wiki/Pdf/ARM20Architecture20Reference20ManualARMv8.1667877052.pdf "ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to instructions other than a single subsequent 16-bit instruction from a restricted set are deprecated, as are explicit references to the PC within that single 16-bit instruction. This permits the non-deprecated forms of IT and subsequent instructions to be treated as a single 32-bit conditional instruction." Reviewers: efriedma, lebedev.ri, t.p.northover, jmolloy, aemerson, compnerd, stoklund, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63474 llvm-svn: 363739	2019-06-18 20:55:09 +00:00
Sam Elliott	9f155bc6e5	[RISCV] Prevent re-ordering some adds after shifts Summary: DAGCombine will normally turn a `(shl (add x, c1), c2)` into `(add (shl x, c2), c1 << c2)`, where `c1` and `c2` are constants. This can be prevented by a callback in TargetLowering. On RISC-V, materialising the constant `c1 << c2` can be more expensive than materialising `c1`, because materialising the former may take more instructions, and may use a register, where materialising the latter would not. This patch implements the hook in RISCVTargetLowering to prevent this transform, in the cases where: - `c1` fits into the immediate field in an `addi` instruction. - `c1` takes fewer instructions to materialise than `c1 << c2`. In future, DAGCombine could do the check to see whether `c1` fits into an add immediate, which might simplify more target hooks than just RISC-V. Reviewers: asb, luismarques, efriedma Reviewed By: asb Subscribers: xbolva00, lebedev.ri, craig.topper, lewis-revill, Jim, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62857 llvm-svn: 363736	2019-06-18 20:38:08 +00:00
Sanjay Patel	413ed69b4b	[x86] add another test for load splitting with extracted stores (PR42305); NFC llvm-svn: 363732	2019-06-18 20:13:35 +00:00
Adrian Prantl	fc5107cde6	Add debug location verification for !llvm.loop attachments. This patch teaches the Verifier how to detect broken !llvm.loop attachments as discussed in https://reviews.llvm.org/D60831. This allows LLVM to warn and strip out the broken debug info before attempting an LTO compilation with input generated by LLVM predating https://reviews.llvm.org/rL361149. rdar://problem/51631158 Differential Revision: https://reviews.llvm.org/D63499 [Re-applies r363725 without changes after fixing a broken testcase.] llvm-svn: 363731	2019-06-18 20:09:09 +00:00
Adrian Prantl	1db8d4a866	Fix broken debug info in an !llvm.loop attachment in this testcase. llvm-svn: 363730	2019-06-18 20:07:53 +00:00
Adrian Prantl	acc93d62e0	Revert Add debug location verification for !llvm.loop attachments. This reverts r363725 (git commit `8ff822d61d`) llvm-svn: 363728	2019-06-18 19:54:17 +00:00
Adrian Prantl	8ff822d61d	Add debug location verification for !llvm.loop attachments. This patch teaches the Verifier how to detect broken !llvm.loop attachments as discussed in https://reviews.llvm.org/D60831. This allows LLVM to warn and strip out the broken debug info before attempting an LTO compilation with input generated by LLVM predating https://reviews.llvm.org/rL361149. rdar://problem/51631158 Differential Revision: https://reviews.llvm.org/D63499 llvm-svn: 363725	2019-06-18 19:42:29 +00:00
Jordan Rupprecht	33e85ad956	Revert [SROA] Enhance SROA to handle `addrspacecast`ed allocas This reverts r363711 (git commit `76a149ef81`) This causes stage2 build failures, e.g.: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/132/steps/stage%202%20build/logs/stdio http://lab.llvm.org:8011/builders/ppc64le-lld-multistage-test/builds/87/steps/build-stage2-unified-tree/logs/stdio llvm-svn: 363718	2019-06-18 18:40:04 +00:00
Michael Liao	76a149ef81	[SROA] Enhance SROA to handle `addrspacecast`ed allocas Summary: - After `addrspacecast` is allowed to be eliminated in SROA, the adjusting of storage pointer (from `alloca`) needs to handle the potential different address spaces between the storage pointer (from alloca) and the pointer being used. Reviewers: arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63501 llvm-svn: 363711	2019-06-18 17:58:49 +00:00
Sanjay Patel	223176f5d7	[x86] add test for load splitting with extracted store (PR42305); NFC llvm-svn: 363704	2019-06-18 17:16:17 +00:00
Simon Tatham	cfc70782d7	[ARM] Add MVE vector shift instructions. This includes saturating and non-saturating shifts, both with immediate shift count and with the shift counts given by another vector register; VSHLC (in which the bits shifted out of each active vector lane are shifted in to the next active lane); and also VMOVL, which is enough like an immediate shift that it didn't fit too badly in this category. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62672 llvm-svn: 363696	2019-06-18 16:19:59 +00:00
Simon Tatham	faaf1a5366	[ARM] Add MVE integer vector min/max instructions. Summary: These form a small family of their own, to go with the floating-point VMINNM/VMAXNM instructions added in a previous commit. They introduce the first of many special cases in the mnemonic recognition code, because VMIN with the E suffix used by the VPT predication system needs to avoid being interpreted as the nonexistent instruction 'VMI' with an ordinary 'NE' condition suffix. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62671 llvm-svn: 363695	2019-06-18 15:51:46 +00:00
Simon Pilgrim	9aa25be149	[TargetLowering] SimplifyDemandedVectorElts - support MUL and ANY_EXTEND_VECTOR_INREG Also fold ANY_EXTEND_VECTOR_INREG -> BITCAST if we only need the bottom element. Fixes temporary regression introduced in rL363693. llvm-svn: 363694	2019-06-18 15:49:35 +00:00
Simon Pilgrim	9c8593934a	[X86][AVX] extract_subvector(any_extend(x)) -> any_extend_vector_inreg(x) Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here. llvm-svn: 363693	2019-06-18 15:30:50 +00:00
Simon Tatham	ed4a602515	[ARM] Rename MVE instructions in Tablegen for consistency. Summary: Their names began with a mishmash of `MVE_`, `t2` and no prefix at all. Now they all start with `MVE_`, which seems like a reasonable choice on the grounds that (a) NEON is the thing they're most at risk of being confused with, and (b) MVE implies Thumb-2, so a prefix indicating MVE is strictly more specific than one indicating Thumb-2. Reviewers: ostannard, SjoerdMeijer, dmgreen Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63492 llvm-svn: 363690	2019-06-18 15:05:42 +00:00
Lewis Revill	74c8364954	[RISCV] Lower calls through PLT This patch adds support for generating calls through the procedure linkage table where required for a given ExternalSymbol or GlobalAddress callee. Differential Revision: https://reviews.llvm.org/D55304 llvm-svn: 363686	2019-06-18 14:29:45 +00:00
Fangrui Song	677423997d	[llvm-readobj] Allow --hex-dump/--string-dump to dump multiple sections 1) `-x foo` currently dumps one `foo`. This change makes it dump all `foo`. 2) `-x foo -x foo` currently dumps `foo` twice. This change makes it dump `foo` once. In addition, if foo has section index 9, `-x foo -x 9` dumps `foo` once. 3) Give a warning instead of an error if `foo` does not exist. The new behaviors match GNU readelf. Also, print a new line as a separator between two section dumps. GNU readelf uses two lines, but one seems good enough. Reviewed By: grimar, jhenderson Differential Revision: https://reviews.llvm.org/D63475 llvm-svn: 363683	2019-06-18 14:01:03 +00:00
Matt Arsenault	8d35dcd703	AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. llvm-svn: 363678	2019-06-18 13:19:57 +00:00
Simon Pilgrim	83bacd8d72	[SelectionDAG] Legalize vaargs that require vector splitting This adds vector splitting for vaarg instructions during type legalization Committed on behalf of @luke (Luke Lau) Differential Revision: https://reviews.llvm.org/D60762 llvm-svn: 363671	2019-06-18 12:24:02 +00:00
Matt Arsenault	bcb5ea0042	AMDGPU: Fold readlane from copy of SGPR or imm These may be inserted to assert uniformity somewhere. llvm-svn: 363670	2019-06-18 12:23:46 +00:00
Matt Arsenault	23f03f5059	AMDGPU: Fix iterator crash in AMDGPUPromoteAlloca The lifetime intrinsic was erased, which was the next iterator. llvm-svn: 363668	2019-06-18 12:23:44 +00:00
Matt Arsenault	d5ce8ec778	AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scale llvm-svn: 363667	2019-06-18 12:23:42 +00:00
Jonas Paulsson	5c64a8c4c6	[SystemZ] Fix AHIMuxK pseudo expansion. Do not emit a copy if the source and destination registers are the same. Review: Ulrich Weigand llvm-svn: 363665	2019-06-18 12:10:02 +00:00
Graham Hunter	43854e3ccc	[SVE][IR] Scalable Vector IR Type with pr42210 fix Recommit of D32530 with a few small changes: - Stopped recursively walking through aggregates in the verifier, so that we don't impose too much overhead on large modules under LTO (see PR42210). - Changed tests to match; the errors are slightly different since they only report the array or struct that actually contains a scalable vector, rather than all aggregates which contain one in a nested member. - Corrected an older comment Reviewers: thakis, rengolin, sdesmalen Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D63321 llvm-svn: 363658	2019-06-18 10:11:56 +00:00
Simon Pilgrim	6658bfb171	[X86] Regenerate promote.ll. NFC. llvm-svn: 363657	2019-06-18 10:10:53 +00:00
Fangrui Song	291e11ea02	[llvm-objdump] Tidy up AMDGCNPrettyPrinter llvm-svn: 363650	2019-06-18 06:35:18 +00:00
Craig Topper	02a445c245	[X86] Add i128 ctpop and i32/i64/i128 optsize test cases to popcnt.ll Test cases for PR41151 and D59909. llvm-svn: 363647	2019-06-18 04:52:49 +00:00
Craig Topper	587427716c	[X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly instructions. The isel patterns for these use a bitcast and load/store, but DAG combine should have canonicalized those away. For the purposes of the memory folding table these opcodes can be replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes. llvm-svn: 363644	2019-06-18 03:23:15 +00:00
Craig Topper	8582ecd8d9	[X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class. Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt. Use the new versions in patterns that previously used a COPY_TO_REGCLASS to VR128. These patterns expect the upper bits to be zero. The current set up appears to work, but I'm not sure we should be enforcing upper bits being zero through a COPY_TO_REGCLASS. I wanted to flip the arrangement and use a COPY_TO_REGCLASS to FR32/FR64 for the patterns that need an f32/f64 result, but that complicated fastisel and globalisel. I've been doing some experiments with reducing some isel patterns and ended up in a situation where I had a (SUBREG_TO_REG (COPY_TO_REGCLASS (VMOVSSrm), VR128)) and our post-isel peephole was unable to avoid using an instruction for the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128 instruction removes the COPY_TO_REGCLASS that was breaking this. llvm-svn: 363643	2019-06-18 03:23:11 +00:00
Alex Brachet	7747700937	[llvm-strip] Error when using stdin twice Summary: Implements bug [[ https://bugs.llvm.org/show_bug.cgi?id=42204 \| 42204 ]]. llvm-strip now warns when the same input file is used more than once, and errors when stdin is used more than once. Reviewers: jhenderson, rupprecht, espindola, alexshap Reviewed By: jhenderson, rupprecht Subscribers: emaste, arichardson, jakehehrlich, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63122 llvm-svn: 363638	2019-06-18 00:39:10 +00:00
Matt Arsenault	5a321b899e	GlobalISel: Use the original flags when lowering fneg to fsub This was ignoring the flag on fneg, and using the source instruction's flags. Also fixes tests missing from r358702. Note the expansion itself isn't correct without nnan, but that should be fixed separately. llvm-svn: 363637	2019-06-17 23:48:43 +00:00
Peter Collingbourne	d57f7cc15e	hwasan: Use bits [3..11) of the ring buffer entry address as the base stack tag. This saves roughly 32 bytes of instructions per function with stack objects and causes us to preserve enough information that we can recover the original tags of all stack variables. Now that stack tags are deterministic, we no longer need to pass -hwasan-generate-tags-with-calls during check-hwasan. This also means that the new stack tag generation mechanism is exercised by check-hwasan. Differential Revision: https://reviews.llvm.org/D63360 llvm-svn: 363636	2019-06-17 23:39:51 +00:00
Peter Collingbourne	fb9ce100d1	hwasan: Add a tag_offset DWARF attribute to instrumented stack variables. The goal is to improve hwasan's error reporting for stack use-after-return by recording enough information to allow the specific variable that was accessed to be identified based on the pointer's tag. Currently we record the PC and lower bits of SP for each stack frame we create (which will eventually be enough to derive the base tag used by the stack frame) but that's not enough to determine the specific tag for each variable, which is the stack frame's base tag XOR a value (the "tag offset") that is unique for each variable in a function. In IR, the tag offset is most naturally represented as part of a location expression on the llvm.dbg.declare instruction. However, the presence of the tag offset in the variable's actual location expression is likely to confuse debuggers which won't know about tag offsets, and moreover the tag offset is not required for a debugger to determine the location of the variable on the stack, so at the DWARF level it is represented as an attribute so that it will be ignored by debuggers that don't know about it. Differential Revision: https://reviews.llvm.org/D63119 llvm-svn: 363635	2019-06-17 23:39:41 +00:00
Amara Emerson	146882242f	[GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra block. Inter-block localization is the same as what currently happens, except now it only runs on the entry block because that's where the problematic constants with long live ranges come from. The second phase is a new intra-block localization phase which attempts to re-sink the already localized instructions further right before one of the multiple uses. One additional change is to also localize G_GLOBAL_VALUE as they're constants too. However, on some targets like arm64 it takes multiple instructions to materialize the value, so some additional heuristics with a TTI hook have been introduced to attempt to prevent code size regressions when localizing these. Overall, these changes improve CTMark code size on arm64 by 1.2%. Full code size results: Program baseline new diff ------------------------------------------------------------------------------ test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6% test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6% test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4% test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2% test-suite :: CTMark/lencod/lencod.test 1340592 1324200 -1.2% test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9% test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0% test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0% Geomean difference -1.2% Differential Revision: https://reviews.llvm.org/D63303 llvm-svn: 363632	2019-06-17 23:20:29 +00:00
Michael Berg	f9bff2a55e	Propagate fmf in IRTranslate for fneg Summary: This case is related to D63405 in that we need to be propagating FMF on negates. Reviewers: volkan, spatel, arsenm Reviewed By: arsenm Subscribers: wdng, javed.absar Differential Revision: https://reviews.llvm.org/D63458 llvm-svn: 363631	2019-06-17 23:19:40 +00:00
Stanislav Mekhanoshin	ca42687d62	[AMDGPU] gfx1010 subvector test. NFC. llvm-svn: 363623	2019-06-17 21:55:06 +00:00
Volkan Keles	689509edab	[test][AArch64] Relax the check line for G_BRJT in legalizer-info-validation.mir Replace the specific number with a pattern to relax the test. llvm-svn: 363621	2019-06-17 21:25:25 +00:00
Philip Reames	44475363e8	Teach getSCEVAtScope how to handle loop phis w/invariant operands in loops w/taken backedges This patch really contains two pieces: Teach SCEV how to fold a phi in the header of a loop to the value on the backedge when a) the backedge is known to execute at least once, and b) the value is safe to use globally within the scope dominated by the original phi. Teach IndVarSimplify's rewriteLoopExitValues to allow loop invariant expressions which already exist (and thus don't need new computation inserted) even in loops where we can't optimize away other uses. Differential Revision: https://reviews.llvm.org/D63224 llvm-svn: 363619	2019-06-17 21:06:17 +00:00
Daniel Sanders	184c8ee920	[globalisel] Fix iterator invalidation in the extload combines Summary: Change the way we deal with iterator invalidation in the extload combines as it was still possible to neglect to visit a use. Even worse, it happened in the in-tree test cases and the checks weren't good enough to detect it. We now take a cheap copy of the use list before iterating over it. This prevents iterator invalidation from occurring and has the nice side effect of making the existing schedule-for-erase/schedule-for-insert mechanism moot. Reviewers: aditya_nandakumar Reviewed By: aditya_nandakumar Subscribers: rovka, kristof.beyls, javed.absar, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61813 llvm-svn: 363616	2019-06-17 20:56:31 +00:00
Stanislav Mekhanoshin	3138278287	[AMDGPU] Propagate function attributes thru bitcasts AMDGPUPropagateAttributes will not work on function bitcasts, so move AMDGPUFixFunctionBitcasts before it. Differential Revision: https://reviews.llvm.org/D63455 llvm-svn: 363614	2019-06-17 20:42:48 +00:00
Philip Reames	fe8bd96ebd	Fix a bug w/inbounds invalidation in LFTR (recommit) Recommit r363289 with a bug fix for crash identified in pr42279. Issue was that a loop exit test does not have to be an icmp, leading to a null dereference crash when new logic was exercised for that case. Test case previously committed in r363601. Original commit comment follows: This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV. The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program. As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well. (Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.) Differential Revision: https://reviews.llvm.org/D62939 llvm-svn: 363613	2019-06-17 20:32:22 +00:00
Nicolai Haehnle	ae4fcb97dd	AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printer Summary: The purpose of the padding is to guard against stale code being fetched into the instruction cache by the lowest level prefetching. We're generating relocatable ELF here, and so the padding should arguably be added by the linker. This is in fact what Mesa does. This also fixes multi-part shaders for Mesa. Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82 Reviewers: arsenm, rampitec, t-tye Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63427 llvm-svn: 363602	2019-06-17 19:28:43 +00:00
Philip Reames	58c75565f3	Reduced test case for pr42279 in advance of the relevant re-commit + fix llvm-svn: 363601	2019-06-17 19:27:45 +00:00
Nicolai Haehnle	8af7198c6c	AMDGPU: Explicitly define a triple for some tests Summary: This is related to the changes to the groupstaticsize intrinsic in D61494 which would otherwise make the related tests in these files fail or much less useful. Note that for some reason, SOPK generation is less effective in the amdhsa OS, which is why I chose PAL. I haven't investigated this deeper. Change-Id: I6bb99569338f7a433c28b4c9eb1e3e036b00d166 Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63392 llvm-svn: 363600	2019-06-17 19:25:57 +00:00
Joseph Tremoulet	daa1ae6142	[EarlyCSE] Fix hashing of self-compares Summary: Update compare normalization in SimpleValue hashing to break ties (when the same value is being compared to itself) by switching to the swapped predicate if it has a lower numerical value. This brings the hashing in line with isEqual, which already recognizes the self-compares with swapped predicates as equal. Fixes PR 42280. Reviewers: spatel, efriedma, nikic, fhahn, uabelho Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63349 llvm-svn: 363598	2019-06-17 19:11:28 +00:00
Alina Sbirlea	7a0098aa6e	[MemorySSA] Don't use template when the clone is a simplified instruction. Summary: LoopRotate doesn't create a faithful clone of an instruction, it may simplify it beforehand. Hence the clone of an instruction that has a MemoryDef associated may not be a definition, but a use or not a memory altering instruction. Don't rely on the template when the clone may be simplified. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63355 llvm-svn: 363597	2019-06-17 18:58:40 +00:00
Jessica Paquette	49537bbf74	[GlobalISel][AArch64] Fold G_SUB into G_ICMP when it's safe to do so Basically porting over the behaviour in AArch64ISelLowering to GISel. See emitComparison for reference. When we have something like this: ``` lhs = G_SUB 0, y ... G_ICMP lhs, rhs ``` We can fold away the G_SUB and produce a cmn instead, given that we produce the same value in NZCV. Add a test showing that the transformation works, and also showing that we don't perform the transformation when it's unsafe. Also factor out the CSet emission into emitCSetForICMP. Differential Revision: https://reviews.llvm.org/D63163 llvm-svn: 363596	2019-06-17 18:40:06 +00:00
Simon Pilgrim	835999e48a	[X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026) If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64). llvm-svn: 363592	2019-06-17 18:20:04 +00:00
Alina Sbirlea	05f77803f4	[MemorySSA] Add all MemoryPhis before filling their values. Summary: Add all MemoryPhis in IDF before filling in their incoming values. Otherwise, a new Phi can be added that needs to become the incoming value of another Phi. Test fails the verification in verifyPrevDefInPhis. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63353 llvm-svn: 363590	2019-06-17 18:16:53 +00:00
Stanislav Mekhanoshin	a9191c8492	[AMDGPU] gfx1010 wavefrontsize intrinsic folding Differential Revision: https://reviews.llvm.org/D63206 llvm-svn: 363588	2019-06-17 17:57:50 +00:00
Matt Arsenault	6d741f29ec	AMDGPU: Fold readlane/readfirstlane calls llvm-svn: 363587	2019-06-17 17:52:35 +00:00
Stanislav Mekhanoshin	ad04e7ad42	[AMDGPU] Pass to propagate ABI attributes from kernels to the functions The pass works in two modes: Mode 1: Just set attributes starting from kernels. This can work at the very beginning of opt and llc pipeline, but cannot clone functions because it must be a function pass. Mode 2: Actually clone functions for new attributes. This can only work after all function passes in the opt pipeline because it has to be a module pass. Differential Revision: https://reviews.llvm.org/D63208 llvm-svn: 363586	2019-06-17 17:47:28 +00:00
Simon Pilgrim	bb9adfdb4e	[X86][AVX] Split under-aligned vector nt-stores. If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it. llvm-svn: 363582	2019-06-17 17:22:38 +00:00
Warren Ristow	6452bdd29b	[LV] Suppress vectorization in some nontemporal cases When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the alignment of the original scalar memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type DataType, unsigned Alignment) const; bool isLegalNTLoad(Type DataType, unsigned Alignment) const; to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 llvm-svn: 363581	2019-06-17 17:20:08 +00:00
Matt Arsenault	3e140066bc	GlobalISel: Ignore callsite attributes when picking intrinsic type A target intrinsic may be defined as possibly reading memory, but the call site may have additional knowledge that it doesn't read memory. The intrinsic lowering will expect the pessimistic assumption of the intrinsic definition, so the chain should still be used. I fixed the same bug in SelectionDAG in r287593. llvm-svn: 363580	2019-06-17 17:01:35 +00:00
Matt Arsenault	a7f09f3c9e	GlobalISel: Verify intrinsics I keep using the wrong instruction when manually writing tests. This really needs to check the number of operands, but I don't see an easy way to do that right now. llvm-svn: 363579	2019-06-17 17:01:32 +00:00
Stanislav Mekhanoshin	5d00c3060e	[AMDGPU] gfx1010 wave32 metadata Differential Revision: https://reviews.llvm.org/D63207 llvm-svn: 363577	2019-06-17 16:48:56 +00:00
Tom Stellard	8b1c53b528	AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECT Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60640 llvm-svn: 363576	2019-06-17 16:27:43 +00:00
Francis Visoiu Mistrih	34667519dc	[Remarks] Extend -fsave-optimization-record to specify the format Use -fsave-optimization-record=<format> to specify a different format than the default, which is YAML. For now, only YAML is supported. llvm-svn: 363573	2019-06-17 16:06:00 +00:00
Simon Pilgrim	1c91e63897	[X86][SSE] Add tests for underaligned nt loads Test both 'unaligned' (which we should just use regular unaligned loads) and 'subvector aligned' (which we should split) llvm-svn: 363565	2019-06-17 14:38:17 +00:00
Simon Pilgrim	454e6b9010	[X86][SSE] Prevent misaligned non-temporal vector load/store combines For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for a xmm we'll just end up using the regular unaligned vector loads anyway. First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets. Differential Revision: https://reviews.llvm.org/D63246 llvm-svn: 363564	2019-06-17 14:26:10 +00:00
Matt Arsenault	1df203d78e	InferAddressSpaces: Fix cloning original addrspacecast If an addrspacecast needed to be inserted again, this was creating a clone of the original cast for each user. Just use the original, which also saves losing the value name. llvm-svn: 363562	2019-06-17 14:13:29 +00:00
Matt Arsenault	b10f097833	AMDGPU: Ignore subtarget for InferAddressSpaces Even if the target doesn't have flat instructions, addrspace(0) is still flat. It just happens to not work. llvm-svn: 363561	2019-06-17 14:13:24 +00:00
Matt Arsenault	f3b64d80bc	AMDGPU: Mark exp/exp.compr as inaccessiblememonly Should also be marked writeonly, but I think that would require splitting the version with done set to a separate intrinsic Test change is only from renumbering the attribute group numbers, which for some reason the generated check lines consider. llvm-svn: 363560	2019-06-17 13:52:24 +00:00
Sam Parker	1bd3d00e7e	[CodeGen] Check for HardwareLoop Latch ExitBlock The HardwareLoops pass finds exit blocks with a scevable exit count. If the target specifies to update the loop counter in a register, through a phi, we need to ensure that the exit block is a latch so that we can insert the phi with the correct value for the incoming edge. Differential Revision: https://reviews.llvm.org/D63336 llvm-svn: 363556	2019-06-17 13:39:28 +00:00
Simon Pilgrim	f1e2827170	[X86][SSE] Avoid unnecessary stack codegen in NT store codegen tests. llvm-svn: 363552	2019-06-17 12:35:26 +00:00
Bjorn Pettersson	83773b77a5	[LV] Deny irregular types in interleavedAccessCanBeWidened Summary: Avoid that loop vectorizer creates loads/stores of vectors with "irregular" types when interleaving. An example of an irregular type is x86_fp80 that is 80 bits, but that may have an allocation size that is 96 bits. So an array of x86_fp80 is not bitcast compatible with a vector of the same type. Not sure if interleavedAccessCanBeWidened is the best place for this check, but it solves the problem seen in the added test case. And it is the same kind of check that already exists in memoryInstructionCanBeWidened. Reviewers: fhahn, Ayal, craig.topper Reviewed By: fhahn Subscribers: hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63386 llvm-svn: 363547	2019-06-17 12:02:24 +00:00
Sander de Smalen	74ac20158a	Test forward references in IntrinsicEmitter on Neon LD(2\|3\|4) This patch tests the forward-referencing added in D62995 by changing some existing intrinsics to use forward referencing of overloadable parameters, rather than backward referencing. This patch changes the TableGen definition/implementation of llvm.aarch64.neon.ld2lane and llvm.aarch64.neon.ld2lane intrinsics (and similar for ld3 and ld4). This change is intended to be non-functional, since the behaviour of the intrinsics is expected to be the same. Reviewers: arsenm, dmgreen, RKSimon, greened, rnk Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D63189 llvm-svn: 363546	2019-06-17 12:01:53 +00:00
Luis Marques	2e46312ffd	[DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting Some GEPs were not being split, presumably because that split would just be undone by the DAGCombiner. Not performing those splits can prevent important optimizations, such as preventing the element indices / member offsets from being (partially) folded into load/store instruction immediates. This patch: - Makes the splits also occur in the cases where the base address and the GEP are in the same BB. - Ensures that the DAGCombiner doesn't reassociate them back again. Differential Revision: https://reviews.llvm.org/D60294 llvm-svn: 363544	2019-06-17 10:54:12 +00:00
Simon Pilgrim	ef78e55205	[SelectionDAG] Fold insert_subvector(undef, extract_subvector(v, c), c) -> v in getNode This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source. llvm-svn: 363542	2019-06-17 10:14:52 +00:00
Sam Parker	60d6fb2a63	[SCEV] Use NoWrapFlags when expanding a simple mul Second functional change following on from rL362687. Pass the NoWrapFlags from the MulExpr to InsertBinop when we're generating a shl or mul. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 363540	2019-06-17 10:05:18 +00:00
Fangrui Song	46f9cbe28d	[llvm-objdump] Use %08 instead of %016 to print leading addresses for 32-bit binaries Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D63398 llvm-svn: 363539	2019-06-17 09:59:55 +00:00
Fangrui Song	ac14f7b10c	[lit] Delete empty lines at the end of lit.local.cfg NFC llvm-svn: 363538	2019-06-17 09:51:07 +00:00
Roman Lebedev	25a043e78a	[NFC][Codegen] Standalone tests for icmp eq/ne (urem %x, C), 0 -> icmp eq/ne %x, 0 fold (D63390) llvm-svn: 363537	2019-06-17 09:50:50 +00:00
Sander de Smalen	5d6ee76c16	Describe stack-id as an enum This patch changes MIR stack-id from an integer to an enum, and adds printing/parsing support for this in MIR files. The default stack-id '0' is now renamed to 'default'. This should make MIR tests that have stack objects with different stack-ids more descriptive. It also clarifies code operating on StackID. Reviewers: arsenm, thegameg, qcolombet Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60137 llvm-svn: 363533	2019-06-17 09:13:29 +00:00
Hans Wennborg	a9e5d2f35d	Re-commit r357452 (take 3): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)" Third time's the charm. This was reverted in r363220 due to being suspected of an internal benchmark regression and a test failure, none of which turned out to be caused by this. llvm-svn: 363529	2019-06-17 07:47:28 +00:00
Yevgeny Rouban	ee62c40eae	[SimplifyCFG] Fix prof branch_weights MD while removing unreachable switch cases SimplifyCFG has a bug that results in inconsistent prof branch_weights metadata if unreachable switch cases are removed. This patch fixes this bug by making use of the newly introduced SwitchInstProfUpdateWrapper class (see patch D62122). A new test is created. Differential Revision: https://reviews.llvm.org/D62186 llvm-svn: 363527	2019-06-17 05:55:12 +00:00
Justin Hibbits	1d1cf30b73	PowerPC: Optimize SPE double parameter calling setup Summary: SPE passes doubles the same as soft-float, in register pairs as i32 types. This is all handled by the target-independent layer. However, this is not optimal when splitting or reforming the doubles, as it pushes to the stack and loads from, on either side. For instance, to pass a double argument to a function, assuming the double value is in r5, the sequence currently looks like this: evstdd 5, X(1) lwz 3, X(1) lwz 4, X+4(1) Likewise, to form a double into r5 from args in r3 and r4: stw 3, X(1) stw 4, X+4(1) evldd 5, X(1) This optimizes the fence to use SPE instructions. Now, to pass a double to a function: mr 4, 5 evmergehi 3, 5, 5 And to form a double into r5 from args in r3 and r4: evmergelo 5, 3, 4 This is comparable to the way that gcc generates the double splits. This also fixes a bug with expanding builtins to libcalls, where the LowerCallTo() code path was generating intermediate illegal type nodes. Reviewers: nemanjai, hfinkel, joerg Subscribers: kbarton, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54583 llvm-svn: 363526	2019-06-17 03:15:23 +00:00
Seiya Nuta	4f15732067	[yaml2obj][MachO] Don't fill dummy data for virtual sections Summary: Currently, MachOWriter::writeSectionData writes dummy data (0xdeadbeef) to fill section data areas in the file even if the section is a virtual one. Since virtual sections don't occupy any space in the file, writing dummy data could result in the "OS.tell() - fileStart <= Sec.offset" assertion failure. This patch fixes the bug by simply not writing any dummy data for virtual sections. Reviewers: beanz, jhenderson, rupprecht, alexshap Reviewed By: alexshap Subscribers: compnerd, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62991 llvm-svn: 363525	2019-06-17 02:07:20 +00:00
Seiya Nuta	13de174b4c	[llvm-objcopy] Add elf32-sparc and elf32-sparcel target Summary: The "sparc"/"sparcel" architectures appears in ArchMap (used by -B option) but not in OutputFormatMap (used by -I/-O option). Add their targets into OutputFormatMap for consistency. Note that AFAIK there're no targets for 32-bit little-endian SPARC ("elf32-sparcel") in GNU binutils. Reviewers: espindola, alexshap, rupprecht, jhenderson, compnerd, jakehehrlich Reviewed By: jhenderson, compnerd, jakehehrlich Subscribers: jyknight, emaste, arichardson, fedor.sergeev, jakehehrlich, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63238 llvm-svn: 363524	2019-06-17 02:03:45 +00:00
Roman Lebedev	5a663bd77a	[InstSimplify] Fix addo/subo undef folds (PR42209) Fix folds of addo and subo with an undef operand to be: `@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`, as per LLVM undef rules. Same for commuted variants. Based on the original version of the patch by @nikic. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42209 \| PR42209 ]] Differential Revision: https://reviews.llvm.org/D63065 llvm-svn: 363522	2019-06-16 20:39:45 +00:00
Nicolai Haehnle	41abf2766e	AMDGPU: Prepare for explicit absolute relocations in code generation Summary: We will use absolute relocations for LDS symbols. Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61492 llvm-svn: 363517	2019-06-16 17:43:37 +00:00
Nicolai Haehnle	6d71be4e67	AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0 Summary: Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just use a literal target constant. This simplifies some subsequent changes. The generated assembly is now more explicit about the kind of relocation that is to be used. Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61491 llvm-svn: 363516	2019-06-16 17:32:01 +00:00
Nicolai Haehnle	490e83cd43	AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539 Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62486 Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0 llvm-svn: 363514	2019-06-16 17:14:12 +00:00
Stanislav Mekhanoshin	5250021672	[AMDGPU] gfx10 conditional registers handling This is the cpp source part of wave32 support, excluding the overridden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513	2019-06-16 17:13:09 +00:00
Sanjay Patel	c8d88ad1a9	[CodeGenPrepare][x86] shift both sides of a vector select when profitable This is based on the example/discussion in PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 Proper vector shift instructions don't appear until AVX2, so we may generate several extra instructions within a loop trying to compensate for that. It's difficult to recover from that shift expansion later than this, so use the existing TLI hook and splat analysis to enable better codegen. This extends CGP functionality introduced with: rL201655 Differential Revision: https://reviews.llvm.org/D63233 llvm-svn: 363511	2019-06-16 15:29:03 +00:00
Sanjay Patel	d14389c0a5	[x86] split 256-bit vector selects if operands are vector concats This is similar logic/motivation to the select splitting in D62969. In D63233, the pattern changes so that we no longer have an extract_subvector of vselect, but the operands of the select are still being concatenated. The closest case is represented in either the first or last test diffs here - we have an extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions. I think that's the right trade-off for most AVX1 targets. In the example based on PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 ...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx). Differential Revision: https://reviews.llvm.org/D63364 llvm-svn: 363508	2019-06-16 14:04:49 +00:00
Simon Pilgrim	fcffc2facc	[X86] CombineShuffleWithExtract - handle cases with different vector extract sources Insert the shorter vector source into an undef vector of the longer vector source's type. llvm-svn: 363507	2019-06-16 08:00:41 +00:00
Simon Pilgrim	90e87af303	[X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well. Fixes PR34380 llvm-svn: 363500	2019-06-15 18:30:43 +00:00
Simon Pilgrim	990f3ceb67	[X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3) This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0) llvm-svn: 363499	2019-06-15 17:05:24 +00:00
Roman Lebedev	5dd61974f9	[NFC][MCA][X86] Add one more 'clear super register' pattern - movss/movsd load clears high XMM bits llvm-svn: 363498	2019-06-15 16:12:13 +00:00
Roman Lebedev	680c43b73a	[NFC][MCA][X86] Add baseline test coverage for AMD Barcelona (aka K10, fam10h) Looking into sched model for that CPU ... llvm-svn: 363497	2019-06-15 16:12:05 +00:00
Kang Zhang	2d51adcb57	[PowerPC] Set the innermost hot loop to align 32 bytes Summary: If the nested loop is an innermost loop, prefer a 32-byte alignment, so that we can decrease cache misses and branch-prediction misses. Actual alignment of the loop will depend on the hotness check and other logic in alignBlocks. The old code only aligns a hot loop to 32 bytes when the LoopSize is larger than 16 bytes and smaller than 32 bytes; this patch aligns the innermost hot loop to 32 bytes, not only when its size is 16~32 bytes. Reviewed By: steven.zhang, jsji Differential Revision: https://reviews.llvm.org/D61228 llvm-svn: 363495	2019-06-15 15:10:24 +00:00
Nikita Popov	8550fb386a	[SCEV] Use unsigned/signed intersection type in SCEV Based on D59959, this switches SCEV to use unsigned/signed range intersection based on the sign hint. This will prefer non-wrapping ranges in the relevant domain. I've left the one intersection in getRangeForAffineAR() to use the smallest intersection heuristic, as there doesn't seem to be any obvious preference there. Differential Revision: https://reviews.llvm.org/D60035 llvm-svn: 363490	2019-06-15 09:15:52 +00:00
Nikita Popov	9145562b48	[SimplifyIndVar] Simplify non-overflowing saturating add/sub If we can detect that saturating math that depends on an IV cannot overflow, replace it with simple math. This is similar to the CVP optimization from D62703, just based on a different underlying analysis (SCEV vs LVI) that catches different cases. Differential Revision: https://reviews.llvm.org/D62792 llvm-svn: 363489	2019-06-15 08:48:52 +00:00
Fangrui Song	e1aa69f755	[RISCV] Regenerate remat.ll and atomic-rmw.ll after D43256 llvm-svn: 363487	2019-06-15 07:49:14 +00:00
Alex Brachet	899a3072f0	[objcopy] Error when --preserve-dates is specified with standard streams Summary: llvm-objcopy/strip now error when -p is specified when reading from stdin or writing to stdout Reviewers: jhenderson, rupprecht, espindola, alexshap Reviewed By: jhenderson, rupprecht Subscribers: emaste, arichardson, jakehehrlich, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63090 llvm-svn: 363485	2019-06-15 05:32:23 +00:00
Michael Berg	ad6bb86b2d	adding more fmf propagation for selects plus updated tests llvm-svn: 363484	2019-06-15 04:53:51 +00:00
Fangrui Song	968b5f84af	Revert "adding more fmf propagation for selects plus tests" This reverts rL363474. -debug-only=isel was added to some tests that don't specify `REQUIRES: asserts`. This causes failures on -DLLVM_ENABLE_ASSERTIONS=off builds. I chose to revert instead of fixing the tests because I'm not sure whether we should add `REQUIRES: asserts` to more tests. llvm-svn: 363482	2019-06-15 03:51:08 +00:00
Huihui Zhang	dc2fd6a14e	[InstCombine] Add tests to show missing fold opportunity for "icmp and shift" (nfc). Summary: For icmp pred (and (sh X, Y), C), 0 When C is signbit, expect to fold (X << Y) & signbit ==/!= 0 into (X << Y) >=/< 0, rather than (X & (signbit >> Y)) != 0. When C+1 is power of 2, expect to fold (X << Y) & ~C ==/!= 0 into (X << Y) </>= C+1, rather than (X & (~C >> Y)) == 0. For icmp pred (and X, (sh signbit, Y)), 0 Expect to fold (X & (signbit l>> Y)) ==/!= 0 into (X << Y) >=/< 0 Expect to fold (X & (signbit << Y)) ==/!= 0 into (X l>> Y) >=/< 0 Reviewers: lebedev.ri, efriedma, spatel, craig.topper Reviewed By: lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63025 llvm-svn: 363479	2019-06-15 00:33:41 +00:00
Matt Arsenault	9487278010	Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect" This reapplies r363410, avoiding null dereference if there is no AltRegBank. llvm-svn: 363478	2019-06-15 00:33:26 +00:00
Mitch Phillips	0d44f129bb	Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect" This patch breaks UBSan build bots. See https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for a guide as to how to reproduce the error. This reverts commit `c2864c0de0`. This reverts rL363410. llvm-svn: 363476	2019-06-14 23:45:34 +00:00
Michael Berg	69394bedc5	adding more fmf propagation for selects plus tests llvm-svn: 363474	2019-06-14 23:30:52 +00:00
Guozhi Wei	d2210af332	[MBP] Move a latch block with conditional exit and multi predecessors to top of loop Current findBestLoopTop can find and move one kind of block to top: a latch block that has one successor. Another common case is: * a latch block * it has two successors, one is loop header, another is exit * it has more than one predecessor If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch. Differential Revision: https://reviews.llvm.org/D43256 llvm-svn: 363471	2019-06-14 23:08:59 +00:00
Akira Hatanaka	a704a8f28c	[ObjC][ARC] Delete ObjC runtime calls on global variables annotated with 'objc_arc_inert' Those calls are no-ops, so they can be safely deleted. rdar://problem/49839633 Differential Revision: https://reviews.llvm.org/D62433 llvm-svn: 363468	2019-06-14 22:06:32 +00:00
Matt Arsenault	aa41e92e17	AMDGPU: Avoid most waitcnts before calls Currently you get extra waits, because waits are inserted for the register dependencies of the call, and the function prolog waits on everything. Currently waits are still inserted on returns. It may make sense to not do this, and wait in the caller instead. llvm-svn: 363465	2019-06-14 21:52:26 +00:00
Francis Visoiu Mistrih	5501dda247	[Remarks][NFC] Improve testing and documentation of -foptimization-record-passes This adds: * documentation to the user manual * nicer error message * test for the error case * test for the gold plugin llvm-svn: 363463	2019-06-14 21:38:57 +00:00
Matt Arsenault	282dac717e	SROA: Allow eliminating addrspacecasted allocas There is a circular dependency between SROA and InferAddressSpaces today that requires running both multiple times in order to be able to eliminate all simple allocas and addrspacecasts. InferAddressSpaces can't remove addrspacecasts when written to memory, and SROA helps move pointers out of memory. This should avoid inserting new commuting addrspacecasts with GEPs, since there are unresolved questions about pointer wrapping between different address spaces. For now, don't replace volatile operations that don't match the alloca addrspace, as it would change the address space of the access. It may be still OK to insert an addrspacecast from the new alloca, but be more conservative for now. llvm-svn: 363462	2019-06-14 21:38:31 +00:00
Matt Arsenault	e6efb6433f	SROA: Add baseline test for addrspacecast changes llvm-svn: 363460	2019-06-14 21:22:26 +00:00
Matt Arsenault	bb0a610599	AMDGPU: Fix capitalized register names in asm constraints This was a workaround a long time ago, but the canonical lower case names work now. llvm-svn: 363459	2019-06-14 21:16:06 +00:00
Matt Arsenault	9e5fa33378	AMDGPU: Fix dropping memref for ds append/consume The way SelectionDAG treats memory operands is very frustrating, and by default drops them unless a property is set on the pattern. There is no pattern for manually selected instructions, so this requires manually setting them. llvm-svn: 363455	2019-06-14 21:01:24 +00:00
Matt Arsenault	1509fde891	AMDGPU: Add baseline test for call waitcnt insertion llvm-svn: 363453	2019-06-14 21:01:23 +00:00
Sanjay Patel	501bb982b9	[x86] add test for 256-bit blendv with AVX targets; NFC This is a reduction of the pattern seen in D63233. llvm-svn: 363448	2019-06-14 20:03:42 +00:00
Amara Emerson	f79d3bc724	[GlobalISel] Add a G_BRJT opcode. This is a branch opcode that takes a jump table pointer, jump table index and an index into the table to do an indirect branch. We pass both the table pointer and JTI to allow targets like ARM64 to more easily use the existing jump table compression optimization without having to walk up the block to find a paired G_JUMP_TABLE. Differential Revision: https://reviews.llvm.org/D63159 llvm-svn: 363434	2019-06-14 17:55:48 +00:00
Florian Hahn	dcdd12b68c	Revert Fix a bug w/inbounds invalidation in LFTR Reverting because it breaks a green dragon build: http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/18208 This reverts r363289 (git commit `eb88badff9`) llvm-svn: 363427	2019-06-14 17:23:09 +00:00
Valery Pykhtin	ffeb01c113	[AMDGPU] Don't constrain callees with inlinehint from inlining on MaxBB check Summary: Function bodies marked inline in an opencl source are eliminated but MaxBB check may prevent inlining them leaving undefined references. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63337 llvm-svn: 363418	2019-06-14 16:37:33 +00:00
Kevin P. Neal	fece7c6c83	[FPEnv] Lower STRICT_FP_EXTEND and STRICT_FP_ROUND nodes in preprocess phase of ISelLowering to mirror non-strict nodes on x86. I recently discovered a bug on the x86 platform: The fp80 type was not handled well by x86 for constrained floating point nodes, as their regular counterparts are replaced by extending loads and truncating stores during the preprocess phase. Normally, platforms don't have this issue, as they don't typically attempt to perform such legalizations during instruction selection preprocessing. Before this change, strict_fp nodes survived until they were mutated to normal nodes, which happened shortly after preprocessing on other platforms. This modification lowers these nodes at the same phase while properly utilizing the chain. Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Craig Topper, Kevin P. Neal Approved by: Craig Topper Differential Revision: https://reviews.llvm.org/D63271 llvm-svn: 363417	2019-06-14 16:28:55 +00:00
Sanjay Patel	75312aa805	[x86] move vector shift tests for PR37428; NFC As suggested in the post-commit thread for rL363392 - it's wasteful to have so many runs for larger tests. AVX1/AVX2 is what shows the diff and probably what matters most going forward. llvm-svn: 363411	2019-06-14 15:23:09 +00:00
Matt Arsenault	c2864c0de0	GlobalISel: Avoid producing Illegal copies in RegBankSelect Avoid producing illegal register bank copies for reg_sequence and phi. The default implementation assumes it is possible to pick any operand's bank and use that for the result, introducing a copy for operands with a different bank. This does not check for illegal copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR operand requires the result to be a VGPR. The changes in getInstrMappingImpl aren't strictly necessary, since AMDGPU now just bypasses this for reg_sequence/phi. This could be replaced with an assert in case other targets run into this. It is currently responsible for producing the error for unsatisfiable copies, but this will be better served with a verifier check. For phis, for now assume any undetermined operands must be VGPRs. Eventually, this needs to be able to defer mapping these operations. This also does not yet have a way to check for whether the block is in a divergent region. llvm-svn: 363410	2019-06-14 15:22:25 +00:00
Sanjay Patel	7ea378b940	[CodeGenPrepare] propagate debuginfo when copying a shuffle llvm-svn: 363409	2019-06-14 15:05:35 +00:00
Matt Arsenault	492d71cc99	AMDGPU: Fold readlane intrinsics of constants I'm not 100% sure about this, since I'm worried about IR transforms that might end up introducing divergence downstream once replaced with a constant, but I haven't come up with an example yet. llvm-svn: 363406	2019-06-14 14:51:26 +00:00
Mikhail Maltsev	d1cc2e1543	[ARM] Add MVE horizontal accumulation instructions This is the family of vector instructions that combine all the lanes in their input vector(s), and output a value in one or two GPRs. Differential Revision: https://reviews.llvm.org/D62670 llvm-svn: 363403	2019-06-14 14:31:13 +00:00
George Rimar	0aecabae14	Revert "Revert r363377: [yaml2obj] - Allow setting custom section types for implicit sections." LLD test case will be fixed in a following commit. Original commit message: [yaml2obj] - Allow setting custom section types for implicit sections. We were hardcoding the final section type for sections that are usually implicit. The patch fixes that. This also fixes a few issues in existent test cases and removes one precompiled object. Differential revision: https://reviews.llvm.org/D63267 llvm-svn: 363401	2019-06-14 14:25:34 +00:00
Rui Ueyama	9f4e21c69a	Revert r363377: [yaml2obj] - Allow setting custom section types for implicit sections. This reverts commit r363377 because lld's ELF/invalid/undefined-local-symbol-in-dso.test test started failing after this commit. llvm-svn: 363394	2019-06-14 13:57:25 +00:00
Sanjay Patel	e5a78cd90f	[x86] add test for original example in PR37428; NFC The reduced case may avoid complications seen in this larger function. llvm-svn: 363392	2019-06-14 13:44:01 +00:00
Matt Arsenault	74d67c2086	AMDGPU: Fix printing trailing whitespace after s_endpgm llvm-svn: 363384	2019-06-14 13:26:29 +00:00
George Rimar	3b523c0a2e	[yaml2obj] - Allow setting custom section types for implicit sections. We were hardcoding the final section type for sections that are usually implicit. The patch fixes that. This also fixes a few issues in existent test cases and removes one precompiled object. Differential revision: https://reviews.llvm.org/D63267 llvm-svn: 363377	2019-06-14 12:16:59 +00:00
James Henderson	f7cfabb45d	[llvm-readobj] Don't abort printing of dynamic table if string reference is invalid If dynamic table is missing, output "dynamic strtab not found". If the index is out of range, output "Invalid Offset<..>". https://bugs.llvm.org/show_bug.cgi?id=40807 Reviewed by: jhenderson, grimar, MaskRay Differential Revision: https://reviews.llvm.org/D63084 Patch by Yuanfang Chen. llvm-svn: 363374	2019-06-14 12:02:01 +00:00
George Rimar	d6df7ded6e	[llvm-readobj] - Do not fail to dump the object which has wrong type of .shstrtab. Imagine we have an object that has .shstrtab with type != SHT_STRTAB. In this case, we fail to dump the object, though GNU readelf dumps it without any issues and warnings. This patch fixes that. It adds code to ELFDumper.cpp which is based on the implementation of getSectionName from the ELF.h: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Object/ELF.h#L608 https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Object/ELF.h#L431 https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Object/ELF.h#L539 The difference is that all non-critical errors are omitted, which allows us to improve the dumping on the tool side. Also, this opens a road for a follow-up that should allow us to dump the section headers, but drop the section names in case .shstrtab is completely absent and/or broken. Differential revision: https://reviews.llvm.org/D63266 llvm-svn: 363371	2019-06-14 11:56:10 +00:00
Sjoerd Meijer	3058a62b90	[ARM] MVE VPT Block Pass Initial commit of a new pass to create vector predication blocks, called VPT blocks, that are supported by the Armv8.1-M MVE architecture. This is a first naive implementation. I.e., for 2 consecutive predicated instructions I1 and I2, for example, it will generate 2 VPT blocks: VPST I1 VPST I2 A more optimal implementation would obviously put instructions in the same VPT block when they are predicated on the same condition and when it is allowed to do this: VPTT I1 I2 We will address this optimisation with follow up patches when the groundwork is in. Creating VPT Blocks is very similar to IT Blocks, which is the reason I added this to Thumb2ITBlocks.cpp. This allows reuse of the def use analysis that we need for the more optimal implementation. VPT blocks cannot be nested in IT blocks, and vice versa, and so these 2 passes cannot interact with each other. Instructions allowed in VPT blocks must be MVE instructions that are marked as VPT compatible. Differential Revision: https://reviews.llvm.org/D63247 llvm-svn: 363370	2019-06-14 11:46:05 +00:00
George Rimar	43f62ff17c	[yaml2obj] - Allow setting the custom Address for .strtab Despite the fact that .strtab is non-allocatable, there is no reason to disallow setting the custom address for it. The patch also adds a test case showing we can set any address we want for other implicit sections. Differential revision: https://reviews.llvm.org/D63137 llvm-svn: 363368	2019-06-14 11:13:32 +00:00
George Rimar	cfa1a62a4c	[yaml2obj] - Allow setting custom Flags for implicit sections. With this patch we get the ability to set any flags we want for implicit sections defined in YAML. Differential revision: https://reviews.llvm.org/D63136 llvm-svn: 363367	2019-06-14 11:01:14 +00:00
Sam Parker	0cf9639a9c	[SCEV] Pass NoWrapFlags when expanding an AddExpr InsertBinop now accepts NoWrapFlags, so pass them through when expanding a simple add expression. This is the first re-commit of the functional changes from rL362687, which was previously reverted. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 363364	2019-06-14 09:19:41 +00:00
Eugene Leviant	d46ebd207b	[llvm-objcopy][IHEX] Improve test case formatting. NFC Differential revision: https://reviews.llvm.org/D63258 llvm-svn: 363359	2019-06-14 08:09:10 +00:00
Alex Brachet	d54d4f9905	[llvm-objcopy] Changed command line parsing errors Summary: Tidied up errors during command line parsing to be more consistent with the rest of llvm-objcopy errors. Reviewers: jhenderson, rupprecht, espindola, alexshap Reviewed By: jhenderson, rupprecht Subscribers: emaste, arichardson, MaskRay, llvm-commits, jakehehrlich Tags: #llvm Differential Revision: https://reviews.llvm.org/D62973 llvm-svn: 363350	2019-06-14 02:04:02 +00:00
David Blaikie	4129e3e0f8	DebugInfo: Include enumerators in pubnames This is consistent with GCC's behavior (which is the de facto standard for pubnames). Though I find the presence of enumerators from enum classes to be a bit confusing, possibly a bug on GCC's end (since they can't be named unqualified, unlike the other names - and names nested in classes don't go in pubnames, for instance - presumably because one must name the class first & that's enough to limit the scope of the search) llvm-svn: 363349	2019-06-14 01:58:56 +00:00
Tim Shen	4121bdc3d4	[X86] Add target triple for live-debug-values-fragments.mir llvm-svn: 363348	2019-06-14 01:41:04 +00:00
Douglas Yung	5b188f8dac	Add REQUIRES: zlib to test added in r363325 as the profile uses zlib compression. llvm-svn: 363347	2019-06-14 01:08:50 +00:00
Stanislav Mekhanoshin	c43e67bfff	[AMDGPU] gfx1011/gfx1012 targets Differential Revision: https://reviews.llvm.org/D63307 llvm-svn: 363344	2019-06-14 00:33:31 +00:00
Stanislav Mekhanoshin	68a2fef9ae	[AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32 Differential Revision: https://reviews.llvm.org/D63301 llvm-svn: 363339	2019-06-13 23:47:36 +00:00
Seiya Nuta	b1027a480a	[llvm-objcopy] Fix sparc target endianness Summary: AFAIK, the "sparc" target is big endian and the target for 32-bit little-endian SPARC is denoted as "sparcel". This patch fixes the endianness of "sparc" target and adds "sparcel" target for 32-bit little-endian SPARC. Reviewers: espindola, alexshap, rupprecht, jhenderson Reviewed By: jhenderson Subscribers: jyknight, emaste, arichardson, fedor.sergeev, jakehehrlich, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63251 llvm-svn: 363336	2019-06-13 23:24:12 +00:00
Amy Huang	49275272e3	Use fully qualified name when printing S_CONSTANT records Summary: Before it was using the fully qualified name only for static data members. Now it does for all variable names to match MSVC. Reviewers: rnk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63012 llvm-svn: 363335	2019-06-13 22:53:43 +00:00
Amara Emerson	fb0a40f064	[GlobalISel][IRTranslator] Add debug loc with line 0 to constants emitted into the entry block. Constants, including G_GLOBAL_VALUE, are all emitted into the entry block which lets us use the vreg def assuming it dominates all other users. However, it can cause jumpy debug behaviour since the DebugLoc attached to these MIs are from a user instruction that could be in a different block. Fixes PR40887. Differential Revision: https://reviews.llvm.org/D63286 llvm-svn: 363331	2019-06-13 22:15:35 +00:00
Jinsong Ji	1c88445840	[MachinePipeliner] Don't check boundary node in checkValidNodeOrder This was exposed by PowerPC target enablement. In ScheduleDAG, if we haven't seen any uses in this scheduling region, we will create a dependence edge to ExitSU to model the live-out latency. This is required for vreg defs with no in-region use, and prefetches with no vreg def. When we build NodeOrder in Scheduler, we ignore these boundary nodes. However, when we check Succs in checkValidNodeOrder, we did not skip them, so we still assume all the nodes have been sorted and in order in Indices array. So when we call lower_bound() for ExitSU, it will return Indices.end(), causing memory issues in following Node access. Differential Revision: https://reviews.llvm.org/D63282 llvm-svn: 363329	2019-06-13 21:51:12 +00:00
Vedant Kumar	901d04fc6d	[Coverage] Load code coverage data from archives Support loading code coverage data from regular archives, thin archives, and from MachO universal binaries which contain archives. Testing: check-llvm, check-profile (with {A,UB}San enabled) rdar://51538999 Differential Revision: https://reviews.llvm.org/D63232 llvm-svn: 363325	2019-06-13 20:48:57 +00:00
Shawn Landden	24f4085811	[SimplifyCFG] NFC, update Switch tests as a baseline. Also add baseline tests to show effect of later patches. There were a couple of regressions here that were never caught, but my patch set that this is a preparation to will fix them. This is the third attempt to land this patch. Differential Revision: https://reviews.llvm.org/D61150 llvm-svn: 363319	2019-06-13 19:36:38 +00:00
Cameron McInally	79ec1a2957	Revert "[NFC][CodeGen] Add unary fneg tests to fp-fast.ll fp-fold.ll fp-in-intregs.ll fp-stack-compare-cmov.ll fp-stack-compare.ll fsxor-alignment.ll" This reverts commit `1d85a7518c`. llvm-svn: 363317	2019-06-13 19:25:16 +00:00
Cameron McInally	07514a1b16	Revert "[NFC][CodeGen] Add unary fneg tests to fmul-combines.ll fnabs.ll" This reverts commit `5c01140581`. llvm-svn: 363316	2019-06-13 19:25:12 +00:00
Cameron McInally	8984dbc27c	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma_patterns_wide.ll" This reverts commit `f1b8c6ac4f`. llvm-svn: 363315	2019-06-13 19:25:09 +00:00
Cameron McInally	5d9271802b	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma_patterns.ll" This reverts commit `06de52674d`. llvm-svn: 363314	2019-06-13 19:25:06 +00:00
Cameron McInally	d331e71bdb	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma4-fneg-combine.ll" This reverts commit `f288a0685f`. llvm-svn: 363313	2019-06-13 19:25:03 +00:00
Cameron McInally	31da4f80d5	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma-scalar-combine.ll" This reverts commit `3d2ee0053a`. llvm-svn: 363312	2019-06-13 19:25:00 +00:00
Cameron McInally	d3eaa332e4	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma-intrinsics-x86.ll" This reverts commit `169fc2b020`. llvm-svn: 363311	2019-06-13 19:24:57 +00:00
Cameron McInally	2aff82bfa6	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma4-intrinsics-x86.ll" This reverts commit `66f286845c`. llvm-svn: 363310	2019-06-13 19:24:54 +00:00
Cameron McInally	0a3fe05047	Revert "[NFC][CodeGen] Add unary FNeg tests to some X86/ and XCore/ tests." This reverts commit `4f3cf3853e`. llvm-svn: 363309	2019-06-13 19:24:51 +00:00
Cameron McInally	a0d06a626f	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/fma-intrinsics-canonical.ll" This reverts commit `ee5881a88c`. llvm-svn: 363308	2019-06-13 19:24:47 +00:00
Cameron McInally	a37d925d3d	Revert "[NFC][CodeGen] Forgot 2 unary FNeg tests in X86/fma-intrinsics-canonical.ll" This reverts commit `5f39a3096f`. llvm-svn: 363307	2019-06-13 19:24:44 +00:00
Cameron McInally	e00198f7a8	Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma-fneg-combine.ll" This reverts commit `10c0855542`. llvm-svn: 363306	2019-06-13 19:24:41 +00:00
Cameron McInally	ea28a063fd	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/combine-fcopysign.ll X86/dag-fmf-cse.ll X86/fast-isel-fneg.ll X86/fdiv.ll" This reverts commit `e04c4b6af8`. llvm-svn: 363305	2019-06-13 19:24:38 +00:00
Cameron McInally	4890457196	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll X86/combine-fabs.ll" This reverts commit `6fe46ec25d`. llvm-svn: 363304	2019-06-13 19:24:34 +00:00
Cameron McInally	21a29a9e65	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll" This reverts commit `2aa5ada267`. llvm-svn: 363303	2019-06-13 19:24:31 +00:00
Cameron McInally	7d4e7efd2e	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll" This reverts commit `27a5db9de5`. llvm-svn: 363302	2019-06-13 19:24:28 +00:00
Cameron McInally	8608afa964	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll" This reverts commit `41e0b9f280`. llvm-svn: 363301	2019-06-13 19:24:24 +00:00
Cameron McInally	675be5db46	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll" This reverts commit `aeb89f8b33`. llvm-svn: 363300	2019-06-13 19:24:21 +00:00
Stanislav Mekhanoshin	335f9883f0	[AMDGPU] gfx1010: small test change for wave32. NFC llvm-svn: 363297	2019-06-13 19:05:04 +00:00
Sanjay Patel	5bf7f81aa8	[InstCombine] add test for failed libfunction prototype matching; NFC llvm-svn: 363291	2019-06-13 18:26:10 +00:00
Philip Reames	eb88badff9	Fix a bug w/inbounds invalidation in LFTR This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV. The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program. As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well. (Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.) Differential Revision: https://reviews.llvm.org/D62939 llvm-svn: 363289	2019-06-13 18:23:13 +00:00
Sanjay Patel	4d93fb528e	[InstCombine] auto-generate complete test checks; NFC llvm-svn: 363286	2019-06-13 18:14:49 +00:00
David Bolvansky	a9d8388e80	[NFC] Updated testcase for D54411/rL363284 llvm-svn: 363285	2019-06-13 18:13:03 +00:00
David Bolvansky	896ece41e4	[Codegen] Merge tail blocks with no successors after block placement Summary: I found the following case, where tail blocks with no successors offer merging opportunities after block placement. Before block placement: bb0: ... bne a0, 0, bb2: bb1: mv a0, 1 ret bb2: ... bb3: mv a0, 1 ret bb4: mv a0, -1 ret The conditional branch bne in bb0 is opposite to beq. After block placement: bb0: ... beq a0, 0, bb1 bb2: ... bb4: mv a0, -1 ret bb1: mv a0, 1 ret bb3: mv a0, 1 ret After block placement, a new tail merging opportunity appears: bb1 and bb3 can be merged as one block. So the conditional constraint for merging tail blocks with no successors should be removed. In my experiment for RISC-V, it decreases code size. Author of original patch: Jim Lin Reviewers: haicheng, aheejin, craig.topper, rnk, RKSimon, Jim, dmgreen Reviewed By: Jim, dmgreen Subscribers: xbolva00, dschuff, javed.absar, sbc100, jgravelle-google, aheejin, kito-cheng, dmgreen, PkmX, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54411 llvm-svn: 363284	2019-06-13 18:11:32 +00:00
Stanislav Mekhanoshin	2bda177da0	[AMDGPU] ImmArg and SourceOfDivergence for permlane/dpp Added missing ImmArg and SourceOfDivergence to the crosslane intrinsics. Differential Revision: https://reviews.llvm.org/D63216 llvm-svn: 363276	2019-06-13 16:31:51 +00:00
Cameron McInally	aeb89f8b33	[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll Patch 2 of n. llvm-svn: 363275	2019-06-13 15:54:20 +00:00
Joseph Tremoulet	3bc6e2a7aa	[EarlyCSE] Ensure equal keys have the same hash value Summary: The logic in EarlyCSE that looks through 'not' operations in the predicate recognizes e.g. that `select (not (cmp sgt X, Y)), X, Y` is equivalent to `select (cmp sgt X, Y), Y, X`. Without this change, however, only the latter is recognized as a form of `smin X, Y`, so the two expressions receive different hash codes. This leads to missed optimization opportunities when the quadratic probing for the two hashes doesn't happen to collide, and assertion failures when probing doesn't collide on insertion but does collide on a subsequent table grow operation. This change inverts the order of some of the pattern matching, checking first for the optional `not` and then for the min/max/abs patterns, so that e.g. both expressions above are recognized as a form of `smin X, Y`. It also adds an assertion to isEqual verifying that it implies equal hash codes; this fires when there's a collision during insertion, not just grow, and so will make it easier to notice if these functions fall out of sync again. A new flag --earlycse-debug-hash is added which can be used when changing the hash function; it forces hash collisions so that any pair of values inserted which compare as equal but hash differently will be caught by the isEqual assertion. Reviewers: spatel, nikic Reviewed By: spatel, nikic Subscribers: lebedev.ri, arsenm, craig.topper, efriedma, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62644 llvm-svn: 363274	2019-06-13 15:24:11 +00:00
Diogo N. Sampaio	0be2d25ecc	[FIX] Forces shrink wrapping to consider any memory access as aliasing with the stack Summary: Related bug: https://bugs.llvm.org/show_bug.cgi?id=37472 The shrink wrapping pass prematurely restores the stack, at a point where the stack might still be accessed. Taking an exception can cause the stack to be corrupted. As a first approach, this patch is overly conservative, assuming that any instruction that may load or store could access the stack. Reviewers: dmgreen, qcolombet Reviewed By: qcolombet Subscribers: simpal01, efriedma, eli.friedman, javed.absar, llvm-commits, eugenis, chill, carwil, thegameg Tags: #llvm Differential Revision: https://reviews.llvm.org/D63152 llvm-svn: 363265	2019-06-13 13:56:19 +00:00
Simon Tatham	286e1d2c2d	[ARM] Set up infrastructure for MVE vector instructions. This commit prepares the way to start adding the main collection of MVE instructions, which operate on the 128-bit vector registers. The most obvious thing that's needed, and the simplest, is to add the MQPR register class, which is like the existing QPR except that it has fewer registers in it. The more complicated part: MVE defines a system of vector predication, in which instructions operating on 128-bit vector registers can be constrained to operate on only a subset of the lanes, using a system of prefix instructions similar to the existing Thumb IT, in that you have one prefix instruction which designates up to 4 following instructions as subject to predication, and within that sequence, the predicate can be inverted by means of T/E suffixes ('Then' / 'Else'). To support instructions of this type, we've added two new Tablegen classes `vpred_n` and `vpred_r` for standard clusters of MC operands to add to a predicated instruction. Both include a flag indicating how the instruction is predicated at all (options are T, E and 'not predicated'), and an input register field for the register controlling the set of active lanes. They differ from each other in that `vpred_r` also includes an input operand for the previous value of the output register, for instructions that leave inactive lanes unchanged. `vpred_n` lacks that extra operand; it will be used for instructions that don't preserve inactive lanes in their output register (either because inactive lanes are zeroed, as the MVE load instructions do, or because the output register isn't a vector at all). This commit also adds the family of prefix instructions themselves (VPT / VPST), and all the machinery needed to work with them in assembly and disassembly (e.g. generating the 't' and 'e' mnemonic suffixes on disassembled instructions within a predicated block) I've added a couple of demo instructions that derive from the new Tablegen base classes and use those two operand clusters. The bulk of the vector instructions will come in followup commits small enough to be manageable. (One exception is that I've added the full version of `isMnemonicVPTPredicable` in the AsmParser, because it seemed pointless to carefully split it up.) Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62669 llvm-svn: 363258	2019-06-13 13:11:13 +00:00
Jeremy Morse	bf2b2f08b0	[DebugInfo] Honour variable fragments in LiveDebugValues This patch makes the LiveDebugValues pass consider fragments when propagating DBG_VALUE insts between blocks, fixing PR41979. Fragment info for a variable location is added to the open-ranges key, which allows distinct fragments to be tracked separately. To handle overlapping fragments things become slightly funkier. To avoid excessive searching for overlaps in the data-flow part of LiveDebugValues, this patch: * Pre-computes pairings of fragments that overlap, for each DILocalVariable * During data-flow, whenever something happens that causes an open range to be terminated (via erase), any fragments pre-determined to overlap are also terminated. The effect of which is that when encountering a DBG_VALUE fragment that overlaps others, the overlapped fragments do not get propagated to other blocks. We still rely on later location-list building to correctly handle overlapping fragments within blocks. It's unclear whether a mixture of DBG_VALUEs with and without fragmented expressions is legitimate. To avoid surprises, this patch interprets a DBG_VALUE with no fragment as overlapping any DBG_VALUE _with_ a fragment. Differential Revision: https://reviews.llvm.org/D62904 llvm-svn: 363256	2019-06-13 12:51:57 +00:00
Dmitry Preobrazhensky	1fca3b1972	[AMDGPU][MC] Enabled constant expressions as operands of s_getreg/s_setreg See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61125 llvm-svn: 363255	2019-06-13 12:46:37 +00:00
Simon Pilgrim	a284f4fa7c	[X86][AVX] Add broadcast(v4f64 hadd) test llvm-svn: 363252	2019-06-13 11:42:32 +00:00
Simon Pilgrim	0baf136a4d	[X86][SSE] Avoid assert for broadcast(horiz-op()) cases for non-f64 cases. Based on fuzz test from @craig.topper llvm-svn: 363251	2019-06-13 11:26:21 +00:00
Simon Pilgrim	a6b87aa7ee	[X86][SSE] Add tests for underaligned nt stores Test both 'unaligned' (which we should scalarize) and 'subvector aligned' (which we should split) llvm-svn: 363249	2019-06-13 10:41:56 +00:00
Chris Jackson	7b39513302	[llvm-nm] Additional lit tests for command line options Differential Revision: https://reviews.llvm.org/D62955 llvm-svn: 363248	2019-06-13 10:39:36 +00:00
Simon Pilgrim	e1aea85896	[X86][SSE] Add SSE4A nt store tests on X86 as well as X64 We should be able to use MOVNTSD (f64) instead of MOVNTI (i32) to reduce the number of ops on 32-bit targets Pulled out of D63246 llvm-svn: 363247	2019-06-13 10:30:12 +00:00
Jeremy Morse	181bf0cefb	[DebugInfo] Use FrameDestroy to extend stack locations to end-of-function We aim to ignore changes in variable locations during the prologue and epilogue of functions, to avoid using space documenting location changes that aren't visible. However in D61940 / r362951 this got ripped out as the previous implementation was unsound. Instead, use the FrameDestroy flag to identify when we're in the epilogue of a function, and ignore variable location changes accordingly. This fits in with existing code that examines the FrameSetup flag. Some variable locations get shuffled in modified tests as they now cover greater ranges, which is what would be expected. Some additional single-location variables are generated too. Two tests are un-xfailed, they were only xfailed due to r362951 deleting functionality they depended on. Apparently some out-of-tree backends don't accurately maintain FrameDestroy flags -- if you're an out-of-tree maintainer and see changes in variable locations disappear due to a faulty FrameDestroy flag, it's safe to back this change out. The impact is just slightly more debug info than necessary. Differential Revision: https://reviews.llvm.org/D62314 llvm-svn: 363245	2019-06-13 10:03:17 +00:00
Eugene Leviant	86b7f865ac	[llvm-objcopy] Implement IHEX reader This is the final part of IHEX format support in llvm-objcopy Differential revision: https://reviews.llvm.org/D62583 llvm-svn: 363243	2019-06-13 09:56:14 +00:00
Sander de Smalen	51c2fa0e2a	Improve reduction intrinsics by overloading result value. This patch uses the mechanism from D62995 to strengthen the definitions of the reduction intrinsics by letting the scalar result/accumulator type be overloaded from the vector element type. For example: ; The LLVM LangRef specifies that the scalar result must equal the ; vector element type, but this is not checked/enforced by LLVM. declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a) This patch changes that into: declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a) Which has the type-constraint more explicit and causes LLVM to check the result type with the vector element type. Reviewers: RKSimon, arsenm, rnk, greened, aemerson Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D62996 llvm-svn: 363240	2019-06-13 09:37:38 +00:00
Owen Reynolds	8d59f5370d	Revert [llvm-ar][test] Add to MRI test coverage This reverts 363232 due to mru-utf8.test buildbot test failure Differential Revision: https://reviews.llvm.org/D63197 llvm-svn: 363239	2019-06-13 09:02:33 +00:00
Sam Parker	9d28473a35	[ARM][TTI] Scan for existing loop intrinsics TTI should report that it's not profitable to generate a hardware loop if it, or one of its child loops, has already been converted. Differential Revision: https://reviews.llvm.org/D63212 llvm-svn: 363234	2019-06-13 08:28:46 +00:00
Owen Reynolds	02eac87ba3	[llvm-ar][test] Add to MRI test coverage This change adds tests to cover existing MRI script functionality. Differential Revision: https://reviews.llvm.org/D63197 llvm-svn: 363232	2019-06-13 07:45:12 +00:00
Craig Topper	b1daec0eae	[X86] Correct instruction operands in evex-to-vex-compress.mir to be closer to real instructions. $noreg was being used way more than it should have. We also had xmm registers in addressing modes. Mostly found by hacking the machine verifier to do some stricter checking that happened to work for this test, but not sure if generally applicable for other tests or other targets. llvm-svn: 363231	2019-06-13 07:11:02 +00:00
Shawn Landden	8b142bcc3f	[SimplifyCFG] reverting preliminary Switch patches again This reverts 363226 and 363227, both NFC intended I swear I fixed the test case that is failing, and ran the tests, but I will look into it again. llvm-svn: 363229	2019-06-13 05:26:17 +00:00
Shawn Landden	c54b2011bd	[SimplifyCFG] NFC, update Switch tests to better examine successive patches Also add baseline tests to show effect of later patches. There were a couple of regressions here that were never caught, but my patch set that this is a preparation to will fix them. Differential Revision: https://reviews.llvm.org/D61150 llvm-svn: 363226	2019-06-13 04:51:35 +00:00
Craig Topper	387acd64f3	[X86] Add tests for some of the special cases in EVEX to VEX to the evex-to-vex-compress.mir test. llvm-svn: 363224	2019-06-13 04:10:08 +00:00
Shawn Landden	c6cba2957d	[SimplifyCFG] revert the last commit. I ran ALL the test suite locally, so I will look into this... llvm-svn: 363223	2019-06-13 02:47:47 +00:00
Shawn Landden	f93b99b2b6	[SimplifyCFG] NFC, update Switch tests to HEAD so I can see if my changes change anything Also add baseline tests to show effect of later patches. Differential Revision: https://reviews.llvm.org/D61150 llvm-svn: 363222	2019-06-13 02:24:24 +00:00
David L. Jones	c73fadaa84	Revert r361811: 'Re-commit r357452 (take 2): "SimplifyCFG SinkCommonCodeFromPredecessors ...' We have observed some failures with internal builds with this revision. - Performance regressions: - llvm's SingleSource/Misc evalloop shows performance regressions (although these may be red herrings). - Benchmarks for Abseil's SwissTable. - Correctness: - Failures for particular libicu tests when building the Google AppEngine SDK (for PHP). hwennborg has already been notified, and is aware of reproducer failures. llvm-svn: 363220	2019-06-13 02:04:45 +00:00
Dinar Temirbulatov	b2f45ba1e8	[SLP] Update propagate_ir_flags.ll test to check that we do retain the common subset, NFC. llvm-svn: 363218	2019-06-13 00:19:50 +00:00
Philip Reames	0bded8442f	[Tests] Highlight impact of multiple exit LFTR (D62625) as requested by reviewer llvm-svn: 363217	2019-06-12 23:39:49 +00:00
Cameron McInally	41e0b9f280	[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll Patch 1 of n. llvm-svn: 363215	2019-06-12 22:50:44 +00:00
Sanjay Patel	a1421e8347	[x86] add tests for vector shifts; NFC llvm-svn: 363203	2019-06-12 21:30:06 +00:00
Cameron McInally	27a5db9de5	[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll Patch 3 of 3 for X86/avx512vl-intrinsics-fast-isel.ll llvm-svn: 363200	2019-06-12 20:56:59 +00:00
Jordan Rupprecht	565f1e2298	[llvm-readobj] Fix output interleaving issue caused by using multiple streams at the same time. Summary: Use llvm::fouts() as the default stream for outputting. No new stream should be constructed to output at the same time. https://bugs.llvm.org/show_bug.cgi?id=42140 Reviewers: jhenderson, grimar, MaskRay, phosek, rupprecht Reviewed By: rupprecht Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63115 Patch by Yuanfang Chen! llvm-svn: 363198	2019-06-12 20:16:22 +00:00
Cameron McInally	2aa5ada267	[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll Patch 2 of 3 for X86/avx512vl-intrinsics-fast-isel.ll llvm-svn: 363194	2019-06-12 19:39:42 +00:00
Philip Reames	00e481b75d	[Tests] Autogen RLEV test and add tests for a future enhancement llvm-svn: 363193	2019-06-12 19:23:10 +00:00
Philip Reames	851adc000c	[Tests] Add tests to highlight sibling loop optimization order issue for exit rewriting The issue addressed in r363180 is more broadly relevant. For the moment, we don't actually get any of these cases because we a) restrict SCEV formation due to SCEVExpander needing to preserve LCSSA, and b) don't iterate between loops. llvm-svn: 363192	2019-06-12 19:04:51 +00:00
Stanislav Mekhanoshin	000f9cc62a	[AMDGPU] more gfx1010 tests. NFC. llvm-svn: 363190	2019-06-12 18:44:11 +00:00
Jordan Rupprecht	146a154e61	[llvm-ar][test] Relax lit directory assumptions in thin-archive.test Summary: thin-archive.test assumes the Output/<testname> structure that lit creates. Rewrite the test in a way that still tests the same thing (creating via relative path and adding via absolute path) but doesn't assume this specific lit structure, making it possible to run in a lit emulator. Reviewers: gbreynoo Reviewed By: gbreynoo Subscribers: llvm-commits, bkramer Tags: #llvm Differential Revision: https://reviews.llvm.org/D62930 llvm-svn: 363189	2019-06-12 18:41:27 +00:00
Stanislav Mekhanoshin	245b5ba344	[AMDGPU] gfx1010 dpp16 and dpp8 Differential Revision: https://reviews.llvm.org/D63203 llvm-svn: 363186	2019-06-12 18:02:41 +00:00
Stanislav Mekhanoshin	5f581c9f08	[AMDGPU] gfx1010 permlane instructions Differential Revision: https://reviews.llvm.org/D63202 llvm-svn: 363185	2019-06-12 17:52:51 +00:00
Simon Atanasyan	efc0d1a298	[Mips] Add s.d instruction alias for Mips1 Add support for s.d instruction for Mips1 which expands into two swc1 instructions. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D63199 llvm-svn: 363184	2019-06-12 17:52:05 +00:00
Simon Pilgrim	ef7d4fbe80	[X86][SSE] Avoid unnecessary stack codegen in NT merge-consecutive-stores codegen tests. llvm-svn: 363181	2019-06-12 17:28:48 +00:00
Philip Reames	e51c3d8b82	[SCEV] Teach computeSCEVAtScope benefit from one-input Phi. PR39673 SCEV does not propagate arguments through one-input Phis so as to make it easy for the SCEV expander (and related code) to preserve LCSSA. It's not entirely clear this restriction is necessary, but for the moment it exists. For this reason, we don't analyze single-entry phi inputs. However it is possible that when this input leaves the loop through LCSSA Phi, it is a provable constant. Missing that results in an order of optimization issue in loop exit value rewriting where we miss some opportunities based on order in which we visit sibling loops. This patch teaches computeSCEVAtScope about this case. We can generalize it later, but so far we can only replace LCSSA Phis with their constant loop-exiting values. We should probably also add similar logic directly in the SCEV construction path itself. Patch by: mkazantsev (with revised commit message by me) Differential Revision: https://reviews.llvm.org/D58113 llvm-svn: 363180	2019-06-12 17:21:47 +00:00
Simon Pilgrim	5b0e0dd709	[X86][AVX] Fold concat(vpermilps(x,c),vpermilps(y,c)) -> vpermilps(concat(x,y),c) Handles PSHUFD/PSHUFLW/PSHUFHW (AVX2) + VPERMILPS (AVX1). An extra AVX1 PSHUFD->VPERMILPS combine will be added in a future commit. llvm-svn: 363178	2019-06-12 16:38:20 +00:00
Sanjay Patel	64006896ac	[InstCombine] add tests for fmin/fmax libcalls; NFC llvm-svn: 363175	2019-06-12 15:29:40 +00:00
Sam Parker	3d42959dd8	Revert rL363156. The patch was to fix buildbots, but rL363157 should now be fixing it in a cleaner way. llvm-svn: 363174	2019-06-12 15:28:00 +00:00
David Bolvansky	48365ec3e1	[NFC] Updated tests for D54411 llvm-svn: 363173	2019-06-12 15:01:36 +00:00
Matt Arsenault	f29366b1f5	StackProtector: Use PointerMayBeCaptured This was using its own, outdated list of possible captures. This was at minimum not catching cmpxchg and addrspacecast captures. One change is now any volatile access is treated as capturing. The test coverage for this pass is quite inadequate, but this required removing volatile in the lifetime capture test. Also fixes some infrastructure issues to allow running just the IR pass. Fixes bug 42238. llvm-svn: 363169	2019-06-12 14:23:33 +00:00

62877 Commits