llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	d8409b178e	AMDGPU/GlobalISel: Fix RegBankSelect for unaligned, uniform constant loads llvm-svn: 371416	2019-09-09 16:06:37 +00:00
Matt Arsenault	02eb308387	AMDGPU/GlobalISel: Fix regbankselect for uniform extloads There are no scalar extloads. llvm-svn: 371414	2019-09-09 16:03:45 +00:00
Matt Arsenault	ebbd6e4976	AMDGPU: Remove code address space predicates Fixes 8-byte, 8-byte aligned LDS loads. 16-byte case still broken due to not be reported as legal. llvm-svn: 371413	2019-09-09 16:02:07 +00:00
Matt Arsenault	c34b4036ff	AMDGPU/GlobalISel: Select G_PTR_MASK llvm-svn: 371412	2019-09-09 15:46:13 +00:00
Matt Arsenault	fdb7030117	AMDGPU/GlobalISel: Fix reg bank for uniform LDS loads The pointer is always a VGPR. Also fix hardcoding the pointer size to 64. llvm-svn: 371411	2019-09-09 15:44:16 +00:00
Matt Arsenault	2dd088ec7d	AMDGPU/GlobalISel: Use known bits for selection llvm-svn: 371409	2019-09-09 15:39:32 +00:00
Matt Arsenault	8e3bc9b572	AMDGPU/GlobalISel: Legalize wavefrontsize intrinsic llvm-svn: 371407	2019-09-09 15:20:49 +00:00
Simon Tatham	0e48bd24e2	[ARM] Remove some spurious MVE reduction instructions. The family of 'dual-accumulating' vector multiply-add instructions (VMLADAV, VMLALDAV and VRMLALDAVH) can all operate on both signed and unsigned integer types, and they all have an 'exchange' variant (with an X in the name) that modifies which pairs of vector lanes in the two inputs are multiplied together. But there's a clause in the spec that says that the X variants //don't// operate on unsigned integer types, only signed. You can have X, or unsigned, or neither, but not both. We didn't notice that clause when we implemented the MC support for these instructions, so LLVM believes that things like VMLADAVX.U8 do exist, contradicting the spec. Here I fix that by conditioning them out in Tablegen. In order to do that, I've reversed the nesting order of the Tablegen multiclasses for those instructions. Previously, the innermost multiclass generated the X and not-X variants, and the one outside that generated the A and not-A variants. Now X is done by the outer multiclass, which allows me to bypass that one when I only want the two not-X variants. Changing the multiclass nesting order also changes the names of the instruction ids unless I make a special effort not to. I decided that while I was changing them anyway I'd make them look nicer; so now the instructions have names like MVE_VMLADAVs32 or MVE_VMLADAVaxs32, instead of cumbersome _noacc_noexch suffixes. The corresponding multiply-subtract instructions are unaffected. Those don't accept unsigned types at all, either in the spec or in LLVM. Reviewers: ostannard, dmgreen Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67214 llvm-svn: 371405	2019-09-09 15:17:26 +00:00
Roman Lebedev	59608c0049	[NFC][InstCombine] Fixup test i added in rL371352. llvm-svn: 371401	2019-09-09 14:27:39 +00:00
James Molloy	b6c7fce67a	[DFAPacketizer] Reapply: Track resources for packetized instructions Reapply with fix to reduce resources required by the compiler - use unsigned[2] instead of std::pair. This causes clang and gcc to compile the generated file multiple times faster, and hopefully will reduce the resource requirements on Visual Studio also. This fix is a little ugly but it's clearly the same issue the previous author of DFAPacketizer faced (the previous tables use unsigned[2] rather uglily too). This patch allows the DFAPacketizer to be queried after a packet is formed to work out which resources were allocated to the packetized instructions. This is particularly important for targets that do their own bundle packing - it's not sufficient to know simply that instructions can share a packet; which slots are used is also required for encoding. This extends the emitter to emit a side-table containing resource usage diffs for each state transition. The packetizer maintains a set of all possible resource states in its current state. After packetization is complete, all remaining resource states are possible packetization strategies. The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default (most uses of the packetizer like MachinePipeliner don't care and don't need the extra maintained state). Differential Revision: https://reviews.llvm.org/D66936 llvm-svn: 371399	2019-09-09 13:17:55 +00:00
Clement Courbet	388b9794b6	[Inliner][NFC] Make test less brittle. Summary: This tests inlining size thresholds, but relies on the output of running the full O2 pipeline, making it brittle against changes in unrelated passes. Only run the inlining pass and set thresholds on the test RUN line instead. Found while investigating D60318. Reviewers: RKSimon, qcolombet Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67349 llvm-svn: 371397	2019-09-09 13:08:16 +00:00
Sam Parker	1ad508e8e2	[ARM][MVE] VCTP instruction selection Add codegen support for vctp{8,16,32}. Differential Revision: https://reviews.llvm.org/D67344 llvm-svn: 371395	2019-09-09 12:54:47 +00:00
Simon Pilgrim	462e3d8050	Revert rL371198 from llvm/trunk: [DFAPacketizer] Track resources for packetized instructions This patch allows the DFAPacketizer to be queried after a packet is formed to work out which resources were allocated to the packetized instructions. This is particularly important for targets that do their own bundle packing - it's not sufficient to know simply that instructions can share a packet; which slots are used is also required for encoding. This extends the emitter to emit a side-table containing resource usage diffs for each state transition. The packetizer maintains a set of all possible resource states in its current state. After packetization is complete, all remaining resource states are possible packetization strategies. The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default (most uses of the packetizer like MachinePipeliner don't care and don't need the extra maintained state). Differential Revision: https://reviews.llvm.org/D66936 ........ Reverted as this is causing "compiler out of heap space" errors on MSVC 2017/19 NDEBUG builds llvm-svn: 371393	2019-09-09 12:33:22 +00:00
Cullen Rhodes	55244beeee	[AArch64][SVE] Implement abs and neg intrinsics Summary: This patch implements two arithmetic intrinsics: * int_aarch64_sve_abs * int_aarch64_sve_neg testing the support for scalable vector types in intrinsics added in D65930. Reviewed By: greened Differential Revision: https://reviews.llvm.org/D65931 llvm-svn: 371388	2019-09-09 11:21:14 +00:00
Tim Northover	36147adc0b	GlobalISel: add combiner to form indexed loads. Loosely based on DAGCombiner version, but this part is slightly simpler in GlobalIsel because all address calculation is performed by G_GEP. That makes the inc/dec distinction moot so there's just pre/post to think about. No targets can handle it yet so testing is via a special flag that overrides target hooks. llvm-svn: 371384	2019-09-09 10:04:23 +00:00
George Rimar	c11af417e0	[yaml2obj] - Fix BB after r371380 Just a fix for an input file name. llvm-svn: 371383	2019-09-09 09:55:56 +00:00
George Rimar	3212ecfea8	[lib/ObjectYAML] - Improve and cleanup error reporting in ELFState<ELFT> class. The aim of this patch is to refactor how we handle and report error. I suggest to use the same approach we use in LLD: delayed error reporting. For that I introduced 'HasError' flag which triggers when we report an error. Now we do not exit instantly on any error. The benefits are: 1) There are no more 'exit(1)' calls in the library code. 2) Code was simplified significantly in a few places. 3) It is now possible to print multiple errors instead of only one. Also, I changed the messages to be lower case and removed a full stop. Differential revision: https://reviews.llvm.org/D67182 llvm-svn: 371380	2019-09-09 09:43:03 +00:00
Oliver Stannard	6b9aedaec6	[ARM][MVE] Decoding of uqrshl and sqrshl accepts unpredictable encodings Specify the Unpredictable bits, and return softfails when appropriate. Patch by Mark Murray! Differential revision: https://reviews.llvm.org/D66939 llvm-svn: 371374	2019-09-09 08:50:28 +00:00
Sam Parker	c363deb575	[ARM][ParallelDSP] Fix for sext input The incoming accumulator value can be discovered through a sext, in which case there will be a mismatch between the input and the result. So sign extend the accumulator input if we're performing a 64-bit mac. Differential Revision: https://reviews.llvm.org/D67220 llvm-svn: 371370	2019-09-09 08:39:14 +00:00
Craig Topper	a88f58ff0e	[X86] Add broadcast load unfolding support for vpcmpeq/vpcmpgt/vpcmp/vpcmpu. llvm-svn: 371368	2019-09-09 07:46:11 +00:00
Craig Topper	667f039c8c	[X86] Add broadcast load unfolding tests for vpcmpeq/vpcmpgt/vpcmp/vpcmpu. llvm-svn: 371367	2019-09-09 07:46:07 +00:00
Craig Topper	8c2ab1c4cb	[X86] Add broadcast load unfold support for smin/umin/smax/umax. llvm-svn: 371366	2019-09-09 06:32:24 +00:00
Craig Topper	68b2e1973f	[X86] Add broadcast load unfolding tests for smin/umin/smax/smin. llvm-svn: 371365	2019-09-09 06:32:20 +00:00
Craig Topper	ad7822329f	[X86] Add broadcast load unfolding support for VMAXPS/PD and VMINPS/PD. llvm-svn: 371363	2019-09-09 04:25:01 +00:00
Craig Topper	8d42a796c2	[X86] Add broadcast load unfolding tests for vmaxps/pd and vminps/pd llvm-svn: 371362	2019-09-09 04:24:57 +00:00
Craig Topper	197901081b	[X86] Add fp128 test cases for ceil/floor/trunc/nearbyint/rint/round libcalls. llvm-svn: 371360	2019-09-09 02:44:46 +00:00
Kai Luo	9115c477bb	[MachineCopyPropagation] Remove redundant copies after TailDup via machine-cp Summary: After tailduplication, we have redundant copies. We can remove these copies in machine-cp if it's safe to, i.e. ``` $reg0 = OP ... ... <<< No read or clobber of $reg0 and $reg1 $reg1 = COPY $reg0 <<< $reg0 is killed ... <RET> ``` will be transformed to ``` $reg1 = OP ... ... <RET> ``` Differential Revision: https://reviews.llvm.org/D65267 llvm-svn: 371359	2019-09-09 02:32:42 +00:00
Craig Topper	fb1e77505a	[X86] Add test cases for fptoui/fptosi/sitofp/uitofp between fp128 and i128. llvm-svn: 371358	2019-09-09 01:35:04 +00:00
Craig Topper	72624b0e59	[X86] Use xorps to create fp128 +0.0 constants. This matches what we do for f32/f64. gcc also does this for fp128. llvm-svn: 371357	2019-09-09 01:35:00 +00:00
Craig Topper	861d343949	[X86] Add avx and avx512f RUN lines to fp128-cast.ll llvm-svn: 371356	2019-09-09 01:34:55 +00:00
Douglas Yung	debac75dea	Relax opcode checks in test to check for only a number instead of a specific number. llvm-svn: 371355	2019-09-09 01:21:33 +00:00
Simon Pilgrim	e0ea746215	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add faux shuffle support. This patch decodes target and faux shuffles with getTargetShuffleInputs - a reduced version of resolveTargetShuffleInputs that doesn't resolve SM_SentinelZero cases, so we can correctly remove zero vectors if they aren't demanded. llvm-svn: 371353	2019-09-08 21:38:33 +00:00
Roman Lebedev	139a9d6c0e	[InstCombine][NFC] Some tests for usub overflow+nonzero check improvement (PR43251) https://rise4fun.com/Alive/kHq https://bugs.llvm.org/show_bug.cgi?id=43251 llvm-svn: 371352	2019-09-08 21:30:34 +00:00
Craig Topper	77dd86ee4a	[X86] Add a hack to combineVSelectWithAllOnesOrZeros to turn selects with two zero/undef vector inputs into an all zeroes vector. If the two zero vectors have undefs in different places they won't get combined by simplifySelect. This fixes a regression from an earlier commit. llvm-svn: 371351	2019-09-08 20:56:09 +00:00
Craig Topper	9c11901256	[X86] Remove call to getZeroVector from materializeVectorConstant. Add isel patterns for zero vectors with all types. The change to avx512-vec-cmp.ll is a regression, but should be easy to fix. It occurs because the getZeroVector call was canonicalizing both sides to the same node, then SimplifySelect was able to simplify it. But since only called getZeroVector on some VTs this isn't a robust way to combine this. The change to vector-shuffle-combining-ssse3.ll is more instructions, but removes a constant pool load so its unclear if its a regression or not. llvm-svn: 371350	2019-09-08 20:56:05 +00:00
Roman Lebedev	6e2c5c8710	[InstSimplify] simplifyUnsignedRangeCheck(): if we know that X != 0, handle more cases (PR43246) Summary: This is motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds. In this particular case, given ``` char* test(char& base, unsigned long offset) { return &base + offset; } ``` it will end up producing something like https://godbolt.org/z/LK5-iH which after optimizations reduces down to roughly ``` define i1 @t0(i8* nonnull %base, i64 %offset) { %base_int = ptrtoint i8* %base to i64 %adjusted = add i64 %base_int, %offset %non_null_after_adjustment = icmp ne i64 %adjusted, 0 %no_overflow_during_adjustment = icmp uge i64 %adjusted, %base_int %res = and i1 %non_null_after_adjustment, %no_overflow_during_adjustment ret i1 %res } ``` Without D67122 there was no `%non_null_after_adjustment`, and in this particular case we can get rid of the overhead: Here we add some offset to a non-null pointer, and check that the result does not overflow and is not a null pointer. But since the base pointer is already non-null, and we check for overflow, that overflow check will already catch the null pointer, so the separate null check is redundant and can be dropped. Alive proofs: https://rise4fun.com/Alive/WRzq There are more patterns of "unsigned-add-with-overflow", they are not handled here, but this is the main pattern, that we currently consider canonical, so it makes sense to handle it. https://bugs.llvm.org/show_bug.cgi?id=43246 Reviewers: spatel, nikic, vsk Reviewed By: spatel Subscribers: hiraditya, llvm-commits, reames Tags: #llvm Differential Revision: https://reviews.llvm.org/D67332 llvm-svn: 371349	2019-09-08 20:14:15 +00:00
Sanjay Patel	354a46444c	[InstCombine] add tests for icmp with srem operand; NFC llvm-svn: 371348	2019-09-08 19:48:47 +00:00
Roman Lebedev	94db67f0e1	[X86] X86DAGToDAGISel::combineIncDecVector(): call getSplatBuildVector() manually As reported in post-commit review of r370327, there is some case where the code crashes. As discussed with Craig Topper, the problem is that getConstant() internally calls getSplatBuildVector(), so we don't insert the constant itself. If we do that manually we're good. llvm-svn: 371346	2019-09-08 19:36:13 +00:00
Craig Topper	dac34f52d3	[DAGCombiner][X86][ARM] Teach visitMULO to fold multiplies with 0 to 0 and no carry. I modified the ARM test to use two inputs instead of 0 so the test hopefully still tests what was intended. llvm-svn: 371344	2019-09-08 19:24:39 +00:00
Craig Topper	30837abd96	[X86] Teach materializeVectorConstant to not call getZeroVector/getOnesVector on the types we already have isel patterns for. llvm-svn: 371343	2019-09-08 19:24:29 +00:00
Sanjay Patel	aff5bee35f	[InstCombine] fold extract+insert into identity shuffle This is similar to the existing fold for splats added with: rL365379 If we can adjust the shuffle mask to include another element in an identity mask (if it changes vector length, that's an extract/insert subvector operation in the backend), then that can eliminate extractelement/insertelement pairs in IR. All targets are expected to lower shuffles with identity masks efficiently. llvm-svn: 371340	2019-09-08 19:03:01 +00:00
Roman Lebedev	64965430db	[NFC][InstSimplify] Some tests for dropping null check after uadd.with.overflow of non-null (PR43246) https://rise4fun.com/Alive/WRzq Name: C <= Y && Y != 0 --> C <= Y iff C != 0 Pre: C != 0 %y_is_nonnull = icmp ne i64 %y, 0 %no_overflow = icmp ule i64 C, %y %r = and i1 %y_is_nonnull, %no_overflow => %r = %no_overflow Name: C <= Y \|\| Y != 0 --> Y != 0 iff C != 0 Pre: C != 0 %y_is_nonnull = icmp ne i64 %y, 0 %no_overflow = icmp ule i64 C, %y %r = or i1 %y_is_nonnull, %no_overflow => %r = %y_is_nonnull Name: C > Y \|\| Y == 0 --> C > Y iff C != 0 Pre: C != 0 %y_is_null = icmp eq i64 %y, 0 %overflow = icmp ugt i64 C, %y %r = or i1 %y_is_null, %overflow => %r = %overflow Name: C > Y && Y == 0 --> Y == 0 iff C != 0 Pre: C != 0 %y_is_null = icmp eq i64 %y, 0 %overflow = icmp ugt i64 C, %y %r = and i1 %y_is_null, %overflow => %r = %y_is_null https://bugs.llvm.org/show_bug.cgi?id=43246 llvm-svn: 371339	2019-09-08 17:50:40 +00:00
David Stenberg	5a583665f4	[DebugInfo][X86] Describe call site values for zero-valued imms Summary: Add zero-materializing XORs to X86's describeLoadedValue() hook in order to produce call site values. I have had to change the defs logic in collectCallSiteParameters() a bit to be able to describe the XORs. The XORs implicitly define $eflags, which would cause them to never be considered, due to a guard condition that I->getNumDefs() is one. I have changed that condition so that we now only consider instructions where a forwarded register overlaps with the instruction's single explicit define. We still need to collect the implicit defines of other forwarded registers to remove them from the work list. I'm not sure how to move towards supporting instructions with multiple explicit defines, cases where forwarded register are implicitly defined, and/or cases where an instruction produces values for multiple forwarded registers. Perhaps the describeLoadedValue() hook should take a register argument, and we then leave it up to the hook to describe the loaded value in that register? I have not yet encountered a situation where that would be necessary though. Reviewers: aprantl, vsk, djtodoro, NikolaPrica Reviewed By: vsk Subscribers: ychen, hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D67225 llvm-svn: 371333	2019-09-08 14:22:06 +00:00
Simon Pilgrim	178cd2cd3a	[X86][SSE] Fix out of range shift introduced in D67070/rL371328 Use APInt to create the comparison mask instead. llvm-svn: 371330	2019-09-08 12:44:22 +00:00
Simon Pilgrim	9d57002070	[X86] Add test case for PR32546 llvm-svn: 371329	2019-09-08 11:56:07 +00:00
Simon Pilgrim	3262084384	[X86][SSE] Add support for <64 x i1> bool reduction This generalizes the existing <32 x i1> pre-AVX2 split code to support reductions from <64 x i1> as well, we can probably generalize to any larger pow2 case in the future if the (unlikely) need ever arises. We still need to tweak combineBitcastvxi1 to improve AVX512F codegen as its assumes vXi1 types should be handled on the mask registers even when they aren't legal. Differential Revision: https://reviews.llvm.org/D67070 llvm-svn: 371328	2019-09-08 11:46:21 +00:00
Craig Topper	37dd59298f	[X86] Make getZeroVector return floating point vectors in their native type on SSE2 and later. isel used to require zero vectors to be canonicalized to a single type to minimize the number of patterns needed to match. This is no longer required. I plan to do this to integers too, but floating point was simpler to start with. Integer has a complication where v32i16/v64i8 aren't legal when the other 512-bit integer types are. llvm-svn: 371325	2019-09-08 00:43:52 +00:00
Craig Topper	1829a09bea	[X86] Add support for unfold broadcast loads from FMA instructions. llvm-svn: 371323	2019-09-07 21:54:40 +00:00
Craig Topper	a461c26dd8	[X86] Add broadcast load unfolding tests for FMA instructions. llvm-svn: 371322	2019-09-07 21:54:36 +00:00
Sebastian Pop	eacb2c2c97	[aarch64] Add combine patterns for fp16 fmla This patch enables generation of fused multiply add/sub for instructions operating on fp16. Tested on aarch64-linux. Differential Revision: https://reviews.llvm.org/D67297 llvm-svn: 371321	2019-09-07 20:24:51 +00:00
Craig Topper	8cfff1e1bc	[X86] Add prefer-128-bit subtarget feature. Summary: Similar to the previous prefer-256-bit flag. We might want to enable this by default some CPUs. This just starts the initial work to implement and prove that it effects TTI's vector width. Reviewers: RKSimon, echristo, spatel, atdt Reviewed By: RKSimon Subscribers: lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67311 llvm-svn: 371319	2019-09-07 19:54:22 +00:00
Simon Pilgrim	31c98abda3	[X86][AVX] Add 'f5' v4f64 shuffle test mentioned in D66004 llvm-svn: 371314	2019-09-07 16:13:48 +00:00
Fangrui Song	72e99e63a2	[ELF][MC] Set types of aliases of IFunc to STT_GNU_IFUNC ``` .type foo,@gnu_indirect_function .set foo,foo_resolver .set foo2,foo .set foo3,foo2 ``` The types of foo2 and foo3 should be STT_GNU_IFUNC, but we currently resolve them to the type of foo_resolver. This patch fixes it. Differential Revision: https://reviews.llvm.org/D67206 Patch by Senran Zhang llvm-svn: 371312	2019-09-07 14:58:47 +00:00
Roman Lebedev	4e76f88072	[SimplifyCFG][NFC] Autogenerate PhiEliminate3.ll llvm-svn: 371311	2019-09-07 13:53:14 +00:00
Roman Lebedev	88bab08a88	[SimplifyCFG][NFC] Autogenerate two tests llvm-svn: 371310	2019-09-07 13:35:54 +00:00
Bjorn Pettersson	d065c81164	[CodeGen] Handle SMULFIXSAT with scale zero in TargetLowering::expandFixedPointMul Summary: Normally TargetLowering::expandFixedPointMul would handle SMULFIXSAT with scale zero by using an SMULO to compute the product and determine if saturation is needed (if overflow happened). But if SMULO isn't custom/legal it falls through and uses the same technique, using MULHS/SMUL_LOHI, as used for non-zero scales. Problem was that when checking for overflow (handling saturation) when not using MULO we did not expect to find a zero scale. So we ended up in an assertion when doing APInt::getLowBitsSet(VTSize, Scale - 1) This patch fixes the problem by adding a new special case for how saturation is computed when scale is zero. Reviewers: RKSimon, bevinh, leonardchan, spatel Reviewed By: RKSimon Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67071 llvm-svn: 371309	2019-09-07 12:16:23 +00:00
Bjorn Pettersson	5e331e4ce8	[Intrinsic] Add the llvm.umul.fix.sat intrinsic Summary: Add an intrinsic that takes 2 unsigned integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Patch by: leonardchan, bjope Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel Reviewed By: leonardchan Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57836 llvm-svn: 371308	2019-09-07 12:16:14 +00:00
Nikita Popov	314893cc4b	[X86] Fix pshuflw formation from repeated shuffle mask (PR43230) Fix for https://bugs.llvm.org/show_bug.cgi?id=43230. When creating PSHUFLW from a repeated shuffle mask, we have to apply the checks to the repeated mask, not the original one. For the test case from PR43230 the inspected part of the original mask is all undef. Differential Revision: https://reviews.llvm.org/D67314 llvm-svn: 371307	2019-09-07 12:13:44 +00:00
Nikita Popov	fdc6977ff3	[LVI] Look through extractvalue of insertvalue This addresses the issue mentioned on D19867. When we simplify with.overflow instructions in CVP, we leave behind extractvalue of insertvalue sequences that LVI no longer understands. This means that we can not simplify any instructions based on the with.overflow anymore (until some over pass like InstCombine cleans them up). This patch extends LVI extractvalue handling by calling SimplifyExtractValueInst (which doesn't do anything more than constant folding + looking through insertvalue) and using the block value of the simplification. A possible alternative would be to do something similar to SimplifyIndVars, where we instead directly try to replace extractvalue users of the with.overflow. This would need some additional structural changes to CVP, as it's currently not legal to remove anything but the current instruction -- we'd have to introduce a worklist with instructions scheduled for deletion or similar. Differential Revision: https://reviews.llvm.org/D67035 llvm-svn: 371306	2019-09-07 12:03:59 +00:00
Nikita Popov	5d02f259c0	[X86] Add test for PR43230; NFC llvm-svn: 371305	2019-09-07 12:03:48 +00:00
Bjorn Pettersson	2b698a13a1	[DwarfExpression] Disallow some rewrites to avoid undefined behavior Summary: The value operand in DW_OP_plus_uconst/DW_OP_constu value can be large (it uses uint64_t as representation internally in LLVM). This means that in the uint64_t to int conversions, previously done by DwarfExpression::addMachineRegExpression, could lose information. Also, the negation done in "-Offset" was undefined behavior in case Offset was exactly INT_MIN. To avoid the above problems, we now avoid transformation like [Reg, DW_OP_plus_uconst, Offset] --> [DW_OP_breg, Offset] and [Reg, DW_OP_constu, Offset, DW_OP_plus] --> [DW_OP_breg, Offset] when Offset > INT_MAX. And we avoid to transform [Reg, DW_OP_constu, Offset, DW_OP_minus] --> [DW_OP_breg,-Offset] when Offset > INT_MAX+1. The patch also adjusts DwarfCompileUnit::constructVariableDIEImpl to make sure that "DW_OP_constu, Offset, DW_OP_minus" is used instead of "DW_OP_plus_uconst, Offset" when creating DIExpressions with negative frame index offsets. Notice that this might just be the tip of the iceberg. There are lots of fishy handling related to these constants. I think both DIExpression::appendOffset and DIExpression::extractIfOffset may trigger undefined behavior for certain values. Reviewers: sdesmalen, rnk, JDevlieghere Reviewed By: JDevlieghere Subscribers: jholewinski, aprantl, hiraditya, ychen, uabelho, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D67263 llvm-svn: 371304	2019-09-07 11:40:10 +00:00
Bjorn Pettersson	e85acf946d	[DebugInfo] Pre-commit of test case for DW_OP_breg/DW_OP_fbreg folds This currently triggers undefined behavior if executed with an ubsan build. It is just a precommit of the test case to show that we got a problem. Fix is proposed in https://reviews.llvm.org/D67263 and plan is to commit the fix directly after this patch. llvm-svn: 371303	2019-09-07 11:39:57 +00:00
Roman Lebedev	395f254bf0	[SimplifyCFG][NFC] Make merge-cond-stores-cost.ll X86-specific, and rewrite it We clearly perform store-merging, even though div is really costly. llvm-svn: 371300	2019-09-07 10:55:04 +00:00
Roman Lebedev	0ff6d7f305	[SimplifyCFG][NFC] Show that we don't consider the cost when merging cond stores We count instruction count in each BB's separately, not their cost. llvm-svn: 371297	2019-09-07 09:25:26 +00:00
Roman Lebedev	8d3e4d3a4d	[SimplifyCFG][NFC] Regenerate merge-cond-stores* tests llvm-svn: 371296	2019-09-07 09:25:18 +00:00
Hideto Ueno	f2b9dc4758	[Attributor] ValueSimplify Abstract Attribute Summary: This patch introduces initial `AAValueSimplify` which simplifies a value in a context. example - (for function returned) If all the return values are the same and constant, then we can replace callsite returned with the constant. - If an internal function takes the same value(constant) as an argument in the callsite, then we can replace the argument with that constant. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66967 llvm-svn: 371291	2019-09-07 07:03:05 +00:00
Xing GUO	ed20dcb88b	Revert [CodeGen] Fix typos to run tests. NFC. This reverts r371286 (git commit `b38105bbd0`) r371286 caused build bots' failure. I'll check it. llvm-svn: 371289	2019-09-07 05:14:47 +00:00
Xing GUO	b38105bbd0	[CodeGen] Fix typos to run tests. NFC. llvm-svn: 371286	2019-09-07 04:57:53 +00:00
Teresa Johnson	9c27b59cec	Change TargetLibraryInfo analysis passes to always require Function Summary: This is the first change to enable the TLI to be built per-function so that -fno-builtin* handling can be migrated to use function attributes. See discussion on D61634 for background. This is an enabler for fixing handling of these options for LTO, for example. This change should not affect behavior, as the provided function is not yet used to build a specifically per-function TLI, but rather enables that migration. Most of the changes were very mechanical, e.g. passing a Function to the legacy analysis pass's getTLI interface, or in Module level cases, adding a callback. This is similar to the way the per-function TTI analysis works. There was one place where we were looking for builtins but not in the context of a specific function. See FindCXAAtExit in lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround could provide the wrong behavior in some corner cases. Suggestions welcome. Reviewers: chandlerc, hfinkel Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66428 llvm-svn: 371284	2019-09-07 03:09:36 +00:00
Craig Topper	dd507867ef	[X86] Add tests for fp128 frem, sqrt, sin, and cos. llvm-svn: 371283	2019-09-07 01:39:21 +00:00
Craig Topper	2dd5a205e6	[X86] Autogenerate fp128-libcalls.ll llvm-svn: 371282	2019-09-07 01:39:12 +00:00
Amara Emerson	a1cf4d9795	[AArch64][GlobalISel] Enable the localizer for optimized builds. Despite the fact that the localizer's original motivation was to fix horrendous constant spilling at -O0, shortening live ranges still has net benefits even with optimizations enabled. On an -Os build of CTMark, doing this improves code size by 0.5% geomean. There are a few regressions, bullet increasing in size by 0.5%. One example from bullet where code size increased slightly was due to GlobalISel actually now generating the same code as SelectionDAG. So we actually have an opportunity in future to implement better heuristics for localization and therefore be better than SDAG in some cases. In relation to other optimizations though that one is relatively minor. Differential Revision: https://reviews.llvm.org/D67303 llvm-svn: 371266	2019-09-06 22:27:09 +00:00
Craig Topper	03936cb0f9	[X86] Add a AVX512VBMI command line to min-legal-vector-width.ll. Always enable fast-variable-shuffle Trying to minimize the features we need to manipulate when this is updated for D67259. The VBMI is interesting because it enables some improved combining for truncates. I enabled fast-variable-shuffle because all the CPUs we're going to add implicitly enable it. So they can share check lines. llvm-svn: 371261	2019-09-06 21:49:01 +00:00
Craig Topper	a31112e357	[X86] Replace -mcpu with -mattr on some tests. llvm-svn: 371260	2019-09-06 21:48:44 +00:00
Matt Arsenault	cf10372119	GlobalISel: Add G_FMAD instruction llvm-svn: 371254	2019-09-06 20:49:10 +00:00
Matt Arsenault	3e45c70288	GlobalISel: Support physical register inputs in patterns llvm-svn: 371253	2019-09-06 20:32:37 +00:00
Puyan Lotfi	5476bd9432	[llvm-ifs] Improving detection of PlatformKind from triple for TBD generation. It was pointed out that I had hard-coded PlatformKind. This is rectifying that. Differential Revision: https://reviews.llvm.org/D67255 llvm-svn: 371248	2019-09-06 19:59:59 +00:00
Sean Fertile	eaf34a983c	[PowerPC][XCOFF] Remove basic test. [NFC] Test verified that we could compile an empty module and produce an XCOFF object file. Newer tests superssed this coverage, its safe to remove. llvm-svn: 371247	2019-09-06 19:55:44 +00:00
Evandro Menezes	88a98ea3f7	[ConstantFolding] Add new test cases for transcendentals (NFC) llvm-svn: 371246	2019-09-06 19:41:49 +00:00
Craig Topper	7bb433c87b	[X86] Use MOVSX by default instead of CBW to extend i8 to AX for i8 sdivrem. We can use a MOVSX16 here then rely on FixupBWInst to change to MOVSX32 if the upper bits are dead. With a special case to not promote if it could be turned into CBW. Then we can rely on X86MCInstLower to turn the MOVSX into CBW very late if register allocation worked out. Using MOVSX gives an opportunity to use the MOVSX as a both a copy and a sign extend since the input and output register aren't tied together. Differential Revision: https://reviews.llvm.org/D67192 llvm-svn: 371243	2019-09-06 19:17:02 +00:00
Craig Topper	22b35c4291	[X86] Use MOVZX16rr8/MOVZXrm8 when extending input for i8 udivrem. We can rely on X86FixupBWInsts to turn these into MOVZX32. This simplifies a follow up commit to use MOVSX for i8 sdivrem with a late optimization to use CBW when register allocation works out. llvm-svn: 371242	2019-09-06 19:15:04 +00:00
Craig Topper	0364d89b6d	[X86] Teach FixupBWInsts to turn MOVSX16rr8/MOVZX16rr8/MOVSX16rm8/MOVZX16rm8 into their 32-bit dest equivalents when the upper part of the register is dead. llvm-svn: 371240	2019-09-06 19:14:49 +00:00
Sean Fertile	74966aca35	[PowerPC][XCOFF] Verify symbol table in xcoff object files. [NFC] Extend the common/local-common testing for object files to also verify the symbol table now that the needed functionality has landed in llvm-readobj. Differential Revision: https://reviews.llvm.org/D66944 llvm-svn: 371237	2019-09-06 18:56:14 +00:00
Oliver Cruickshank	a050307c05	[ARM] Add patterns for VSUB with q and r registers Added patterns for VSUB to support q and r registers, which reduces pressure on q registers. llvm-svn: 371231	2019-09-06 17:02:42 +00:00
Oliver Cruickshank	3aed95af4e	[ARM] Add patterns for VADD with q and r registers Added support for VADD to use q and r registers, which reduces pressure on q registers. llvm-svn: 371230	2019-09-06 17:02:35 +00:00
Oliver Cruickshank	9bf27928e1	[ARM] Add patterns for VMUL with q and r registers Added support for VMUL to use an r register, this reduces pressure on the q registers. llvm-svn: 371229	2019-09-06 17:02:21 +00:00
Jessica Paquette	121d9114f5	[AArch64][GlobalISel] Always fall back on tail calls with -tailcallopt -tailcallopt requires that we perform different stack adjustments than with sibling calls. For example, the `@caller_to0_from8` function in test/CodeGen/AArch64/tail-call.ll requires that we adjust SP. Without -tailcallopt, this adjustment does not happen. With it, however, it is expected. So, to ensure that adding sibling call support doesn't break -tailcallopt, make CallLowering always fall back on possible tail calls when -tailcallopt is passed in. Update test/CodeGen/AArch64/tail-call.ll with a GlobalISel line to make sure that we don't differ from the SDAG implementation at any point. Differential Revision: https://reviews.llvm.org/D67245 llvm-svn: 371227	2019-09-06 16:49:13 +00:00
JF Bastien	52614dfc7f	[InstCombine] pow(x, +/- 0.0) -> 1.0 Summary: This isn't an important optimization at all... We're already doing: pow(x, 0.0) -> 1.0 My patch merely teaches instcombine that -0.0 does the same. However, doing this fixes an AMAZING bug! Compile this program: extern "C" double pow(double, double); double boom(double base) { return pow(base, -0.0); } With: clang++ ~/Desktop/fast-math.cpp -ffast-math -O2 -S And clang will crash with a signal. Wow, fast math is so fast it ICEs the compiler! Arguably, the generated math is infinitely fast. What's actually happening is that we recurse infinitely in getPow. In debug we hit its assertion: assert(Exp != 0 && "Incorrect exponent 0 not handled"); We avoid this entire mess if we instead recognize that an exponent of positive and negative zero yield 1.0. A separate commit, r371221, fixed the same problem. This only contains the added tests. <rdar://problem/54598300> Reviewers: scanon Subscribers: hiraditya, jkorous, dexonsmith, ributzka, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67248 llvm-svn: 371224	2019-09-06 16:26:59 +00:00
Sanjay Patel	4f0e429acc	[SimplifyLibCalls] handle pow(x,-0.0) before it can assert (PR43233) https://bugs.llvm.org/show_bug.cgi?id=43233 llvm-svn: 371221	2019-09-06 16:10:18 +00:00
Sam Tebbs	f1cdd95a2f	[ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together. This is useful for various MVE instructions, such as vmla and others that take R registers. Loop tests have been added to the vmla test file to make sure vmlas are generated in loops. Differential revision: https://reviews.llvm.org/D66295 llvm-svn: 371218	2019-09-06 16:01:32 +00:00
Valery Pykhtin	e8ade89bb3	[AMDGPU] Enable constant offset promotion to immediate operand for VMEM stores Differential revision: https://reviews.llvm.org/D66958 llvm-svn: 371214	2019-09-06 15:33:53 +00:00
George Rimar	edfd276cbc	[llvm-readelf] - Print unknown st_other value if present in GNU output. This is a fix for https://bugs.llvm.org/show_bug.cgi?id=40785. llvm-readelf does not print the st_value of the symbol when st_value has any non-visibility bits set. This patch: * Aligns "Ndx" row for the default and a new cases. (it was 1 space character off for the case when "PROTECTED" visibility was printed) * Prints "[<other>: 0x??]" for symbols which has an additional st_other bits set. In compare with GNU, this logic is a bit simpler and seems to be more consistent. For MIPS GNU can print named flags, though can't print a mix of them: 0: 00000000 0 NOTYPE LOCAL DEFAULT UND 1: 00000000 0 NOTYPE GLOBAL DEFAULT [OPTIONAL] UND a1 2: 00000000 0 NOTYPE GLOBAL DEFAULT [MIPS PLT] UND a2 3: 00000000 0 NOTYPE GLOBAL DEFAULT [MIPS PIC] UND a3 4: 00000000 0 NOTYPE GLOBAL DEFAULT [MICROMIPS] UND a4 5: 00000000 0 NOTYPE GLOBAL DEFAULT [MIPS16] UND a5 6: 00000000 0 NOTYPE GLOBAL DEFAULT [<other>: c] UND b1 7: 00000000 0 NOTYPE GLOBAL DEFAULT [<other>: 28] UND b2 On PPC64 it can print a localentry value that is encoded in the high bits of st_other 63: 0000000000000850 208 FUNC GLOBAL DEFAULT [<localentry>: 8] 12 We chose to print the raw st_other field, prefixed with '0x'. Differential revision: https://reviews.llvm.org/D67094 llvm-svn: 371201	2019-09-06 13:05:34 +00:00
Djordje Todorovic	d409408e31	[test] Update the name of the debug entry values option. NFC llvm-svn: 371199	2019-09-06 12:23:37 +00:00
James Molloy	db2fa06722	[DFAPacketizer] Track resources for packetized instructions This patch allows the DFAPacketizer to be queried after a packet is formed to work out which resources were allocated to the packetized instructions. This is particularly important for targets that do their own bundle packing - it's not sufficient to know simply that instructions can share a packet; which slots are used is also required for encoding. This extends the emitter to emit a side-table containing resource usage diffs for each state transition. The packetizer maintains a set of all possible resource states in its current state. After packetization is complete, all remaining resource states are possible packetization strategies. The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default (most uses of the packetizer like MachinePipeliner don't care and don't need the extra maintained state). Differential Revision: https://reviews.llvm.org/D66936 llvm-svn: 371198	2019-09-06 12:20:08 +00:00
Jeremy Morse	5d9cd3b4ca	[DebugInfo] LiveDebugValues: explicitly terminate overwritten stack locations If a stack spill location is overwritten by another spill instruction, any variable locations pointing at that slot should be terminated. We cannot rely on spills always being restored to registers or variable locations being moved by a DBG_VALUE: the register allocator is entitled to spill a value and then forget about it when it goes out of liveness. To address this, scan for memory writes to spill locations, even those we don't consider to be normal "spills". isSpillInstruction and isLocationSpill distinguish the two now. After identifying spill overwrites, terminate the open range, and insert a $noreg DBG_VALUE for that variable. Differential Revision: https://reviews.llvm.org/D66941 llvm-svn: 371193	2019-09-06 10:08:22 +00:00
Jay Foad	6c0204c794	[AMDGPU] Mark s_barrier as having side effects but not accessing memory. Summary: This fixes poor scheduling in a function containing a barrier and a few load instructions. Without this fix, ScheduleDAGInstrs::buildSchedGraph adds an artificial edge in the dependency graph from the barrier instruction to the exit node representing live-out latency, with a latency of about 500 cycles. Because of this it thinks the critical path through the graph also has a latency of about 500 cycles. And because of that it does not think that any of the load instructions are on the critical path, so it schedules them with no regard for their (80 cycle) latency, which gives poor results. Reviewers: arsenm, dstuttard, tpr, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67218 llvm-svn: 371192	2019-09-06 10:07:28 +00:00
Sam Parker	29bf68fcfa	[ARM] Fix for buildbot llvm-svn: 371187	2019-09-06 09:36:23 +00:00
Fangrui Song	d20c41dd31	[yaml2obj] Rename SHOffset (e_shoff) field to SHOff. NFC `struct Elf*_Shdr` has a field `sh_offset`, named `ShOffset` in llvm::ELFYAML::Section. Rename SHOffset (e_shoff) to SHOff to prevent confusion. Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D67254 llvm-svn: 371185	2019-09-06 09:23:17 +00:00
Sam Parker	312409e464	[ARM] MVE Tail Predication The MVE and LOB extensions of Armv8.1m can be combined to enable 'tail predication' which removes the need for a scalar remainder loop after vectorization. Lane predication is performed implicitly via a system register. The effects of predication is described in Section B5.6.3 of the Armv8.1-m Arch Reference Manual, the key points being: - For vector operations that perform reduction across the vector and produce a scalar result, whether the value is accumulated or not. - For non-load instructions, the predicate flags determine if the destination register byte is updated with the new value or if the previous value is preserved. - For vector store instructions, whether the store occurs or not. - For vector load instructions, whether the value that is loaded or whether zeros are written to that element of the destination register. This patch implements a pass that takes a hardware loop, containing masked vector instructions, and converts it something that resembles an MVE tail predicated loop. Currently, if we had code generation, we'd generate a loop in which the VCTP would generate the predicate and VPST would then setup the value of VPR.PO. The loads and stores would be placed in VPT blocks so this is not tail predication, but normal VPT predication with the predicate based upon a element counting induction variable. Further work needs to be done to finally produce a true tail predicated loop. Because only the loads and stores are predicated, in both the LLVM IR and MIR level, we will restrict support to only lane-wise operations (no horizontal reductions). We will perform a final check on MIR during loop finalisation too. Another restriction, specific to MVE, is that all the vector instructions need operate on the same number of elements. This is because predication is performed at the byte level and this is set on entry to the loop, or by the VCTP instead. Differential Revision: https://reviews.llvm.org/D65884 llvm-svn: 371179	2019-09-06 08:24:41 +00:00
Kang Zhang	f879c68755	[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks Summary: Fix a bug of not update the jump table and recommit it again. In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun. But the `early-ret` pass is before `block-placement`, we don't want to run it again. This patch is to do the simple early return to optimize the blocks at the last of `block-placement`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D63972 llvm-svn: 371177	2019-09-06 08:16:18 +00:00
Mikael Holmen	dee0702b2a	[MIR] Change test case to read from stdin instead of file The ;CHECK: bb ;CHECK-NEXT: %namedVReg1353:_(p0) = COPY $d0 parts of the test case failed when the tests were placed in a directory including "bb" in the path, since the full path of the file is then output in the ; ModuleID = '/repo/bb/ line which the CHECK matched on and then the CHECK-NEXT failed. llvm-svn: 371171	2019-09-06 06:55:54 +00:00
Craig Topper	463c8e5eeb	[X86] Add tests for extending and truncating between v16i8 and v16i64 with min-legal-vector-width=256. It looks like we might be able to do these in fewer steps, but I'm not sure. llvm-svn: 371170	2019-09-06 06:02:17 +00:00
Alex Brachet	dfacf8851e	Fix rL371162 again llvm-svn: 371164	2019-09-06 03:31:42 +00:00
Alex Brachet	27d42af603	Fix failing test from rL371162 llvm-svn: 371163	2019-09-06 02:56:48 +00:00
Alex Brachet	0b69c59656	[yaml2obj] Make e_phoff and e_phentsize 0 if there are no program headers Summary: It says [[ http://www.sco.com/developers/gabi/latest/ch4.eheader.html \| here ]] that if there are no program headers than e_phoff should be 0, but currently it is always set after the header. GNU's `readelf` (but not `llvm-readelf`) complains about this: `readelf: Warning: possibly corrupt ELF header - it has a non-zero program header offset, but no program headers`. Reviewers: jhenderson, grimar, MaskRay, rupprecht Reviewed By: jhenderson, grimar, MaskRay Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67054 llvm-svn: 371162	2019-09-06 02:27:55 +00:00
Alina Sbirlea	57fcb1d7fc	Cleanup test. llvm-svn: 371158	2019-09-06 00:58:03 +00:00
Fangrui Song	9d2504b6d8	[llvm-readobj][yaml2obj] Support SHT_LLVM_SYMPART, SHT_LLVM_PART_EHDR and SHT_LLVM_PART_PHDR See http://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html and D60242 for the lld partition feature. This patch: * Teaches yaml2obj to parse the 3 section types. * Teaches llvm-readobj/llvm-readelf to dump the 3 section types. There is no test for SHT_LLVM_DEPENDENT_LIBRARIES in llvm-readobj. Add it as well. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D67228 llvm-svn: 371157	2019-09-06 00:53:28 +00:00
Matt Arsenault	4d90625271	AMDGPU/GlobalISel: Fix load/store of types in other address spaces There should probably be a size only matcher. llvm-svn: 371155	2019-09-06 00:36:06 +00:00
Matt Arsenault	9ceb6edf11	GlobalISel/TableGen: Fix handling of EXTRACT_SUBREG constraints This was only using the correct register constraints if this was the final result instruction. If the extract was a sub instruction of the result, it would attempt to use GIR_ConstrainSelectedInstOperands on a COPY, which won't work. Move the handling to createAndImportSubInstructionRenderer so it works correctly. I don't fully understand why runOnPattern and createAndImportSubInstructionRenderer both need to handle these special cases, and constrain them with slightly different methods. If I remove the runOnPattern handling, it does break the constraint when the final result instruction is EXTRACT_SUBREG. llvm-svn: 371150	2019-09-06 00:05:58 +00:00
Matt Arsenault	60c8b8bcf2	AMDGPU: Allow getMemOperandWithOffset to analyze stack accesses Report soffset as a base register if the scratch resource can be ignored. llvm-svn: 371149	2019-09-05 23:54:35 +00:00
Matt Arsenault	59ff77ee38	AMDGPU: Fix emitting multiple stack loads for stack passed workitems The same stack is loaded for each workitem ID, and each use. Nothing prevents you from creating multiple fixed stack objects with the same offsets, so this was creating a load for each unique frame index, despite them being the same offset. Re-use the same frame index so the loads are CSEable. llvm-svn: 371148	2019-09-05 23:40:14 +00:00
Eli Friedman	9dd453ce8d	[AArch64] Add testcase for codegen for sdiv by 2. llvm-svn: 371147	2019-09-05 23:40:03 +00:00
Matt Arsenault	524a9d5774	InstCombine: Fix crash on icmp of gep with addrspacecasted null llvm-svn: 371146	2019-09-05 23:39:21 +00:00
David Blaikie	707be7ef9c	llvm-reduce: Use %python from lit to get the correct/valid python binary for the reduction script llvm-svn: 371143	2019-09-05 23:33:44 +00:00
Alina Sbirlea	35548e80d6	[AliasSetTracker] Correct AAInfo check. Properly check if NewAAInfo conflicts with AAInfo. Update local variable and alias set that a change occured when a conflict is found. Resolves PR42969. llvm-svn: 371139	2019-09-05 23:00:36 +00:00
Vitaly Buka	9020f11377	[SimplifyCFG] Don't SimplifyBranchOnICmpChain with ExtraCase Summary: Here we try to avoid issues with "explicit branch" with SimplifyBranchOnICmpChain which can check on undef. Msan by design reports branches on uninitialized memory and undefs, so we have false report here. In general msan does not like when we convert ``` // If at least one of them is true we can MSAN is ok if another is undefs if (a \|\| b) return; ``` into ``` // If 'a' is undef MSAN will complain even if 'b' is true if (a) return; if (b) return; ``` Example Before optimization we had something like this: ``` while (true) { bool maybe_undef = doStuff(); while (true) { char c = getChar(); if (c != 10 && c != 13) continue break; } // we know that c == 10 \|\| c == 13 if we get here, // so msan know that branch is not affected by maybe_undef if (maybe_undef \|\| c == 10 \|\| c == 13) continue; return; } ``` SimplifyBranchOnICmpChain will convert that into ``` while (true) { bool maybe_undef = doStuff(); while (true) { char c = getChar(); if (c != 10 && c != 13) continue; break; } // however msan will complain here: if (maybe_undef) continue; // we know that c == 10 \|\| c == 13, so either way we will get continue switch(c) { case 10: continue; case 13: continue; } return; } ``` Reviewers: eugenis, efriedma Reviewed By: eugenis, efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67205 llvm-svn: 371138	2019-09-05 22:49:34 +00:00
Puyan Lotfi	dc97ca9f25	[MIR] MIRNamer pass for improving MIR test authoring experience. This patch reuses the MIR vreg renamer from the MIRCanonicalizerPass to cleanup names of vregs in a MIR file for MIR test authors. I found it useful when writing a regression test for a globalisel failure I encountered recently and thought it might be useful for other folks as well. Differential Revision: https://reviews.llvm.org/D67209 llvm-svn: 371121	2019-09-05 20:44:33 +00:00
Jessica Paquette	20e8667098	Recommit "[AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls" Recommit basic sibling call lowering (https://reviews.llvm.org/D67189) The issue was that if you have a return type other than void, call lowering will emit COPYs to get the return value after the call. Disallow sibling calls other than ones that return void for now. Also proactively disable swifterror tail calls for now, since there's a similar issue with COPYs there. Update call-translator-tail-call.ll to include test cases for each of these things. llvm-svn: 371114	2019-09-05 20:18:34 +00:00
Eli Friedman	cae1e47f6e	[IfConversion] Fix diamond conversion with unanalyzable branches. The code was incorrectly counting the number of identical instructions, and therefore tried to predicate an instruction which should not have been predicated. This could have various effects: a compiler crash, an assembler failure, a miscompile, or just generating an extra, unnecessary instruction. Instead of depending on TargetInstrInfo::removeBranch, which only works on analyzable branches, just remove all branch instructions. Fixes https://bugs.llvm.org/show_bug.cgi?id=43121 and https://bugs.llvm.org/show_bug.cgi?id=41121 . Differential Revision: https://reviews.llvm.org/D67203 llvm-svn: 371111	2019-09-05 20:02:38 +00:00
Roman Lebedev	071ce66729	[NFC][InstCombine] Overhaul 'unsigned add overflow' tests, ensure that all 3 patterns have full test coverage llvm-svn: 371108	2019-09-05 19:13:15 +00:00
Craig Topper	0fde412140	[X86] Enable BuildSDIVPow2 for i16. We're able to use a 32-bit ADD and CMOV here and should work well with our other i16->i32 promotion optimizations. llvm-svn: 371107	2019-09-05 18:49:52 +00:00
Craig Topper	b8d6ba3ca2	[X86] Override BuildSDIVPow2 for X86. As noted in PR43197, we can use test+add+cmov+sra to implement signed division by a power of 2. This is based off the similar version in AArch64, but I've adjusted it to use target independent nodes where AArch64 uses target specific CMP and CSEL nodes. I've also blocked INT_MIN as the transform isn't valid for that. I've limited this to i32 and i64 on 64-bit targets for now and only when CMOV is supported. i8 and i16 need further investigation to be sure they get promoted to i32 well. I adjusted a few tests to enable cmov to demonstrate the new codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode without cmov to avoid perturbing the scenario that is being set up there. Differential Revision: https://reviews.llvm.org/D67087 llvm-svn: 371104	2019-09-05 18:15:07 +00:00
Roman Lebedev	8360c42e25	[InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned sub overflow' check A follow-up for r329011. This may be changed to produce @llvm.sub.with.overflow in a later patch, but for now just make things more consistent overall. A few observations stem from this: * There does not seem to be a similar one-instruction fold for uadd-overflow * I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`, so since the `icmp` here no longer refers to `sub`, reconstructing `usub.with.overflow` will be problematic, and will likely require standalone pass (similar to DivRemPairs). https://rise4fun.com/Alive/Zqs Name: (A - B) u> A --> B u> A %t0 = sub i8 %A, %B %r = icmp ugt i8 %t0, %A => %r = icmp ugt i8 %B, %A Name: (A - B) u<= A --> B u<= A %t0 = sub i8 %A, %B %r = icmp ule i8 %t0, %A => %r = icmp ule i8 %B, %A Name: C u< (C - D) --> C u< D %t0 = sub i8 %C, %D %r = icmp ult i8 %C, %t0 => %r = icmp ult i8 %C, %D Name: C u>= (C - D) --> C u>= D %t0 = sub i8 %C, %D %r = icmp uge i8 %C, %t0 => %r = icmp uge i8 %C, %D llvm-svn: 371101	2019-09-05 17:41:02 +00:00
Roman Lebedev	ecb7ea1ae7	[InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned add overflow' check A follow-up for r342004. This will be changed to produce @llvm.add.with.overflow in a later patch, but for now just make things more consistent overall. https://rise4fun.com/Alive/qxE Name: (Op1 + X) u< Op1 --> ~Op1 u< X %t0 = add i8 %Op1, %X %r = icmp ult i8 %t0, %Op1 => %n = xor i8 %Op1, -1 %r = icmp ult i8 %n, %X Name: (Op1 + X) u>= Op1 --> ~Op1 u>= X %t0 = add i8 %Op1, %X %r = icmp uge i8 %t0, %Op1 => %n = xor i8 %Op1, -1 %r = icmp uge i8 %n, %X ;------------------------------------------------------------------------------- Name: Op0 u> (Op0 + X) --> X u> ~Op0 %t0 = add i8 %Op0, %X %r = icmp ugt i8 %Op0, %t0 => %n = xor i8 %Op0, -1 %r = icmp ugt i8 %X, %n Name: Op0 u<= (Op0 + X) --> X u<= ~Op0 %t0 = add i8 %Op0, %X %r = icmp ule i8 %Op0, %t0 => %n = xor i8 %Op0, -1 %r = icmp ule i8 %X, %n llvm-svn: 371100	2019-09-05 17:40:49 +00:00
Roman Lebedev	1d9e0dcc9d	[InstCombine][NFC] Tests for 'unsigned sub overflow' check ---------------------------------------- Name: unsigned sub, overflow, v0 %sub = sub i8 %x, %y %ov = icmp ugt i8 %sub, %x => %agg = usub_overflow i8 %x, %y %sub = extractvalue {i8, i1} %agg, 0 %ov = extractvalue {i8, i1} %agg, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: unsigned sub, no overflow, v0 %sub = sub i8 %x, %y %ov = icmp ule i8 %sub, %x => %agg = usub_overflow i8 %x, %y %sub = extractvalue {i8, i1} %agg, 0 %not.ov = extractvalue {i8, i1} %agg, 1 %ov = xor %not.ov, -1 Done: 1 Optimization is correct! llvm-svn: 371099	2019-09-05 17:40:37 +00:00
Roman Lebedev	745046c23f	[InstCombine][NFC] Tests for 'unsigned add overflow' check ---------------------------------------- Name: unsigned add, overflow, v0 %add = add i8 %x, %y %ov = icmp ult i8 %add, %x => %agg = uadd_overflow i8 %x, %y %add = extractvalue {i8, i1} %agg, 0 %ov = extractvalue {i8, i1} %agg, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: unsigned add, overflow, v1 %add = add i8 %x, %y %ov = icmp ult i8 %add, %y => %agg = uadd_overflow i8 %x, %y %add = extractvalue {i8, i1} %agg, 0 %ov = extractvalue {i8, i1} %agg, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: unsigned add, no overflow, v0 %add = add i8 %x, %y %ov = icmp uge i8 %add, %x => %agg = uadd_overflow i8 %x, %y %add = extractvalue {i8, i1} %agg, 0 %not.ov = extractvalue {i8, i1} %agg, 1 %ov = xor %not.ov, -1 Done: 1 Optimization is correct! ---------------------------------------- Name: unsigned add, no overflow, v1 %add = add i8 %x, %y %ov = icmp uge i8 %add, %y => %agg = uadd_overflow i8 %x, %y %add = extractvalue {i8, i1} %agg, 0 %not.ov = extractvalue {i8, i1} %agg, 1 %ov = xor %not.ov, -1 Done: 1 Optimization is correct! llvm-svn: 371098	2019-09-05 17:40:28 +00:00
Sanjay Patel	10412a69f9	[x86] fix horizontal math bug exposed by improved demanded elements analysis (PR43225) https://bugs.llvm.org/show_bug.cgi?id=43225 llvm-svn: 371095	2019-09-05 17:28:17 +00:00
Craig Topper	673da001c5	[X86] Remove unneeded CHECK lines from a test. NFC llvm-svn: 371093	2019-09-05 17:24:25 +00:00
Denis Bakhvalov	58f172f05a	[MergedLoadStoreMotion] Sink stores to BB with more than 2 predecessors If we have: bb5: br i1 %arg3, label %bb6, label %bb7 bb6: %tmp = getelementptr inbounds i32, i32* %arg1, i64 2 store i32 3, i32* %tmp, align 4 br label %bb9 bb7: %tmp8 = getelementptr inbounds i32, i32* %arg1, i64 2 store i32 3, i32* %tmp8, align 4 br label %bb9 bb9: ; preds = %bb4, %bb6, %bb7 ... We can't sink stores directly into bb9. This patch creates new BB that is successor of %bb6 and %bb7 and sinks stores into that block. SplitFooterBB is the parameter to the pass that controls that behavior. Change-Id: I7fdf50a772b84633e4b1b860e905bf7e3e29940f Differential: https://reviews.llvm.org/D66234 llvm-svn: 371089	2019-09-05 17:00:32 +00:00
Sanjay Patel	3856512334	[x86] add test for horizontal math bug (PR43225); NFC llvm-svn: 371088	2019-09-05 16:58:18 +00:00
Hiroshi Yamauchi	d842f2eec4	[PGO][CHR] Speed up following long, interlinked use-def chains. Summary: Avoid visiting an instruction more than once by using a map. This is similar to https://reviews.llvm.org/rL361416. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67198 llvm-svn: 371086	2019-09-05 16:56:55 +00:00
Alina Sbirlea	ae900d3882	[MemorySSA] Update MemorySSA when removing debug.value calls. llvm-svn: 371084	2019-09-05 16:25:24 +00:00
Krzysztof Parzyszek	0ce93194fe	[Hexagon] Fix type in HexagonTargetLowering::ReplaceNodeResults llvm-svn: 371083	2019-09-05 16:19:47 +00:00
Simon Pilgrim	29361c704d	[X86][SSE] EltsFromConsecutiveLoads - ignore non-zero offset base loads (PR43227) As discussed on D64551 and PR43227, we don't correctly handle cases where the base load has a non-zero byte offset. Until we can properly handle this, we must bail from EltsFromConsecutiveLoads. llvm-svn: 371078	2019-09-05 15:07:07 +00:00
Fangrui Song	c3bc697974	[yaml2obj] Write the section header table after section contents Linkers (ld.bfd/gold/lld) place the section header table at the very end. This allows tools to strip it, which is optional in executable/shared objects. In addition, if we add or section, the size of the section header table will change. Placing the section header table in the end keeps section offsets unchanged. yaml2obj currently places the section header table immediately after the program header. Follow what linkers do to make offset updating easier. Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D67221 llvm-svn: 371074	2019-09-05 14:25:57 +00:00
George Rimar	4e14bf71b7	[llvm-readelf] - Allow dumping dynamic symbols when there is no program headers. D62179 introduced a regression. llvm-readelf lose the ability to dump the dynamic symbols when there is .dynamic section with a DT_SYMTAB, but there are no program headers: https://reviews.llvm.org/D62179#1652778 Below is a program flow before the D62179 change: 1) Find SHT_DYNSYM. 2) Find there is no PT_DYNAMIC => don't try to parse it. 3) Print dynamic symbols using information about them found on step (1). And after the change it became: 1) Find SHT_DYNSYM. 2) Find there is no PT_DYNAMIC => find SHT_DYNAMIC. 3) Parse dynamic table, but fail to handle the DT_SYMTAB because of the absence of the PT_LOAD. Report the "Virtual address is not in any segment" error. This patch fixes the issue. For doing this it checks that the value of DT_SYMTAB was mapped to a segment. If not - it ignores it. Differential revision: https://reviews.llvm.org/D67078 llvm-svn: 371071	2019-09-05 14:02:58 +00:00
David Green	83a3341246	[ARM] Fixup the creation of VPT blocks This attempts to just fix the creation of VPT blocks, fixing up the iterating, which instructions are considered in the bundle, and making sure that we do not overrun the end of the block. Differential Revision: https://reviews.llvm.org/D67219 llvm-svn: 371064	2019-09-05 13:37:04 +00:00
Simon Pilgrim	215910eeb2	[X86][SSE] Add (failing) test case for PR43227 llvm-svn: 371061	2019-09-05 12:36:11 +00:00
Petar Avramovic	a4bfc8dfda	[MIPS GlobalISel] Select G_FENCE G_FENCE comes form fence instruction. For MIPS fence is generated in AtomicExpandPass when atomic instruction gets surrounded with fence instruction when needed. G_FENCE arguments don't have LLT, because of that there is no job for legalizer and regbankselect. Instruction select G_FENCE for MIPS32. Differential Revision: https://reviews.llvm.org/D67181 llvm-svn: 371056	2019-09-05 11:20:32 +00:00
Petar Avramovic	f5c7fe0795	[MIPS GlobalISel] Select llvm.trap intrinsic Select G_INTRINSIC_W_SIDE_EFFECTS for Intrinsic::trap for MIPS32 via legalizeIntrinsic. Differential Revision: https://reviews.llvm.org/D67180 llvm-svn: 371055	2019-09-05 11:16:37 +00:00
Petar Avramovic	d2574d79b6	[MIPS GlobalISel] Lower SRet pointer arguments Instead of returning structure by value clang usually adds pointer to that structure as an argument. Pointers don't require special handling no matter the SRet flag. Remove unsuccessful exit from lowerCall for arguments with SRet flag if they are pointers. Differential Revision: https://reviews.llvm.org/D67179 llvm-svn: 371054	2019-09-05 11:12:01 +00:00
Simon Pilgrim	071287c5a9	Revert rL370996 from llvm/trunk: [AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls This adds support for basic sibling call lowering in AArch64. The intent here is to only handle tail calls which do not change the ABI (hence, sibling calls.) At this point, it is very restricted. It does not handle - Vararg calls. - Calls with outgoing arguments. - Calls whose calling conventions differ from the caller's calling convention. - Tail/sibling calls with BTI enabled. This patch adds - `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent to the same function in AArch64ISelLowering.cpp (albeit with the restrictions above.) - `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in AArch64ISelLowering.cpp. - `getCallOpcode`, which is exactly what it sounds like. Tail/sibling calls are lowered by checking if they pass target-independent tail call positioning checks, and checking if they satisfy `isEligibleForTailCallOptimization`. If they do, then a tail call instruction is emitted instead of a normal call. If we have a sibling call (which is always the case in this patch), then we do not emit any stack adjustment operations. When we go to lower a return, we check if we've already emitted a tail call. If so, then we skip the return lowering. For testing, this patch - Adds call-translator-tail-call.ll to test which tail calls we currently lower, which ones we don't, and which ones we shouldn't. - Updates branch-target-enforcement-indirect-calls.ll to show that we fall back as expected. Differential Revision: https://reviews.llvm.org/D67189 ........ This fails on EXPENSIVE_CHECKS builds due to a -verify-machineinstrs test failure in CodeGen/AArch64/dllimport.ll llvm-svn: 371051	2019-09-05 10:38:39 +00:00
Jonas Paulsson	821858780e	[SystemZ] Recognize INLINEASM_BR in backend Handle the remaining cases also by handling asm goto in SystemZInstrInfo::getBranchInfo(). Review: Ulrich Weigand https://reviews.llvm.org/D67151 llvm-svn: 371048	2019-09-05 10:20:05 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation, - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation, Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
George Rimar	e7b4d20998	Recommit r371023 "[lib/ObjectYAML] - Stop calling error(1) when mapping the st_other field of a symbol." Fix: added missing return "return 0;" Original commit message: This eliminates one of the error(1) call in this lib. It is different from the others because happens on a fields mapping stage and can be easily fixed. Differential revision: https://reviews.llvm.org/D67150 llvm-svn: 371030	2019-09-05 08:52:26 +00:00
George Rimar	7f1f50de41	Revert r371023 "[lib/ObjectYAML] - Stop calling error(1) when mapping the st_other field of a symbol." It broke BBots: http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/36387/steps/build_Lld/logs/stdio http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/17117/steps/test/logs/stdio llvm-svn: 371024	2019-09-05 08:38:29 +00:00
George Rimar	2c9c432256	[lib/ObjectYAML] - Stop calling error(1) when mapping the st_other field of a symbol. This eliminates one of the error(1) call in this lib. It is different from the others because happens on a fields mapping stage and can be easily fixed. Differential revision: https://reviews.llvm.org/D67150 llvm-svn: 371023	2019-09-05 08:28:43 +00:00
Igor Kudrin	e46639620d	[DWARF] Fix referencing Range List Tables from CUs for DWARF64. As DW_AT_rnglists_base points after the header and headers have different sizes for DWARF32 and DWARF64, we have to use the format of the CU to adjust the offset correctly in order to extract the referenced range list table. The patch also changes the type of RangeSectionBase because in DWARF64 it is 8-bytes long. Differential Revision: https://reviews.llvm.org/D67098 llvm-svn: 371016	2019-09-05 07:02:28 +00:00
Igor Kudrin	991f0fb149	[DWARF] Support DWARF64 in DWARFListTableHeader. This enables 64-bit DWARF support for parsing range and location list tables. Differential Revision: https://reviews.llvm.org/D66643 llvm-svn: 371014	2019-09-05 06:49:05 +00:00
Matt Arsenault	f581d575ce	AMDGPU: Add intrinsics for address space identification The library currently uses ptrtoint and directly checks the queue ptr for this, which counts as a pointer capture. llvm-svn: 371009	2019-09-05 02:20:39 +00:00
Matt Arsenault	69b1a2ae65	AMDGPU/GlobalISel: Restore insert point when getting aperture Avoids SSA violations in a future patch. llvm-svn: 371008	2019-09-05 02:20:32 +00:00
Matt Arsenault	25156ae7ea	AMDGPU/GlobalISel: Fix placeholder value used for addrspacecast llvm-svn: 371007	2019-09-05 02:20:29 +00:00
Matt Arsenault	d51a3746d0	AMDGPU/GlobalISel: Fix assert on load from constant address llvm-svn: 371006	2019-09-05 02:20:25 +00:00
Puyan Lotfi	6d3ea2d9b6	[mir-canon][NFC] Adding -verify-machineinstrs to mir-canon tests. In the review process for some of the refactoring of MIRCanonicalizationPass it was noted that some of the tests didn't have verifier enabled. Enabling here. llvm-svn: 371005	2019-09-05 02:10:41 +00:00
Reid Kleckner	29ccc8523a	Use -mtriple to fix AMDGPU test sensitive to object file format GOTPCREL32 doesn't exist on COFF, so it isn't used when this test runs on Windows. llvm-svn: 371000	2019-09-05 00:34:01 +00:00
Jessica Paquette	b78324fc40	[AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls This adds support for basic sibling call lowering in AArch64. The intent here is to only handle tail calls which do not change the ABI (hence, sibling calls.) At this point, it is very restricted. It does not handle - Vararg calls. - Calls with outgoing arguments. - Calls whose calling conventions differ from the caller's calling convention. - Tail/sibling calls with BTI enabled. This patch adds - `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent to the same function in AArch64ISelLowering.cpp (albeit with the restrictions above.) - `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in AArch64ISelLowering.cpp. - `getCallOpcode`, which is exactly what it sounds like. Tail/sibling calls are lowered by checking if they pass target-independent tail call positioning checks, and checking if they satisfy `isEligibleForTailCallOptimization`. If they do, then a tail call instruction is emitted instead of a normal call. If we have a sibling call (which is always the case in this patch), then we do not emit any stack adjustment operations. When we go to lower a return, we check if we've already emitted a tail call. If so, then we skip the return lowering. For testing, this patch - Adds call-translator-tail-call.ll to test which tail calls we currently lower, which ones we don't, and which ones we shouldn't. - Updates branch-target-enforcement-indirect-calls.ll to show that we fall back as expected. Differential Revision: https://reviews.llvm.org/D67189 llvm-svn: 370996	2019-09-04 22:54:52 +00:00
Matt Arsenault	2df41a8e38	AMDGPU/GlobalISel: Select G_BITREVERSE llvm-svn: 370980	2019-09-04 20:46:31 +00:00
Matt Arsenault	5ff310e298	GlobalISel: Add basic legalization for G_BITREVERSE llvm-svn: 370979	2019-09-04 20:46:15 +00:00
Johannes Doerfert	7ab5253704	[Attributor][Fix] Make sure we do not delete live code Summary: Liveness needs to mark edges, not blocks as dead. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67191 llvm-svn: 370975	2019-09-04 20:34:52 +00:00
Leonard Chan	eca01b031d	[NewPM][Sancov] Make Sancov a Module Pass instead of 2 Passes This patch merges the sancov module and funciton passes into one module pass. The reason for this is because we ran into an out of memory error when attempting to run asan fuzzer on some protobufs (pc.cc files). I traced the OOM error to the destructor of SanitizerCoverage where we only call appendTo[Compiler]Used which calls appendToUsedList. I'm not sure where precisely in appendToUsedList causes the OOM, but I am able to confirm that it's calling this function repeatedly that causes the OOM. (I hacked sancov a bit such that I can still create and destroy a new sancov on every function run, but only call appendToUsedList after all functions in the module have finished. This passes, but when I make it such that appendToUsedList is called on every sancov destruction, we hit OOM.) I don't think the OOM is from just adding to the SmallSet and SmallVector inside appendToUsedList since in either case for a given module, they'll have the same max size. I suspect that when the existing llvm.compiler.used global is erased, the memory behind it isn't freed. I could be wrong on this though. This patch works around the OOM issue by just calling appendToUsedList at the end of every module run instead of function run. The same amount of constants still get added to llvm.compiler.used, abd we make the pass usage and logic simpler by not having any inter-pass dependencies. Differential Revision: https://reviews.llvm.org/D66988 llvm-svn: 370971	2019-09-04 20:30:29 +00:00
Evandro Menezes	bf78e39cbb	[InstCombine] Add more test cases (NFC) Add more test cases simplifying `log()`. llvm-svn: 370966	2019-09-04 20:01:09 +00:00
Alina Sbirlea	6da79ce1fe	[MemorySSA] Re-enable MemorySSA use. Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370957	2019-09-04 19:16:04 +00:00
David Bolvansky	420cbb6190	[InstCombine] sub(xor(x, y), or(x, y)) -> neg(and(x, y)) Summary: ``` Name: sub(xor(x, y), or(x, y)) -> neg(and(x, y)) %or = or i32 %y, %x %xor = xor i32 %x, %y %sub = sub i32 %xor, %or => %sub1 = and i32 %x, %y %sub = sub i32 0, %sub1 Optimization: sub(xor(x, y), or(x, y)) -> neg(and(x, y)) Done: 1 Optimization is correct! ``` https://rise4fun.com/Alive/8OI Reviewers: lebedev.ri Reviewed By: lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67188 llvm-svn: 370945	2019-09-04 18:03:21 +00:00
David Bolvansky	f6233d90f0	[NFC] Added tests for new fold llvm-svn: 370941	2019-09-04 17:37:06 +00:00
David Bolvansky	2ceb00db76	[NFC] Adjust test filename llvm-svn: 370939	2019-09-04 17:33:53 +00:00
Craig Topper	f0081dac81	[X86] Pre-commit test cases and test run line changes for D67087 llvm-svn: 370937	2019-09-04 17:33:38 +00:00
David Bolvansky	0e07248704	[InstCombine] Fold sub (and A, B) (or A, B)) to neg (xor A, B) Summary: ``` Name: sub(and(x, y), or(x, y)) -> neg(xor(x, y)) %or = or i32 %y, %x %and = and i32 %x, %y %sub = sub i32 %and, %or => %sub1 = xor i32 %x, %y %sub = sub i32 0, %sub1 Optimization: sub(and(x, y), or(x, y)) -> neg(xor(x, y)) Done: 1 Optimization is correct! ``` https://rise4fun.com/Alive/VI6 Found by @lebedev.ri. Also author of the proof. Reviewers: lebedev.ri, spatel Reviewed By: lebedev.ri Subscribers: llvm-commits, lebedev.ri Tags: #llvm Differential Revision: https://reviews.llvm.org/D67155 llvm-svn: 370934	2019-09-04 17:30:53 +00:00
Matt Arsenault	84489b34f6	AMDGPU: Handle frame index expansion with no free SGPRs pre gfx9 Since an add instruction must produce an unused carry out, this requires additional SGPRs. This can be avoided by keeping the entire offset computation in SGPRs. If one SGPR is still available, this only costs one extra mov. If none are available, the entire computation can be done in place and reversed. This does assume the use is a VGPR operand. This was already assumed, and we currently only select frame indexes to VALU instructions. This should probably be fixed at some point to handle more possible MIR. llvm-svn: 370929	2019-09-04 17:12:57 +00:00
Matt Arsenault	70becc20fa	GlobalISel: Add G_BITREVERSE This is the first failing pattern for AMDGPU and is trivial to handle. llvm-svn: 370927	2019-09-04 17:06:53 +00:00
Johannes Doerfert	2f6220633c	[Attributor] Look at internal functions only on-demand Summary: Instead of building attributes for internal functions which we do not update as long as we assume they are dead, we now do not create attributes until we assume the internal function to be live. This improves the number of required iterations, as well as the number of required updates, in real code. On our tests, the results are mixed. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66914 llvm-svn: 370924	2019-09-04 16:35:20 +00:00
Johannes Doerfert	97fd582b91	[Attributor] Use the white list for attributes consistently Summary: We create attributes on-demand so we need to check the white list on-demand. This also unifies the location at which we create, initialize, and eventually invalidate new abstract attributes. The tests show mixed results, a few more call site attributes are determined which can cause more iterations. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66913 llvm-svn: 370922	2019-09-04 16:26:20 +00:00
Matt Arsenault	d9af712da4	AMDGPU/GlobalISel: Make 16-bit constants legal This is mostly for the benefit of patterns which use 16-bit constants. llvm-svn: 370921	2019-09-04 16:19:45 +00:00
Johannes Doerfert	b0412e437c	[Attributor] Deal more explicit with non-exact definitions Summary: Before we tried to rule out non-exact definitions early but that lead to on-demand attributes created for them anyway. As a consequence we needed to look at the definition in the initialize of each attribute again. This patch centralized this lookup and tightens the condition under which we give up on non-exact definitions. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67115 llvm-svn: 370917	2019-09-04 16:16:13 +00:00
Krzysztof Parzyszek	08a09822a5	[Hexagon] Improve generated code for test-if-bit-clear, one more time Adjust isel patterns after recent commit. Fixes https://llvm.org/PR43194. llvm-svn: 370913	2019-09-04 15:22:36 +00:00
Sanjay Patel	4a2cd7be5a	[InstSimplify] guard against unreachable code (PR43218) This would crash: https://bugs.llvm.org/show_bug.cgi?id=43218 llvm-svn: 370911	2019-09-04 15:12:55 +00:00
Alexey Lapshin	cbf1f3b771	[Debuginfo][SROA] Need to handle dbg.value in SROA pass. SROA pass processes debug info incorrecly if applied twice. Specifically, after SROA works first time, instcombine converts dbg.declare intrinsics into dbg.value. Inlining creates new opportunities for SROA, so it is called again. This time it does not handle correctly previously inserted dbg.value intrinsics. Differential Revision: https://reviews.llvm.org/D64595 llvm-svn: 370906	2019-09-04 14:19:49 +00:00
Sanjay Patel	791949afe5	[InstCombine] add tests for insert/extract with identity shuffles; NFC llvm-svn: 370901	2019-09-04 13:38:49 +00:00
David Bolvansky	b9e9478244	[NFC] Added a negative test for new fold llvm-svn: 370890	2019-09-04 12:46:25 +00:00
David Bolvansky	13dadedc29	[NFC] Fixed test llvm-svn: 370888	2019-09-04 12:43:14 +00:00
David Bolvansky	3747c48d64	[NFC] Adjust tests for new fold llvm-svn: 370886	2019-09-04 12:22:28 +00:00
David Bolvansky	163b05b45d	[NFC] Added tests for new fold llvm-svn: 370885	2019-09-04 12:18:53 +00:00
David Bolvansky	358b80b340	[InstCombine] Fold sub (or A, B) (and A, B) to (xor A, B) Summary: ``` Name: sub or and to xor %or = or i32 %y, %x %and = and i32 %x, %y %sub = sub i32 %or, %and => %sub = xor i32 %x, %y Optimization: sub or and to xor Done: 1 Optimization is correct! ``` https://rise4fun.com/Alive/eJu Reviewers: spatel, lebedev.ri Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67153 llvm-svn: 370883	2019-09-04 12:00:33 +00:00
Pavel Labath	98634c2e11	Fix address sizes in the dwarfdump-debug-loc-error-cases test the test is building a 64-bit executable, so the addresses should be 64-bit too. The test was still passing even with smaller address size, but it was hitting the "unexpected end of data" error sooner than it should. llvm-svn: 370882	2019-09-04 11:47:20 +00:00
David Bolvansky	54f3a651f3	[NFC] Added a new test for D67153 llvm-svn: 370881	2019-09-04 11:44:00 +00:00
David Bolvansky	75d734475a	[NFC] Added tests for 'SUB of OR and AND to XOR' fold llvm-svn: 370878	2019-09-04 11:17:08 +00:00
Jeremy Morse	337a7cb55e	[DebugInfo] LiveDebugValues: locations with different exprs should not be merged When comparing variable locations, LiveDebugValues currently considers only the machine location, ignoring any DIExpression applied to it. This is a problem because that DIExpression can do pretty much anything to the machine location, for example dereferencing it. This patch adds DIExpressions to that comparison; now variables based on the same register/memory-location but with different expressions will compare differently, and be dropped if we attempt to merge them between blocks. This reduces variable coverage-range a little, but only because we were producing broken locations. Differential Revision: https://reviews.llvm.org/D66942 llvm-svn: 370877	2019-09-04 11:09:05 +00:00
Pavel Labath	88b4e28a67	DWARF: Fix a regression in location list dumping Summary: While fixing the handling of some error cases, r370363 introduced new problems -- assertion failures due to unchecked errors (my excuse is that a very early version of that patch used Optional<T> instead of Expected). This patch adds proper handling of parsing errors encountered when dumping location lists from inside DWARF DIEs, and adds a bunch of additional tests. I reorder the arguments of the location list dumping functions to make them consistent, and also be able to dump the two kinds of location lists generically. Reviewers: JDevlieghere, dblaikie, probinson Subscribers: aprantl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67102 llvm-svn: 370868	2019-09-04 10:09:12 +00:00
Fangrui Song	441d450115	[yaml2obj] Support PT_GNU_STACK and PT_GNU_RELRO PT_GNU_STACK is used in an llvm-objcopy test. I plan to use PT_GNU_RELRO in a patch to improve nested segment processing in llvm-objcopy (PR42963). Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D67146 llvm-svn: 370857	2019-09-04 09:19:31 +00:00
Sam Parker	fea532230b	[ARM][ParallelDSP] SExt mul for accumulation For any unpaired muls, we accumulate them as an input to the reduction. Check the type of the mul and perform a sext if the existing accumlator input type is not the same. Differential Revision: https://reviews.llvm.org/D66993 llvm-svn: 370851	2019-09-04 08:41:34 +00:00
Taewook Oh	1975e635e6	[IRPrinting] Improve module pass printer to work better with -filter-print-funcs Summary: Previously module pass printer pass prints the banner even when the module doesn't include any function provided with `-filter-print-funcs` option. This introduced a lot of noise, especailly with ThinLTO. This diff addresses the issue and makes the banner printed only when the module includes functions in `-filter-print-funcs` list. Reviewers: fedor.sergeev Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66560 llvm-svn: 370849	2019-09-04 08:08:58 +00:00
Jim Lin	b77aa1d248	[RISCV] Enable tail call opt for variadic function Summary: Tail call opt can treat variadic function call the same as normal function call Reviewers: mgrang, asb, lenary, lewis-revill Reviewed By: lenary Subscribers: luismarques, pzheng, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66278 llvm-svn: 370835	2019-09-04 02:03:36 +00:00
Puyan Lotfi	954d6d661f	[NFC][llvm-ifs] Adding .ifs files to the test list for llvm-ifs tool. llvm-svn: 370830	2019-09-04 00:07:49 +00:00
Reid Kleckner	3fa07dee94	Revert [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn This reverts r370525 (git commit `0bb1630685`) Also reverts r370543 (git commit `185ddc08ee`) The approach I took only works for functions marked `noreturn`. In general, a call that is not known to be noreturn may be followed by unreachable for other reasons. For example, there could be multiple call sites to a function that throws sometimes, and at some call sites, it is known to always throw, so it is followed by unreachable. We need to insert an `int3` in these cases to pacify the Windows unwinder. I think this probably deserves its own standalone, Win64-only fixup pass that runs after block placement. Implementing that will take some time, so let's revert to TrapUnreachable in the mean time. llvm-svn: 370829	2019-09-03 22:27:27 +00:00
Heejin Ahn	49e7ee4dd5	[WebAssembly] Compare functions by names in Emscripten Sjlj Summary: This removes all string constants for function names and compares functions by string directly when needed. Many of these constants are used only once or twice so the benefit of defining them separately is not very clear, and this actually fixes a bug. When we already have a `malloc` declaration which is an alias to something else within the module, ``` @malloc = weak hidden alias i8* (i32), i8* (i32)* @dlmalloc ``` (this happens compiling with emscripten with `-s WASM_OBJECT_FILES=0` because all bc files are merged before being fed into `wasm-ld` which runs the backend optimizations as LTO) `Module::getFunction("malloc")` in `canLongjmp` returns `nullptr` because `Module::getFunction` dyncasts pointer into `Function`, but the alias is a `GlobalValue` but not a `Function`. This makes `canLongjmp` return false for `malloc` in this case, and we end up adding a lot of longjmp handling code around malloc. This is not only a code size increase but actually a bug because `malloc` is used in the entry block when preparing for setjmp tables for emscripten sjlj handling, and this makes initial setjmp preparation, which has to happen in the entry block, move to another split block, and this interferes with SSA update later. This also adds two more functions, `getTempRet0` and `setTempRet0`, in the list of not longjmp-able functions. Fixes https://github.com/emscripten-core/emscripten/issues/8935. Reviewers: sbc100 Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, dexonsmith, dschuff, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67129 llvm-svn: 370828	2019-09-03 22:26:49 +00:00
Vedant Kumar	0fcfe89717	[llvm-profdata] Add mode to recover from profile read failures Add a mode in which profile read errors are not immediately treated as fatal. In this mode, merging makes forward progress and reports failure only if no inputs can be read. Differential Revision: https://reviews.llvm.org/D66985 llvm-svn: 370827	2019-09-03 22:23:16 +00:00
Vedant Kumar	95fb23ab37	[InstrProf] Tighten a check for malformed data records in raw profiles The check needs to validate a counter offset before performing pointer arithmetic with the (potentially corrupt) offset. Found by UBSan's pointer overflow check. rdar://54843625 Differential Revision: https://reviews.llvm.org/D66979 llvm-svn: 370826	2019-09-03 22:23:14 +00:00
Amara Emerson	2a2c25ba48	[AArch64][GlobalISel] Legalize 128 bit divisions to libcalls. Now that we have the infrastructure to support s128 types as parameters we can expand these to libcalls. Differential Revision: https://reviews.llvm.org/D66185 llvm-svn: 370823	2019-09-03 21:42:32 +00:00
Amara Emerson	fbaf425b79	[GlobalISel][CallLowering] Add support for splitting types according to calling conventions. On AArch64, s128 types have to be split into s64 GPRs when passed as arguments. This change adds the generic support in call lowering for dealing with multiple registers, for incoming and outgoing args. Support for splitting for return types not yet implemented. Differential Revision: https://reviews.llvm.org/D66180 llvm-svn: 370822	2019-09-03 21:42:28 +00:00
Alina Sbirlea	ccb1862bc9	[MemorySSA] Disable MemorySSA use. Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370821	2019-09-03 21:20:46 +00:00
Johannes Doerfert	b19cd27b28	[Attributor] Use the delete API for liveness Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66833 llvm-svn: 370818	2019-09-03 20:42:16 +00:00
Johannes Doerfert	7516a5e045	[Attributor] Deduce "no-capture" argument attribute Add the no-capture argument attribute deduction to the Attributor fixpoint framework. The new string attributed "no-capture-maybe-returned" is introduced to allow deduction of no-capture through functions that "capture" an argument but only by "returning" it. It is only used by the Attributor for testing. Differential Revision: https://reviews.llvm.org/D59922 llvm-svn: 370817	2019-09-03 20:37:24 +00:00
Bjorn Pettersson	b0eb394417	[CodeGen] Use FSHR in DAGTypeLegalizer::ExpandIntRes_MULFIX Summary: Simplify the right shift of the intermediate result (given in four parts) by using funnel shift. There are some impact on lit tests, but that seems to be related to register allocation differences due to how FSHR is expanded on X86 (giving a slightly different operand order for the OR operations compared to the old code). Reviewers: leonardchan, RKSimon, spatel, lebedev.ri Reviewed By: RKSimon Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, pzheng, bevinh, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67036 llvm-svn: 370813	2019-09-03 19:35:07 +00:00
Alina Sbirlea	e331d50534	[MemorySSA] Re-enable MemorySSA use. Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370811	2019-09-03 19:28:37 +00:00
Reid Kleckner	b2d10cf22e	[MC] Pass through .code16/32/64 and .syntax unified for COFF These flags should simply be passed through to the target, which will do the right thing. Add an MC/X86 test that uses these directives with the three primary object file formats and shows that they disassemble the same everywhere. There is a missing test for .code32 on Windows ARM, since I'm not sure exactly how to construct one. Fixes PR43203 llvm-svn: 370805	2019-09-03 18:16:52 +00:00
Philip Reames	37e2f5f125	[GVN] Propagate simple equalities from assumes within the tail of the block This extends the existing logic for propagating constant expressions in an analogous manner for what we do across basic blocks. The core point is that we chose some order of operands, and canonicalize uses towards that one. The heuristic used is inspired by the one used across blocks; in a follow up change, I'd plan to common them so that the cross block version uses the slightly stronger ordering herein. As noted by the TODOs in the code, there's a good amount of room for improving the existing code and making it more powerful. Some follow up work planned. Differential Revision: https://reviews.llvm.org/D66977 llvm-svn: 370791	2019-09-03 17:31:19 +00:00
Jessica Paquette	15036acb05	[AArch64][GlobalISel] Don't import i64imm_32bit pattern at -O0 This pattern, when imported at -O0 adds an extra copy via the SUBREG_TO_REG. This is because the SUBREG_TO_REG is not eliminated. At all other opt levels, it is eliminated. This is a 1% geomean code size savings at -O0 on CTMark. Differential Revision: https://reviews.llvm.org/D67027 llvm-svn: 370789	2019-09-03 17:21:12 +00:00
Roman Lebedev	bdd65351d3	Revert r370454 "[LoopIdiomRecognize] BCmp loop idiom recognition" https://bugs.llvm.org/show_bug.cgi?id=43206 was filed, claiming that there is a miscompilation. Reverting until i investigate. This reverts commit r370454 llvm-svn: 370788	2019-09-03 17:14:56 +00:00
Philip Reames	154a944a80	[Tests/GVN] Precommit requested test additions from D66977 llvm-svn: 370784	2019-09-03 17:02:55 +00:00
Jonas Paulsson	a0a811739d	[SystemZ] Recognize INLINEASM_BR in backend. SystemZInstrInfo::analyzeBranch() needs to check for INLINEASM_BR instructions, or it will crash. Review: Ulrich Weigand llvm-svn: 370753	2019-09-03 13:31:22 +00:00
David Green	2f3574c168	[ARM] Ignore Implicit CPSR regs when lowering from Machine to MC operands The code here seems to date back to r134705, when tablegen lowering was first being added. I don't believe that we need to include CPSR implicit operands on the MCInst. This now works more like other backends (like AArch64), where all implicit registers are skipped. This allows the AliasInst for CSEL's to match correctly, as can be seen in the test changes. Differential revision: https://reviews.llvm.org/D66703 llvm-svn: 370745	2019-09-03 11:30:54 +00:00
Jonas Paulsson	f12415812c	[SystemZ] Add support for fentry. SystemZAsmPrinter now properly emits function calls to __fentry__. Review: Ulrich Weigand llvm-svn: 370743	2019-09-03 11:21:12 +00:00
David Green	61973d978b	[ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can also be used in ISel Lowering, adding codesize values to the computed costs, to be able to compare either approximate instruction counts or codesize costs. It also adds a HasLowerConstantMaterializationCost, which compares the ConstantMaterializationCost of two values, returning true if the first is smaller either in instruction count/codesize, or falling back to the other in the case that they are equal. This is used in constant CSEL lowering to invert the predicate if the opposite is easier to materialise. Differential revision: https://reviews.llvm.org/D66701 llvm-svn: 370741	2019-09-03 11:06:24 +00:00
David Green	57cc65ff47	[ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions. Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value. This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used. Code by Ranjeet Singh and Simon Tatham, with some modifications from me. Differential revision: https://reviews.llvm.org/D66483 llvm-svn: 370739	2019-09-03 10:53:07 +00:00
David Green	a1ae7e3734	[ARM] Add csel tests. NFC llvm-svn: 370738	2019-09-03 10:32:46 +00:00
Simon Atanasyan	25d5b54542	[mips] Switch to the `.text` section after emitting asm file preamble Now the last `.section` directive in the MIPS asm file preamble is the `.section .mdebug.abi`. If assembler code injected for example by the LLVM `module asm` or the C ` __asm` directives do not contain explicit switching to the `.text` section it goes to the `.mdebug.abi` section. It might be unexpected to the user and in fact for example breaks building some existing code like FreeBSD libc [1]. The patch forces switching to the `.text` section after emitting MIPS assembler file preamble. [1] https://bugs.llvm.org/show_bug.cgi?id=43119 Fix PR43119. Differential Revision: https://reviews.llvm.org/D67014 llvm-svn: 370735	2019-09-03 10:24:07 +00:00
David Green	3e8d5f335d	[ARM] Fix MVE ldst offset ranges We were using isShiftedInt<7, Shift>(RHSC) to detect the ranges of offsets to fold into MVE loads/stores. The instructions actually take a 7 bit unsigned integer which is either added or subtracted. So something more like isShiftedUInt<7, Shift>(abs(RHSC)). Instead I've changes this to use the isScaledConstantInRange method, same as in SelectT2AddrModeImm7Offset used by pre/post inc, which seemed to already be getting this correct. Differential revision: https://reviews.llvm.org/D66997 llvm-svn: 370731	2019-09-03 09:57:02 +00:00
Oliver Stannard	3be2df2418	[ARM][MVE] Decoding of VMSR doesn't diagnose some unpredictable encodings Decoding of VMSR doesn't diagnose some unpredictable encodings, as the unpredictable bits are not correctly set. Diff-reduce this instruction's internals WRT VMRS so I can see the differences better. Mostly this is s/src/Rt/g. Fill in the "should-be-(0)" bits. Designate the Unpredictable{} bits for both VMRS and VMSR. Patch by Mark Murray! Differential revision: https://reviews.llvm.org/D66938 llvm-svn: 370729	2019-09-03 09:55:30 +00:00
Oliver Stannard	39bf484d92	Bug fix on function epilog optimization (ARM backend) To save a 'add sp,#val' instruction by adding registers to the final pop instruction, the first register transferred by this pop instruction need to be found. If the function to be optimized has a non-void return value, the operand list contains r0 (implicit) which prevents the optimization to take place. Therefore implicit register references should be skipped in the search loop, because this registers are never popped from the stack. Patch by Rainer Herbertz (rOptimizer)! Differential revision: https://reviews.llvm.org/D66730 llvm-svn: 370728	2019-09-03 09:51:19 +00:00
David Green	855caf2335	[ARM] More MVE load/store tests for offsets around the negative limit. NFC llvm-svn: 370726	2019-09-03 09:42:16 +00:00
Bjorn Pettersson	dd18ce4501	[LV] Fix miscompiles by adding non-header PHI nodes to AllowedExit Summary: Fold-tail currently supports reduction last-vector-value live-out's, but has yet to support last-scalar-value live-outs, including non-header phi's. As it relies on AllowedExit in order to detect them and bail out we need to add the non-header PHI nodes to AllowedExit, otherwise we end up with miscompiles. Solves https://bugs.llvm.org/show_bug.cgi?id=43166 Reviewers: fhahn, Ayal Reviewed By: fhahn, Ayal Subscribers: anna, hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67074 llvm-svn: 370721	2019-09-03 09:33:55 +00:00
Bjorn Pettersson	0760d348eb	[LV] Precommit test case showing miscompile from PR43166. NFC Summary: Precommit test case showing miscompile from PR43166. Reviewers: fhahn, Ayal Reviewed By: fhahn Subscribers: rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67072 llvm-svn: 370720	2019-09-03 09:33:40 +00:00
James Molloy	935499579c	[MachinePipeliner] Add a way to unit-test the schedule emitter Emitting a schedule is really hard. There are lots of corner cases to take care of; in fact, of the 60+ SWP-specific testcases in the Hexagon backend most of those are testing codegen rather than the schedule creation itself. One issue is that to test an emission corner case we must craft an input such that the generated schedule uses that corner case; sometimes this is very hard and convolutes testcases. Other times it is impossible but we want to test it anyway. This patch adds a simple test pass that will consume a module containing a loop and generate pipelined code from it. We use post-instr-symbols as a way to annotate instructions with the stage and cycle that we want to schedule them at. We also provide a flag that causes the MachinePipeliner to generate these annotations instead of actually emitting code; this allows us to generate an input testcase with: llc < %s -stop-after=pipeliner -pipeliner-annotate-for-testing -o test.mir And run the emission in isolation with: llc < test.mir -run-pass=modulo-schedule-test llvm-svn: 370705	2019-09-03 08:20:31 +00:00
Sam Tebbs	8b2df85d02	[ARM] Select vmla This patch adds vmla selection. Differential revision: https://reviews.llvm.org/D66297 llvm-svn: 370704	2019-09-03 08:17:46 +00:00
Craig Topper	9dc8c448ed	[X86] Don't use Expand for i32 fp_to_uint on SSE1/2 targets on 32-bit target. Use Custom lowering instead. Fall back to default expansion only when the scalar FP type belongs in an XMM register. This improves lowering for i32 to fp80, and also i32 to double on SSE1 only. llvm-svn: 370699	2019-09-03 05:57:18 +00:00
Craig Topper	f255f44336	[X86] Add an exhaustive test for i32 fptosi/fptoui across different triples and features. llvm-svn: 370698	2019-09-03 05:57:14 +00:00
Craig Topper	dcecc7ea46	[X86] Custom promote i32->f80 uint_to_fp on AVX512 64-bit targets. Reuse the same code to promote all i32 uint_to_fp on 64-bit targets to simplify the X86ISelLowering constructor. llvm-svn: 370693	2019-09-03 02:51:10 +00:00
Simon Pilgrim	cacf4db571	[CostModel][X86] Add scalar sext/zext cost tests llvm-svn: 370684	2019-09-02 21:02:51 +00:00
Craig Topper	45cd185109	[X86] Enable fp128 as a legal type with SSE1 rather than with MMX. FP128 values are passed in xmm registers so should be asssociated with an SSE feature rather than MMX which uses a different set of registers. llc enables sse1 and sse2 by default with x86_64. But does not enable mmx. Clang enables all 3 features by default. I've tried to add command lines to test with -sse where possible, but any test that returns a value in an xmm register fails with a fatal error with -sse since we have no defined ABI for that scenario. llvm-svn: 370682	2019-09-02 20:16:30 +00:00
David Green	a5fd8d8f47	[ARM] MVE predicate bitcast test and VPSEL adjustment. NFC llvm-svn: 370678	2019-09-02 19:03:35 +00:00
David Green	a95ec59fa5	[ARM] Use MQPR not QPR for MVE registers We should be using MQPR, and if we don't we can get COPYs and PHIs created for QPR. These get folded into instructions, failing verification checks. Differential revision: https://reviews.llvm.org/D66214 llvm-svn: 370676	2019-09-02 17:18:23 +00:00
Robert Lougher	13190c4225	[TargetLowering][PS4] Add sincos(f) lib functions when target is PS4 PS4 supports sincosf and sincos. Adding the library functions enables the sin(f)+cos(f) -> sincos(f) optimization. Differential Revision: https://reviews.llvm.org/D67009 llvm-svn: 370675	2019-09-02 16:53:32 +00:00
Ulrich Weigand	b21e245711	[SystemZ] Support constrained fpto[su]i intrinsics Now that constrained fpto[su]i intrinsic are available, add codegen support to the SystemZ backend. In addition to pure back-end changes, I've also needed to add the strict_fp_to_[su]int and any_fp_to_[su]int pattern fragments in the obvious way. llvm-svn: 370674	2019-09-02 16:49:29 +00:00
Kerry McLaughlin	da4ef9b4c8	[SVE][Inline-Asm] Support for SVE asm operands Summary: Adds the following inline asm constraints for SVE: - w: SVE vector register with full range, Z0 to Z31 - x: Restricted to registers Z0 to Z15 inclusive. - y: Restricted to registers Z0 to Z7 inclusive. This change also adds the "z" modifier to interpret a register as an SVE register. Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness. Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened Reviewed By: sdesmalen Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66302 llvm-svn: 370673	2019-09-02 16:12:31 +00:00
George Rimar	78e8011a29	Recommit r370661 "[llvm-nm] - Add a test case for case when we dump a symbol that belongs to a section with a broken sh_name." Fix: add a 'consumeError()' call to ObjectFile.cpp. This error was never checked. Original commit message: It adds a test case for a problem fixed by D66976 <https://reviews.llvm.org/D66976>. It was introduced by me in D66089 <https://reviews.llvm.org/D66089>. The error reported was never consumed because of a wrong variable name used, so it could fail when LLVM_ENABLE_ABI_BREAKING_CHECKS is used. Differential revision: https://reviews.llvm.org/D67002 llvm-svn: 370669	2019-09-02 14:57:35 +00:00
Sanjay Patel	4e54cf3e0e	[DAGCombiner] try to form test+set out of shift+mask patterns The motivating bugs are: https://bugs.llvm.org/show_bug.cgi?id=41340 https://bugs.llvm.org/show_bug.cgi?id=42697 As discussed there, we could view this as a failure of IR canonicalization, but then we would need to implement a backend fixup with target overrides to get this right in all cases. Instead, we can just view this as a codegen opportunity. It's not even clear for x86 exactly when we should favor test+set; some CPUs have better theoretical throughput for the ALU ops than bt/test. This patch is made more complicated than I expected because there's an early DAGCombine for 'and' that can change types of the intermediate ops via trunc+anyext. Differential Revision: https://reviews.llvm.org/D66687 llvm-svn: 370668	2019-09-02 14:52:09 +00:00
Jay Foad	6e18266aa4	Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0" Summary: D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet. Reviewers: nhaehnle, arsenm, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65813 llvm-svn: 370667	2019-09-02 14:40:57 +00:00
Dmitry Preobrazhensky	4aa90ea58e	[AMDGPU][MC][GFX10] Corrected constant bus checks to exclude null See AMD SWDEV-157286 Reviewers: atamazov, arsenm Differential Revision: https://reviews.llvm.org/D65229 llvm-svn: 370665	2019-09-02 14:19:52 +00:00
Thomas Preud'homme	a291b950db	[FileCheck] Forbid using var defined on same line Summary: Commit r366897 introduced the possibility to set a variable from an expression, such as [[#VAR2:VAR1+3]]. While introducing this feature, it introduced extra logic to allow using such a variable on the same line later on. Unfortunately that extra logic is flawed as it relies on a mapping from variable to expression defining it when the mapping is from variable definition to expression. This flaw causes among other issues PR42896. This commit avoids the problem by forbidding all use of a variable defined on the same line, and removes the now useless logic. Redesign will be done in a later commit because it will require some amount of refactoring first for the solution to be clean. One example is the need for some sort of transaction mechanism to set a variable temporarily and from an expression and rollback if the CHECK pattern does not match so that diagnostics show the right variable values. Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk Subscribers: JonChesterfield, rogfer01, hfinkel, kristina, rnk, tra, arichardson, grimar, dblaikie, probinson, llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D66141 llvm-svn: 370663	2019-09-02 14:04:00 +00:00
George Rimar	b567ce7680	Revert r370661 "[llvm-nm] - Add a test case for case when we dump a symbol that belongs to a section with a broken sh_name" It broke BB: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/16955/steps/test/logs/stdio Expected<T> must be checked before access or destruction. Unchecked Expected<T> contained error: a section [index 1] has an invalid sh_name (0xffff) offset which goes past the end of the section name string tableStack dump: 0. Program arguments: /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/test/tools/llvm-nm/Output/format-sysv-section.test.tmp2.o --format=sysv #0 0x00000000008af7c4 PrintStackTraceSignalHandler(void) (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x8af7c4) #1 0x00000000008ad8be llvm::sys::RunSignalHandlers() (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x8ad8be) #2 0x00000000008afbd8 SignalHandler(int) (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x8afbd8) #3 0x00007f0a6b989730 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12730) #4 0x00007f0a6b48d7bb raise (/lib/x86_64-linux-gnu/libc.so.6+0x377bb) #5 0x00007f0a6b478535 abort (/lib/x86_64-linux-gnu/libc.so.6+0x22535) #6 0x000000000042004b llvm::Expected<llvm::StringRef>::fatalUncheckedExpected() const (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x42004b) #7 0x00000000008367f5 (/sv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x8367f5) #8 0x0000000000817b80 llvm::object::IRObjectFile::findBitcodeInObject(llvm::object::ObjectFile const&) (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x817b80) #9 0x0000000000838416 llvm::object::SymbolicFile::createSymbolicFile(llvm::MemoryBufferRef, llvm::file_magic, llvm::LLVMContext) (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x838416) #10 0x00000000007f36cb llvm::object::createBinary(llvm::MemoryBufferRef, llvm::LLVMContext*) (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x7f36cb) #11 0x0000000000413123 dumpSymbolNamesFromFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x413123) #12 0x0000000000412e38 main (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x412e38) #13 0x00007f0a6b47a09b __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409b) #14 0x00000000004120da _start (/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/llvm-nm+0x4120da) FileCheck error: '-' is empty. FileCheck command line: /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/test/tools/llvm-nm/format-sysv-section.test --check-prefix=ERR -- llvm-svn: 370662	2019-09-02 14:03:50 +00:00
George Rimar	4b9233cafb	[llvm-nm] - Add a test case for case when we dump a symbol that belongs to a section with a broken sh_name. It adds a test case for a problem fixed by D66976. It was introduced by me in D66089. The error reported was never consumed because of a wrong variable name used, so it could fail when LLVM_ENABLE_ABI_BREAKING_CHECKS is used. Differential revision: https://reviews.llvm.org/D67002 llvm-svn: 370661	2019-09-02 13:54:45 +00:00
Dmitry Preobrazhensky	9c68eddbbe	[AMDGPU][MC][GFX10] Enabled null with 64-bit operands See Bug 42745: https://bugs.llvm.org/show_bug.cgi?id=42745 Reviewers: atamazov, arsenm https://reviews.llvm.org/D65231 llvm-svn: 370660	2019-09-02 13:42:25 +00:00
Sanjay Patel	561c39994b	[InstCombine] recognize bswap disguised as shufflevector bitcast <N x i8> (shuf X, undef, <N, N-1,...0>) to i{N8} --> bswap (bitcast X to i{N8}) In PR43146: https://bugs.llvm.org/show_bug.cgi?id=43146 ...we have a more complicated case where SLP is making a mess of bswap. This patch won't do anything for that currently, but we need to improve bswap recognition in instcombine, SLP, and/or a standalone pass to avoid that problem. This is limited using the data-layout so we don't try to do this transform with actual vector types. The backend does not appear to have folds to convert in either direction, so we don't want to mess up something that is actually better lowered as a shuffle. On x86, we're trading something like this: vmovd %edi, %xmm0 vpshufb LCPI0_0(%rip), %xmm0, %xmm0 ## xmm0 = xmm0[3,2,1,0,u,u,u,u,u,u,u,u,u,u,u,u] vmovd %xmm0, %eax For: movl %edi, %eax bswapl %eax Differential Revision: https://reviews.llvm.org/D66965 llvm-svn: 370659	2019-09-02 13:33:20 +00:00
Martin Storsjo	491fc23a60	[test] [llvm-dlltool] Improve test strictness a little. NFC. llvm-svn: 370657	2019-09-02 13:28:21 +00:00
Martin Storsjo	1cec6b2970	[llvm-dlltool] Handle external and internal names with differing decoration Also add a missed part of the test from SVN r369747. Differential Revision: https://reviews.llvm.org/D66996 llvm-svn: 370656	2019-09-02 13:28:16 +00:00
Dmitry Preobrazhensky	fe2ee4c46a	[AMDGPU][MC][GFX10] Corrected constant bus limit for 64-bit shift instructions See bug 42744: https://bugs.llvm.org/show_bug.cgi?id=42744 Reviewers: atamazov, arsenm Differential Revision: https://reviews.llvm.org/D65228 llvm-svn: 370652	2019-09-02 12:50:05 +00:00
Andrea Di Biagio	528f68144b	[X86][BtVer2] Fix latency and throughput of conditional SIMD store instructions. On BtVer2 conditional SIMD stores are heavily microcoded. The latency is directly proportional to the number of packed elements extracted from the input vector. Also, according to micro-benchmarks, most of the computation seems to be done in the integer unit. Only a minority of the uOPs is executed by the FPU. The observed behaviour on the FPU looks similar to this: - The input MASK value is moved to the Integer Unit -- [ a VMOVMSK-like uOP-executed on JFPU0]. - In parallel, each element of the input XMM/YMM is extracted and then sent to the IntegerUnit through JFPU1. As expected, a (conditional) store is executed for every extracted element. Interestingly, a (speculative) load is executed for every extracted element too. It is as-if a "LOAD - BIT_EXTRACT- CMOV" sequence of uOPs is repeated by the integer unit for every contionally stored element. VMASKMOVDQU is a special case: the number of speculative loads is always 2 (presumably, one load per quadword). That means, extra shifts and masking is performed on (one of) the loaded quadwords before each conditional store (that also explains the big number of non-FP uOPs retired). This patch replaces the existing writes for conditional SIMD stores (i.e. WriteFMaskedStore, and WriteFMaskedStoreY) with the following new writes: WriteFMaskedStore32 [ XMM Packed Single ] WriteFMaskedStore32Y [ YMM Packed Single ] WriteFMaskedStore64 [ XMM Packed Double ] WriteFMaskedStore64Y [ YMM Packed Double ] Added a wrapper class named X86SchedWriteMaskMove in X86Schedule.td to describe both RM and MR variants for conditional SIMD moves in a single tablegen definition. Instances of that class are then passed in input to multiclass avx_movmask_rm when constructing MASKMOVPS/PD definitions. Since this patch introduces new writes, I had to update all the X86 scheduling models. Differential Revision: https://reviews.llvm.org/D66801 llvm-svn: 370649	2019-09-02 12:32:28 +00:00
Jeremy Morse	22493f66f1	[DebugInfo] LiveDebugValues: correctly discriminate kinds of variable locations The missing line added by this patch ensures that only spilt variable locations are candidates for being restored from the stack. Otherwise, register or constant-value information can be interpreted as a spill location, through a union. The added regression test replicates a scenario where this occurs: the stack load from [rsp] causes the register-location DBG_VALUE to be "restored" to rsi, when it should be left alone. See PR43058 for details. Un x-fail a test that was suffering from this from a previous patch. Differential Revision: https://reviews.llvm.org/D66895 llvm-svn: 370648	2019-09-02 12:28:36 +00:00
James Henderson	43e9ead1ed	[llvm-strings][test] Merge two closely related tests This is a follow-up to feedback on D66015. Reviewed by: grimar Differential Revision: https://reviews.llvm.org/D67069 llvm-svn: 370643	2019-09-02 11:42:30 +00:00
Piotr Sobczak	252a584cbd	[AMDGPU] Add test Summary: Add test checking that the redundant immediate MOV instruction (by-product of handling phi nodes) is not found in the generated code. Reviewers: arsenm, anton-afanasyev, craig.topper, rtereshin, bogner Reviewed By: arsenm Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, wdng, jvesely, nhaehnle, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63860 llvm-svn: 370634	2019-09-02 10:02:54 +00:00
George Rimar	86cc736df1	[yaml2obj] - Allow overriding sh_name fields of the sections. This is in line with the previous changes which allowed to override the sh_offset/sh_size and useful for writing test cases. Differential revision: https://reviews.llvm.org/D66998 llvm-svn: 370633	2019-09-02 09:47:17 +00:00
Djordje Todorovic	5c6b82a756	[DWARFVerifier] Verify GNU extensions of call site DWARF symbols Verify that the call site DWARF symbols (added during the implementation of the debug entry values feature) are generated properly. Differential Revision: https://reviews.llvm.org/D66865 llvm-svn: 370631	2019-09-02 09:20:46 +00:00
Amara Emerson	453ef4e376	[AArch64][GlobalISel] Fix zext narrowScalar to use the right type when creating the merges. Fixes PR43171. llvm-svn: 370627	2019-09-02 08:18:55 +00:00
Craig Topper	3ab210862a	[X86] Add initial support for unfolding broadcast loads from arithmetic instructions to enable LICM hoisting of the load MachineLICM can hoist an invariant load, but if that load is folded it needs to be unfolded. On AVX512 sometimes this load is an broadcast load which we were previously unable to unfold. This patch adds initial support for that with a very basic list of supported instructions as a starting point. Differential Revision: https://reviews.llvm.org/D67017 llvm-svn: 370620	2019-09-01 22:14:36 +00:00
Sanjay Patel	c882208367	[DAGCombiner] improve throughput of shift+logic+shift The motivating case for this is a long way from here: https://bugs.llvm.org/show_bug.cgi?id=43146 ...but I think this is where we have to start. We need to canonicalize/optimize sequences of shift and logic to ease pattern matching for things like bswap and improve perf in general. But without the artificial limit of '!LegalTypes' (early combining), there are a lot of test diffs, and not all are good. In the minimal tests added for this proposal, x86 should have better throughput in all cases. AArch64 is neutral for scalar tests because it can fold shifts into bitwise logic ops. There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns: https://rise4fun.com/Alive/VlI https://rise4fun.com/Alive/n1m https://rise4fun.com/Alive/1Vn Differential Revision: https://reviews.llvm.org/D67021 llvm-svn: 370617	2019-09-01 18:38:15 +00:00
Roman Lebedev	ff21e3f055	[ConstantFolding] Fix 'undef' folding for @llvm.[us]{add,sub}.with.overflow ops (PR43188) As we have already established/fixed in https://bugs.llvm.org/show_bug.cgi?id=42209 https://reviews.llvm.org/D63065 https://reviews.llvm.org/rL363522 the InstSimplify handling for @llvm.with.overflow ops with undefs is correct. Therefore if ConstantFolding produces different results, then it is wrong. This duplication of code hints at the need for some refactoring, but for now address the brokenness of ConstantFolding by copying the known-good handling from rL363522. Fixes https://bugs.llvm.org/show_bug.cgi?id=43188 llvm-svn: 370608	2019-09-01 11:56:52 +00:00
David Green	8469a39af3	[ARM] Remove MVE masked loads/stores These were never enabled correctly and are causing other problems. Taking them out for the moment, whilst we work on the issues. This reverts r370329. llvm-svn: 370607	2019-09-01 10:11:40 +00:00
Shiva Chen	adfdcb9c26	[TargetLowering] Fix Bugzilla ID 43183 to avoid soften comparison broken with constant inputs Summary: This fixes the bugzilla id 43183 which triggerd by the following commit: [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall llvm-svn: 370604	2019-09-01 04:52:54 +00:00
Puyan Lotfi	75a8a212d4	[GlobalISel][NFC] Regression test cases for aarch64 legalizer (s128 sext+icmp). There were legalizer asserts in aarch64 globalisel (in debug mode) with s128 sext+icmp before r367060 and r366943 landed. These are just a couple reduced mir and ir regression tests that came from a build where these were encountered. llvm-svn: 370602	2019-09-01 00:45:28 +00:00
David Bolvansky	ff0ad3c43d	[InstCombine] mempcpy(d,s,n) to memcpy(d,s,n) + n Summary: Back-end currently expands mempcpy, but middle-end should work with memcpy instead of mempcpy to enable more memcpy-optimization. GCC backend emits mempcpy, so LLVM backend could form it too, if we know mempcpy libcall is better than memcpy + n. https://godbolt.org/z/dOCG96 Reviewers: efriedma, spatel, craig.topper, RKSimon, jdoerfert Reviewed By: efriedma Subscribers: hjl.tools, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65737 llvm-svn: 370593	2019-08-31 18:19:05 +00:00
Simon Pilgrim	f8d1d00190	[X86] EltsFromConsecutiveLoads - Don't confuse elt count with vector element count (PR43170) EltsFromConsecutiveLoads was assuming that the number of input elts was the same as the number of elements in the output vector type when creating a zeroing shuffle, causing an assert when subvectors were being combined instead of just scalars. llvm-svn: 370592	2019-08-31 16:21:31 +00:00
Simon Pilgrim	20be06db97	[X86][AVX512] Regenerate tests with common prefixes llvm-svn: 370591	2019-08-31 16:04:39 +00:00
Sanjay Patel	11704d0f51	[AArch64][x86] increase value type coverage in tests; NFC This goes with D67021. llvm-svn: 370590	2019-08-31 15:49:16 +00:00
Amaury Sechet	82825ab882	[DAGCombiner] Match (add X, X) as (shl X, 1) when detecting rotate. Summary: The combiner transforms (shl X, 1) into (add X, X). Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66882 llvm-svn: 370578	2019-08-31 11:40:02 +00:00
Nikita Popov	a91f729279	[CVP] Add tests for simplified with.overflow + icmp; NFC These tests are based on D19867. llvm-svn: 370574	2019-08-31 09:58:42 +00:00
Nikita Popov	b9e668f2e7	[CVP] Generate simpler code for elided with.overflow intrinsics Use a { iN undef, i1 false } struct as the base, and only insert the first operand, instead of using { iN undef, i1 undef } as the base and inserting both. This is the same as what we do in InstCombine. Differential Revision: https://reviews.llvm.org/D67034 llvm-svn: 370573	2019-08-31 09:58:37 +00:00
Wei Mi	798e59b81f	[SampleFDO] Add profile symbol list section to discriminate function being cold versus function being newly added. This is the second half of https://reviews.llvm.org/D66374. Profile symbol list is the collection of function symbols showing up in the binary which generates the current profile. It is used to discriminate function being cold versus function being newly added. Profile symbol list is only added for profile with ExtBinary format. During profile use compilation, when profile-sample-accurate is enabled, a function without profile will be regarded as cold only when it is contained in that list. Differential Revision: https://reviews.llvm.org/D66766 llvm-svn: 370563	2019-08-31 02:27:26 +00:00
Thomas Lively	d0d9317061	[WebAssembly] Add SIMD QFMA/QFMS Summary: Adds clang builtins and LLVM intrinsics for these experimental instructions. They are not implemented in engines yet, but that is ok because the user must opt into using them by calling the builtins. Reviewers: aheejin, dschuff Reviewed By: aheejin Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D67020 llvm-svn: 370556	2019-08-31 00:12:29 +00:00
Alina Sbirlea	3d03769ba0	[MemorySSA] Rename all phi entries. When renaming Phis incoming values, there may be multiple edges incoming from the same block (switch). Rename all. llvm-svn: 370548	2019-08-30 23:02:53 +00:00
Wei Mi	5ef5829fb0	[GVN] Verify value equality before doing phi translation for call instruction This is an updated version of https://reviews.llvm.org/D66909 to fix PR42605. Basically, current phi translatation translates an old value number to an new value number for a call instruction based on the literal equality of call expression, without verifying there is no clobber in between. This is incorrect. To get a finegrain check, use MachineDependence analysis to do the job. However, this is still not ideal. Although given a call instruction, `MemoryDependenceResults::getCallDependencyFrom` returns identical call instructions without clobber in between using MemDepResult with its DepType to be `Def`. However, identical is too strict here and we want it to be relaxed a little to consider phi-translation -- callee is the same, param operands can be different. That means changing the semantic of `MemDepResult::Def` and I don't know the potential impact. So currently the patch is still conservative to only handle MemDepResult::NonFuncLocal, which means the current call has no function local clobber. If there is clobber, even if the clobber doesn't stand in between the current call and the call with the new value, we won't do phi-translate. Differential Revision: https://reviews.llvm.org/D67013 llvm-svn: 370547	2019-08-30 23:01:22 +00:00
Reid Kleckner	657a06c619	[MC] Avoid crashes from improperly nested or wrong target .seh_handlerdata directives llvm-svn: 370540	2019-08-30 22:25:55 +00:00
Reid Kleckner	a33474d595	[X86] Print register names in .seh_* directives Also improve assembler parser register validation for .seh_ directives. This requires moving X86-specific seh directive handling into the x86 backend, which addresses some assembler FIXMEs. Differential Revision: https://reviews.llvm.org/D66625 llvm-svn: 370533	2019-08-30 21:23:05 +00:00
Sanjay Patel	cfe959709f	[x86] add tests for shift-logic-shift; NFC llvm-svn: 370529	2019-08-30 20:51:51 +00:00
Sanjay Patel	82847b50e9	[AArch64] add tests for shift-logic-shift; NFC llvm-svn: 370528	2019-08-30 20:48:43 +00:00
Reid Kleckner	0bb1630685	[Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn Users have complained llvm.trap produce two ud2 instructions on Win64, one for the trap, and one for unreachable. This change fixes that. TrapUnreachable was added and enabled for Win64 in r206684 (April 2014) to avoid poorly understood issues with the Windows unwinder. There seem to be two major things in play: - the unwinder - C++ EH, _CxxFrameHandler3 & co The unwinder disassembles forward from the return address to scan for epilogues. Inserting a ud2 had the effect of stopping the unwinder, and ensuring that it ran the EH personality function for the current frame. However, it's not clear what the unwinder does when the return address happens to be the last address of one function and the first address of the next function. The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out what the current EH state number is. It does this by consulting the ip2state table, which maps from PC to state number. This seems to go wrong when the return address is the last PC of the function or catch funclet. I'm not sure precisely which system is involved here, but in order to address these real or hypothetical problems, I believe it is enough to insert int3 after a call site if it would otherwise be the last instruction in a function or funclet. I was able to reproduce some similar problems locally by arranging for a noreturn call to appear at the end of a catch block immediately before an unrelated function, and I confirmed that the problems go away when an extra trailing int3 instruction is added. MSVC inserts int3 after every noreturn function call, but I believe it's only necessary to do it if the call would be the last instruction. This change inserts a pseudo instruction that expands to int3 if it is in the last basic block of a function or funclet. I did what I could to run the Microsoft compiler EH tests, and the ones I was able to run showed no behavior difference before or after this change. Differential Revision: https://reviews.llvm.org/D66980 llvm-svn: 370525	2019-08-30 20:46:39 +00:00
Sanjay Patel	a39ef6dea6	[Thumb2] tighten CHECK lines in test; NFC The sequence between the function call and the asm start may change without affecting what this test is looking for, but we should have a better idea about what that sequence looks like. llvm-svn: 370518	2019-08-30 20:15:01 +00:00
Craig Topper	4b61b6476b	[X86] Fix mul test cases in avx512-broadcast-unfold.ll to not get canonicalized to fadd. Remove the fsub test cases which were also testing fadd. Not sure how to prevent an fsub by constant getting turned into an fadd by negative constant. llvm-svn: 370515	2019-08-30 20:04:23 +00:00
Craig Topper	a707ced18f	[X86] Regenerate the test cases added in r370506. Something weird happened with the v2i64/v2f64 test cases which don't use broadcast. So they should already be hoisted, but weren't in the version I submitted in r370506. This fixes that. Not sure if something changed or I screwed up. llvm-svn: 370507	2019-08-30 19:42:48 +00:00
Craig Topper	2396919200	[X86] Add test caes for opportunities for machine LICM to unfold broadcasted constant pool loads. MachineLICM is able to unfold loads to move an invariant load out a loop, but X86 infrastructure currently lacks the ability to do this when avx512 embedded broadcasting is used. This test adds examples for the basic float point operations, add, mul, and, or, and xor. llvm-svn: 370506	2019-08-30 19:26:06 +00:00
Jinsong Ji	fb4b86af92	[PowerPC][NFC] Avoid checking non-relevant .cfi instructions Summary: This is brought up in https://reviews.llvm.org/D64662?id=209923#inline-599490 CFI information are non-relevant to quite some testcases, we should get rid of checking them when its unecessary. This patch avoid generating cfi info in testcases that are not testing prolog/epilog or exception handling. Reviewers: kbarton, hfinkel, nemanjai, #powerpc Reviewed By: hfinkel Subscribers: MaskRay, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67016 llvm-svn: 370505	2019-08-30 19:24:25 +00:00
Puyan Lotfi	d719c50655	[llvm-ifs][IFS] llvm Interface Stubs merging + object file generation tool. This tool merges interface stub files to produce a merged interface stub file or a stub library. Currently it for stub library generation it can produce an ELF .so stub file, or a TBD file (experimental). It will be used by the clang -emit-interface-stubs compilation pipeline to merge and assemble the per-CU stub files into a stub library. The new IFS format is as follows: --- !experimental-ifs-v1 IfsVersion: 1.0 Triple: <llvm triple> ObjectFileFormat: <ELF \| TBD> Symbols: _ZSymbolName: { Type: <type>, etc... } ... Differential Revision: https://reviews.llvm.org/D66405 llvm-svn: 370499	2019-08-30 18:26:05 +00:00
Craig Topper	18e8d02e8c	[X86] Pass v32i16/v64i8 in zmm registers on KNL target. gcc and icc pass these types in zmm registers in zmm registers. This patch implements a quick hack to override the register type before calling convention handling to one that is legal. Longer term we might want to do something similar to 256-bit integer registers on AVX1 where we just split all the operations. Fixes PR42957 Differential Revision: https://reviews.llvm.org/D66708 llvm-svn: 370495	2019-08-30 17:35:08 +00:00
Evgeniy Stepanov	04647f5e22	MemTag: unchecked load/store optimization. Summary: MTE allows memory access to bypass tag check iff the address argument is [SP, #imm]. This change takes advantage of this to demote uses of tagged addresses to regular FrameIndex operands, reducing register pressure in large functions. MO_TAGGED target flag is used to signal that the FrameIndex operand refers to memory that might be tagged, and needs to be handled with care. Such operand must be lowered to [SP, #imm] directly, without a scratch register. The transformation pass attempts to predict when the offset will be out of range and disable the optimization. AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case this prediction has been wrong, but it is quite inefficient and should be avoided. Reviewers: pcc, vitalybuka, ostannard Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66457 llvm-svn: 370490	2019-08-30 17:23:02 +00:00
Johannes Doerfert	3fac668d83	[Attributor] Use existing function information for the call site Summary: Instead of recomputing information for call sites we now use the function information directly. This is always valid and once we have call site specific information we can improve here. This patch also bootstraps attributes that are created on-demand through an initial update call. Information that is known will then directly be available in the new attribute without causing an iteration delay. The tests show how this improves the iteration count. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66781 llvm-svn: 370480	2019-08-30 15:24:52 +00:00
Johannes Doerfert	81df452d82	[Attributor] Manifest load/store alignment generally Summary: Any pointer could have load/store users not only floating ones so we move the manifest logic for alignment into the AAAlignImpl class. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66922 llvm-svn: 370479	2019-08-30 15:22:28 +00:00
Piotr Sobczak	67b979466a	[InstCombine][AMDGPU] Simplify tbuffer loads Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66926 llvm-svn: 370475	2019-08-30 14:20:04 +00:00
George Rimar	4e71702cd4	[yaml2obj][obj2yaml] - Use a single "Other" field instead of "Other", "Visibility" and "StOther". Currenly we can encode the 'st_other' field of symbol using 3 fields. 'Visibility' is used to encode STV_* values. 'Other' is used to encode everything except the visibility, but it can't handle arbitrary values. 'StOther' is used to encode arbitrary values when 'Visibility'/'Other' are not helpfull enough. 'st_other' field is used to encode symbol visibility and platform-dependent flags and values. Problem to encode it is that it consists of Visibility part (STV_* values) which are enumeration values and the Other part, which is different and inconsistent. For MIPS the Other part contains flags for all STO_MIPS_* values except STO_MIPS_MIPS16. (Like comment in ELFDumper says: "Someones in their infinite wisdom decided to make STO_MIPS_MIPS16 flag overlapped with other ST_MIPS_xxx flags."...) And for PPC64 the Other part might actually encode any value. This patch implements custom logic for handling the st_other and removes 'Visibility' and 'StOther' fields. Here is an example of a new YAML style this patch allows: - Name: foo Other: [ 0x4 ] - Name: bar Other: [ STV_PROTECTED, 4 ] - Name: zed Other: [ STV_PROTECTED, STO_MIPS_OPTIONAL, 0xf8 ] Differential revision: https://reviews.llvm.org/D66886 llvm-svn: 370472	2019-08-30 13:39:22 +00:00
Simon Atanasyan	68f73bf262	[mips] Merge common checkings under the same check prefix. NFC llvm-svn: 370467	2019-08-30 12:15:12 +00:00
Luis Marques	c2b3d527fa	[RISCV] Fix a couple of tests' CHECKs llvm-svn: 370466	2019-08-30 12:11:47 +00:00
Amaury Sechet	485760f4c0	[X86] Add tests for rotate matching. NFC llvm-svn: 370464	2019-08-30 11:35:28 +00:00
Chris Jackson	fa1fe93789	[llvm-objcopy] Allow the visibility of symbols created by --binary and --add-symbol to be specified with --new-symbol-visibility llvm-svn: 370458	2019-08-30 10:17:16 +00:00
Hideto Ueno	6381b143f6	[Attributor] Implement AANoAliasCallSiteArgument initialization Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66927 llvm-svn: 370456	2019-08-30 10:00:32 +00:00
Roman Lebedev	5c9f3cfec7	[LoopIdiomRecognize] BCmp loop idiom recognition Summary: @mclow.lists brought up this issue up in IRC. It is a reasonably common problem to compare some two values for equality. Those may be just some integers, strings or arrays of integers. In C, there is `memcmp()`, `bcmp()` functions. In C++, there exists `std::equal()` algorithm. One can also write that function manually. libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ libc++ does not do anything like that, it simply relies on simple C++'s `operator==()`. https://godbolt.org/z/er0Zwf (GOOD!) So likely, there exists a certain performance opportunities. Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that is using `memcmp()` (in this case, compiled with modified compiler). {F8768213} ``` #include <algorithm> #include <cmath> #include <cstdint> #include <iterator> #include <limits> #include <random> #include <type_traits> #include <utility> #include <vector> #include "benchmark/benchmark.h" template <class T> bool equal(T* a, T* a_end, T* b) noexcept { for (; a != a_end; ++a, ++b) { if (a != b) return false; } return true; } template <typename T> std::vector<T> getVectorOfRandomNumbers(size_t count) { std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(), std::numeric_limits<T>::max()); std::vector<T> v; v.reserve(count); std::generate_n(std::back_inserter(v), count, [&dis, &gen]() { return dis(gen); }); assert(v.size() == count); return v; } struct Identical { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto Tmp = getVectorOfRandomNumbers<T>(count); return std::make_pair(Tmp, std::move(Tmp)); } }; struct InequalHalfway { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto V0 = getVectorOfRandomNumbers<T>(count); auto V1 = V0; V1[V1.size() / size_t(2)]++; // just change the value. return std::make_pair(std::move(V0), std::move(V1)); } }; template <class T, class Gen> void BM_bcmp(benchmark::State& state) { const size_t Length = state.range(0); const std::pair<std::vector<T>, std::vector<T>> Data = Gen::template Gen<T>(Length); const std::vector<T>& a = Data.first; const std::vector<T>& b = Data.second; assert(a.size() == Length && b.size() == a.size()); benchmark::ClobberMemory(); benchmark::DoNotOptimize(a); benchmark::DoNotOptimize(a.data()); benchmark::DoNotOptimize(b); benchmark::DoNotOptimize(b.data()); for (auto _ : state) { const bool is_equal = equal(a.data(), a.data() + a.size(), b.data()); benchmark::DoNotOptimize(is_equal); } state.SetComplexityN(Length); state.counters["eltcnt"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant); state.counters["eltcnt/sec"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate); const size_t BytesRead = 2 * sizeof(T) * Length; state.counters["bytes_read/iteration"] = benchmark::Counter(BytesRead, benchmark::Counter::kDefaults, benchmark::Counter::OneK::kIs1024); state.counters["bytes_read/sec"] = benchmark::Counter( BytesRead, benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024); } template <typename T> static void CustomArguments(benchmark::internal::Benchmark* b) { const size_t L2SizeBytes = []() { for (const benchmark::CPUInfo::CacheInfo& I : benchmark::CPUInfo::Get().caches) { if (I.level == 2) return I.size; } return 0; }(); // What is the largest range we can check to always fit within given L2 cache? const size_t MaxLen = L2SizeBytes / /total bufs/ 2 / /maximal elt size/ sizeof(T) / /safety margin/ 2; b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN); } BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical) ->Apply(CustomArguments<uint64_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway) ->Apply(CustomArguments<uint64_t>); ``` {F8768210} ``` $ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx 2019-04-25 21:17:11 Running build-old/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 0.65, 3.90, 4.14 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 432131 ns 432101 ns 1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s BM_bcmp<uint8_t, Identical>_BigO 0.86 N 0.86 N BM_bcmp<uint8_t, Identical>_RMS 8 % 8 % <...> BM_bcmp<uint16_t, Identical>/256000 161408 ns 161409 ns 4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s BM_bcmp<uint16_t, Identical>_BigO 0.67 N 0.67 N BM_bcmp<uint16_t, Identical>_RMS 25 % 25 % <...> BM_bcmp<uint32_t, Identical>/128000 81497 ns 81488 ns 8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s BM_bcmp<uint32_t, Identical>_BigO 0.71 N 0.71 N BM_bcmp<uint32_t, Identical>_RMS 42 % 42 % <...> BM_bcmp<uint64_t, Identical>/64000 50138 ns 50138 ns 10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s BM_bcmp<uint64_t, Identical>_BigO 0.84 N 0.84 N BM_bcmp<uint64_t, Identical>_RMS 27 % 27 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 192405 ns 192392 ns 3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.38 N 0.38 N BM_bcmp<uint8_t, InequalHalfway>_RMS 3 % 3 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 127858 ns 127860 ns 5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint16_t, InequalHalfway>_RMS 0 % 0 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 49140 ns 49140 ns 14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.40 N 0.40 N BM_bcmp<uint32_t, InequalHalfway>_RMS 18 % 18 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 32101 ns 32099 ns 21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint64_t, InequalHalfway>_RMS 1 % 1 % RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0 2019-04-25 21:19:29 Running build-new/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 1.01, 2.85, 3.71 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 18593 ns 18590 ns 37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s BM_bcmp<uint8_t, Identical>_BigO 0.04 N 0.04 N BM_bcmp<uint8_t, Identical>_RMS 37 % 37 % <...> BM_bcmp<uint16_t, Identical>/256000 18950 ns 18948 ns 37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s BM_bcmp<uint16_t, Identical>_BigO 0.08 N 0.08 N BM_bcmp<uint16_t, Identical>_RMS 34 % 34 % <...> BM_bcmp<uint32_t, Identical>/128000 18627 ns 18627 ns 37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s BM_bcmp<uint32_t, Identical>_BigO 0.16 N 0.16 N BM_bcmp<uint32_t, Identical>_RMS 35 % 35 % <...> BM_bcmp<uint64_t, Identical>/64000 18855 ns 18855 ns 37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s BM_bcmp<uint64_t, Identical>_BigO 0.32 N 0.32 N BM_bcmp<uint64_t, Identical>_RMS 33 % 33 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 9570 ns 9569 ns 73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.02 N 0.02 N BM_bcmp<uint8_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 9547 ns 9547 ns 74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.04 N 0.04 N BM_bcmp<uint16_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 9396 ns 9394 ns 73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.08 N 0.08 N BM_bcmp<uint32_t, InequalHalfway>_RMS 30 % 30 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 9499 ns 9498 ns 73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.16 N 0.16 N BM_bcmp<uint64_t, InequalHalfway>_RMS 28 % 28 % Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench Benchmark Time CPU Time Old Time New CPU Old CPU New --------------------------------------------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 -0.9570 -0.9570 432131 18593 432101 18590 <...> BM_bcmp<uint16_t, Identical>/256000 -0.8826 -0.8826 161408 18950 161409 18948 <...> BM_bcmp<uint32_t, Identical>/128000 -0.7714 -0.7714 81497 18627 81488 18627 <...> BM_bcmp<uint64_t, Identical>/64000 -0.6239 -0.6239 50138 18855 50138 18855 <...> BM_bcmp<uint8_t, InequalHalfway>/512000 -0.9503 -0.9503 192405 9570 192392 9569 <...> BM_bcmp<uint16_t, InequalHalfway>/256000 -0.9253 -0.9253 127858 9547 127860 9547 <...> BM_bcmp<uint32_t, InequalHalfway>/128000 -0.8088 -0.8088 49140 9396 49140 9394 <...> BM_bcmp<uint64_t, InequalHalfway>/64000 -0.7041 -0.7041 32101 9499 32099 9498 ``` What can we tell from the benchmark? * Performance of naive equality check somewhat improves with element size, maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s for uint64_t. I think, that instability implies performance problems. * Performance of `memcmp()`-aware benchmark always maxes out at around bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the naive variant! * eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and linearly decreases with element size. For uint64_t, it's ~4x+ the elements/second. * The call obvious is more pricey than the loop, with small element count. As it can be seen from the full output {F8768210}, the `memcmp()` is almost universally worse, independent of the element size (and thus buffer size) when element count is less than 8. So all in all, bcmp idiom does indeed pose untapped performance headroom. This diff does implement said idiom recognition. I think a reasonable test coverage is present, but do tell if there is anything obvious missing. Now, quality. This does succeed to build and pass the test-suite, at least without any non-bundled elements. {F8768216} {F8768217} This transform fires 91 times: ``` $ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json Tests: 1149 Metric: loop-idiom.NumBCmp Program result-new MultiSourc...Benchmarks/7zip/7zip-benchmark 79.00 MultiSource/Applications/d/make_dparser 3.00 SingleSource/UnitTests/vla 2.00 MultiSource/Applications/Burg/burg 1.00 MultiSourc.../Applications/JM/lencod/lencod 1.00 MultiSource/Applications/lemon/lemon 1.00 MultiSource/Benchmarks/Bullet/bullet 1.00 MultiSourc...e/Benchmarks/MallocBench/gs/gs 1.00 MultiSourc...gs-C/TimberWolfMC/timberwolfmc 1.00 MultiSourc...Prolangs-C/simulator/simulator 1.00 ``` The size changes are: I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look. ``` $ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash Tests: 1149 Same hash: 907 (filtered out) Remaining: 242 Metric: size..text Program result-old result-new diff test-suite...ingleSource/UnitTests/vla.test 753.00 833.00 10.6% test-suite...marks/7zip/7zip-benchmark.test 1001697.00 966657.00 -3.5% test-suite...ngs-C/simulator/simulator.test 32369.00 32321.00 -0.1% test-suite...plications/d/make_dparser.test 89585.00 89505.00 -0.1% test-suite...ce/Applications/Burg/burg.test 40817.00 40785.00 -0.1% test-suite.../Applications/lemon/lemon.test 47281.00 47249.00 -0.1% test-suite...TimberWolfMC/timberwolfmc.test 250065.00 250113.00 0.0% test-suite...chmarks/MallocBench/gs/gs.test 149889.00 149873.00 -0.0% test-suite...ications/JM/lencod/lencod.test 769585.00 769569.00 -0.0% test-suite.../Benchmarks/Bullet/bullet.test 770049.00 770049.00 0.0% test-suite...HMARK_ANISTROPIC_DIFFUSION/128 NaN NaN nan% test-suite...HMARK_ANISTROPIC_DIFFUSION/256 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/64 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/32 NaN NaN nan% test-suite...ENCHMARK_BILATERAL_FILTER/64/4 NaN NaN nan% Geomean difference nan% result-old result-new diff count 1.000000e+01 10.00000 10.000000 mean 3.152090e+05 311695.40000 0.006749 std 3.790398e+05 372091.42232 0.036605 min 7.530000e+02 833.00000 -0.034981 25% 4.243300e+04 42401.00000 -0.000866 50% 1.197370e+05 119689.00000 -0.000392 75% 6.397050e+05 639705.00000 -0.000005 max 1.001697e+06 966657.00000 0.106242 ``` I don't have timings though. And now to the code. The basic idea is to completely replace the whole loop. If we can't fully kill it, don't transform. I have left one or two comments in the code, so hopefully it can be understood. Also, there is a few TODO's that i have left for follow-ups: * widening of `memcmp()`/`bcmp()` * step smaller than the comparison size * Metadata propagation * more than two blocks as long as there is still a single backedge? * ??? Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet Reviewed By: courbet Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists Tags: #llvm Differential Revision: https://reviews.llvm.org/D61144 llvm-svn: 370454	2019-08-30 09:51:23 +00:00
David Stenberg	b35d4699d0	[LiveDebugValues] Insert entry values after bundles Summary: Change LiveDebugValues so that it inserts entry values after the bundle which contains the clobbering instruction. Previously it would insert the debug value after the bundle head using insertAfter(), breaking the bundle. Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D66888 llvm-svn: 370448	2019-08-30 09:06:50 +00:00
Martin Storsjo	9438221785	[COFF] Add a ResourceSectionRef method for getting resource contents This allows llvm-readobj to print the contents of each resource when printing resources from an object file or executable, like it already does for plain .res files. This requires providing the whole COFFObjectFile to ResourceSectionRef. This supports both object files and executables. For executables, the DataRVA field is used as is to look up the right section. For object files, ideally we would need to complete linking of them and fix up all relocations to know what the DataRVA field would end up being. In practice, the only thing that makes sense for an RVA field is an ADDR32NB relocation. Thus, find a relocation pointing at this field, verify that it has the expected type, locate the symbol it points at, look up the section the symbol points at, and read from the right offset in that section. This works both for GNU windres object files (which use one single .rsrc section, with all relocations against the base of the .rsrc section, with the original value of the DataRVA field being the offset of the data from the beginning of the .rsrc section) and cvtres object files (with two separate .rsrc$01 and .rsrc$02 sections, and one symbol per data entry, with the original pre-relocated DataRVA field being set to zero). Differential Revision: https://reviews.llvm.org/D66820 llvm-svn: 370433	2019-08-30 06:55:49 +00:00
Petar Avramovic	e96892a8aa	[MIPS GlobalISel] Lower uitofp Add custom lowering for G_UITOFP for MIPS32. Differential Revision: https://reviews.llvm.org/D66930 llvm-svn: 370432	2019-08-30 05:51:12 +00:00
Petar Avramovic	6412b56513	[MIPS GlobalISel] Lower fptoui Add lower for G_FPTOUI. Algorithm is similar to the SDAG version in TargetLowering::expandFP_TO_UINT. Lower G_FPTOUI for MIPS32. Differential Revision: https://reviews.llvm.org/D66929 llvm-svn: 370431	2019-08-30 05:44:02 +00:00
Dan Gohman	8cfeeaf9de	[CodeGen] Fix lowering for returning the result of an extractvalue When the number of return values exceeds the number of registers available, SelectionDAGBuilder::visitRet transforms a function's return to use a pointer to a buffer to hold return values. When the returned value is an operator such as extractvalue, the value may have a non-zero result number. Add that number to the indexing when obtaining the values to store. This fixes https://bugs.llvm.org/show_bug.cgi?id=43132. Differential Revision: https://reviews.llvm.org/D66978 llvm-svn: 370430	2019-08-30 04:33:22 +00:00
Jinsong Ji	54a1ad5bd7	[PowerPC][NFC] Use -mtriple in RUN line, remove target triple in tls.ll To avoid confusion, especially when -mtriple are also added for PPC32. llvm-svn: 370427	2019-08-30 02:57:33 +00:00
Fangrui Song	7704b54389	[PPC32] Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO Unlike ppc64, which has ADDISgotTprelHA+LDgotTprelL pairs, ppc32 just uses LDgotTprelL32, so it does not make lots of sense to use _LO without a paired _HA. Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO to match GCC, and get better linker relocation check. Note, R_PPC_GOT_TPREL16_{HA,LO} don't have good linker support: (a) lld does not support R_PPC_GOT_TPREL16_{HA,LO}. (b) Top of tree ld.bfd does not support R_PPC_GOT_REL16_HA Initial-Exec -> Local-Exec relaxation: // a.o addis 3, 3, tsd_tls@got@tprel@ha lwz 3, tsd_tls@got@tprel@l(3) add 3, 3, tsd_tls@tls // b.o .section .tdata,"awT"; .globl tsd_tls; tsd_tls: // ld/ld-new a.o b.o internal error, aborting at ../../bfd/elf32-ppc.c:7952 in ppc_elf_relocate_section Reviewed By: adalava Differential Revision: https://reviews.llvm.org/D66925 llvm-svn: 370426	2019-08-30 02:20:49 +00:00
Dan Gohman	da84b688f9	[WebAssembly] Make __attribute__((used)) not imply export. Add an WASM_SYMBOL_NO_STRIP flag, so that __attribute__((used)) doesn't need to imply exporting. When targeting Emscripten, have WASM_SYMBOL_NO_STRIP imply exporting. Differential Revision: https://reviews.llvm.org/D62542 llvm-svn: 370415	2019-08-29 22:40:00 +00:00

... 4 5 6 7 8 ...

65071 Commits