llvm-project

Commit Graph

Author	SHA1	Message	Date
Qiu Chaofan	ebfbdebe96	[PowerPC] Fix store-fptoi combine of f128 on Power8 llc would crash for (store (fptosi-f128-i32)) when -mcpu=pwr8, we should not generate FP_TO_(S\|U)INT_IN_VSR for f128 types at this time. This patch fixes it. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D86686	2020-09-17 10:21:35 +08:00
Daniel Kiss	f70baaf71f	[AArch64] Add -mmark-bti-property flag. Writing the .note.gnu.property manually is error prone and hard to maintain in the assembly files. The -mmark-bti-property is for the assembler to emit the section with the GNU_PROPERTY_AARCH64_FEATURE_1_BTI. To be used when C/C++ is compiled with -mbranch-protection=bti. This patch refactors the .note.gnu.property handling. Reviewed By: chill, nickdesaulniers Differential Revision: https://reviews.llvm.org/D81930 Reland with test dependency on aarch64 target.	2020-09-17 01:18:36 +02:00
Daniel Kiss	60e244f82c	Revert "[AArch64] Add -mmark-bti-property flag." This reverts commit `95e43f84b7`.	2020-09-17 01:17:23 +02:00
Daniel Kiss	95e43f84b7	[AArch64] Add -mmark-bti-property flag. Writing the .note.gnu.property manually is error prone and hard to maintain in the assembly files. The -mmark-bti-property is for the assembler to emit the section with the GNU_PROPERTY_AARCH64_FEATURE_1_BTI. To be used when C/C++ is compiled with -mbranch-protection=bti. This patch refactors the .note.gnu.property handling. Reviewed By: chill, nickdesaulniers Differential Revision: https://reviews.llvm.org/D81930	2020-09-17 00:24:14 +02:00
Stanislav Mekhanoshin	91f503c3af	[AMDGPU] gfx1030 RT support Differential Revision: https://reviews.llvm.org/D87782	2020-09-16 11:40:58 -07:00
Amara Emerson	6ad33d8360	[AArch64][GlobalISel] Make G_BUILD_VECTOR os <16 x s8> legal.	2020-09-16 11:19:47 -07:00
Matt Arsenault	367248956e	AMDGPU: Clear offset register when using local stack area eliminateFrameIndex won't fix up the offset register when the direct frame index reference is moved to a separate move instruction. Switch the offset to a base 0 (which it probably should be to begin with).	2020-09-16 12:56:40 -04:00
Sjoerd Meijer	b5c3efeb7b	[ARM][MVE] Tail-predication: predicate new elementcount checks on force-enabled Additional sanity checks were added to get.active.lane.mask's second argument, the loop tripcount/elementcount, in rG635b87511ec3. Like the other (overflow) checks, skip this if tail-predication is forced. Differential Revision: https://reviews.llvm.org/D87769	2020-09-16 17:05:14 +01:00
Jay Foad	cb64455faa	[AMDGPU] Remove obsolete comment Obsoleted by `e4464bf3d4` "AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR"	2020-09-16 17:03:55 +01:00
Dmitry Preobrazhensky	06d058afec	[AMDGPU] Corrected directive to use for ELF weak refs WeakRefDirective should specify a directive to declare "a global as being a weak undefined symbol". The directive used by AMDGPU was incorrect - ".weakref" was intended for other purposes. The correct directive is ".weak" and it is already defined as default for ELF. So the redefinition was removed. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D87762	2020-09-16 18:51:26 +03:00
Simon Pilgrim	b2c931eff3	[X86] EmitInstrWithCustomInserter - remove redundant getDebugLoc() calls. NFCI. Use the same DebugLoc that is called at the top of the method. Fixes some Wshadow static analyzer warnings.	2020-09-16 16:29:56 +01:00
Mircea Trofin	6e85c3d5c7	[NFC][Regalloc] accessors for 'reg' and 'weight' Also renamed the fields to follow style guidelines. Accessors help with readability - weight mutation, in particular, is easier to follow this way. Differential Revision: https://reviews.llvm.org/D87725	2020-09-16 08:28:57 -07:00
Matt Arsenault	71131db689	AMDGPU: Improve <2 x i24> arguments and return value handling This was asserting for GlobalISel. For SelectionDAG, this was passing this on the stack. Instead, scalarize this as if it were a 32-bit vector.	2020-09-16 11:21:56 -04:00
Sebastian Neubauer	833b3b0d3a	[AMDGPU] Add v3f16/v3i16 support to SDag Fix lowering and instruction selection for v3x16 types and enable InstCombine to emit them. This patch only implements it for the selection dag. GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work. Differential Revision: https://reviews.llvm.org/D84420	2020-09-16 17:20:27 +02:00
Simon Pilgrim	cd46151202	[X86] Assert that we've found a terminator instruction. NFCI. Fixes clang static analayzer null dereference warning.	2020-09-16 16:17:49 +01:00
Jay Foad	90777e2924	[AMDGPU] Enable scheduling around FP MODE-setting instructions Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is marked as having unmodeled side effects, which makes the machine scheduler treat it as a barrier. Now that we have proper implicit $mode operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead for setregs that only touch the FP MODE bits, to give the scheduler more freedom. Differential Revision: https://reviews.llvm.org/D87446	2020-09-16 16:10:47 +01:00
Simon Pilgrim	aa4b0b755a	[X86][SSE] Move VZEXT_MOVL(INSERT_SUBVECTOR(UNDEF,X,0)) handling into combineTargetShuffle. Now that we're getting better at combining shuffles of different vector widths, this can now be performed as part of the standard target shuffle combines and isn't required for cleanup. Exposed a minor issue in combineX86ShufflesRecursively where we failed to check if a shuffle's src ops were simple types.	2020-09-16 16:08:31 +01:00
Sam Parker	3ce9ec0cfa	[ARM] Reorder some logic Re-order some checks in ValidateMVEInst.	2020-09-16 13:39:22 +01:00
Sam Parker	a63b2a4614	[ARM] Fix tail predication predicate tracking Clear the CurrentPredicate when we find an instruction which would completely overwrite the VPR. This fix essentially means we're back to not really being able to handle VPT instructions when tail predicating. Differential Revision: https://reviews.llvm.org/D87610	2020-09-16 11:59:29 +01:00
Sam Parker	86172ce378	[ARM] Add more validForTailPredication Modify the unit test to inspect all MVE instructions and mark the load/store/move of vpr/p0 as valid, as well as the remaining scalar shifts. Differential Revision: https://reviews.llvm.org/D87753	2020-09-16 11:51:50 +01:00
Sam Tebbs	ef0b9f3307	[ARM][LowOverheadLoops] Combine a VCMP and VPST into a VPT This patch combines a VCMP followed by a VPST into a VPT, which has the same semantics as the combination of the former two.	2020-09-16 09:27:10 +01:00
Yvan Roux	070b96962f	[ARM][MachineOutliner] Add calls handling. Handles calls inside outlined regions, by saving and restoring the link register. Differential Revision: https://reviews.llvm.org/D87136	2020-09-16 09:54:26 +02:00
Craig Topper	41f4cd60d5	[X86] Don't scalarize gather/scatters with non-power of 2 element counts. Widen instead. We can pad the mask with zeros in order to widen. We already do this for power 2 types that are smaller than a legal type.	2020-09-15 23:22:53 -07:00
Craig Topper	2ce1a697f0	[X86] Always use 16-bit displacement in 16-bit mode when there is no base or index register. Previously we only did this if the immediate fit in 16 bits, but the GNU assembler seems to just truncate. Fixes PR46952	2020-09-15 19:31:48 -07:00
Krzysztof Parzyszek	5f4abb7fab	[Hexagon] Replace incorrect pattern for vpackl HWI32 -> HVi8 V6_vdealb4w is not correct for pairs, use V6_vpackeh/V6_vpackeb instead.	2020-09-15 20:34:50 -05:00
Jessica Paquette	ffe9986de4	[AArch64][GlobalISel] Refactor + improve CMN, ADDS, and ADD emit functions These functions were extremely similar: - `emitADD` - `emitADDS` - `emitCMN` Refactor them a little, introducing a more generic `emitInstr` function to do most of the work. Also add support for the immediate + shifted register addressing modes in each of them. Update select-uaddo.mir to show that selecing ADDS now supports folding immediates + shifts. (I don't think this can impact CMN, because the CMN checks require a G_SUB with a non-constant on the RHS.) This is around a 0.02% code size improvement on CTMark at -O3. Differential Revision: https://reviews.llvm.org/D87529	2020-09-15 17:18:05 -07:00
Stanislav Mekhanoshin	277de43d88	[AMDGPU] Unify intrinsic ret/nortn interface We have a single noret intrinsic an a lot of special handling around it. Declare it just as any other but do not define rtn instructions itself instead. Differential Revision: https://reviews.llvm.org/D87719	2020-09-15 15:26:42 -07:00
Muhammad Asif Manzoor	d417488ef5	[AArch64][SVE] Add lowering for llvm fsqrt Add the functionality to lower fsqrt for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D87707	2020-09-15 15:26:17 -04:00
Albion Fung	05aa997d51	[PowerPC] Implement __int128 vector divide operations This patch implements __int128 vector divide operations for ISA3.1. Differential Revision: https://reviews.llvm.org/D85453	2020-09-15 15:19:35 -04:00
Craig Topper	05134877e6	[X86] Use Align in reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore. Correct pointer info. If we offset the pointer, we also need to offset the pointer info Differential Revision: https://reviews.llvm.org/D87593	2020-09-15 11:22:02 -07:00
Simon Pilgrim	a43e68b58b	[X86][AVX] lowerShuffleWithSHUFPS - handle missed canonicalization cases. PR47534 exposes a case where calling lowerShuffleWithSHUFPS directly from a derived repeated mask (found by is128BitLaneRepeatedShuffleMask) results in us using an non-canonicalized mask. The missed canonicalization in this case is trivial - just commute the mask so we have more (swapped) LHS than RHS references so lowerShuffleWithSHUFPS can handle it.	2020-09-15 17:31:08 +01:00
Simon Pilgrim	97a23ab28a	AMDGPUPrintfRuntimeBinding.cpp - drop unnecessary casts/dyn_casts. NFCI. GetElementPtrInst::Create returns a GetElementPtrInst* so we don't need to cast. Similarly IntegerType inherits from the Type base class. Also, I've used auto* in a few places to cleanup the code. Helps fix some clang-tidy warnings which saw the dyn_casts and warned that these can return null.	2020-09-15 14:49:04 +01:00
Sjoerd Meijer	635b87511e	[ARM][MVE] Tail-predication: use unsigned SCEV ranges for tripcount Loop tripcount expressions have a positive range, so use unsigned SCEV ranges for them. Differential Revision: https://reviews.llvm.org/D87608	2020-09-15 13:23:02 +01:00
Meera Nakrani	1119bf95be	[ARM] Corrected condition in isSaturatingConditional Fixed a small error in an if condition to prevent usat/ssat being generated if (upper constant + 1) is not a power of 2.	2020-09-15 10:14:30 +00:00
Sjoerd Meijer	b4b1b84106	[MVE] fix typo in llvm debug message. NFC.	2020-09-15 10:13:54 +01:00
Simon Pilgrim	fc446935d7	[X86] detectAVGPattern - accept non-pow2 vectors by padding. Drop the pow2 vector limitation for AVG generation by padding the vector to the next pow2, creating the PAVG nodes and then extracting the final subvector. Fixes some poor codegen that has been annoying me for years.....	2020-09-15 10:07:03 +01:00
Craig Topper	46673763fe	[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract Fixes PR47482	2020-09-14 16:59:04 -07:00
Craig Topper	da1aaa0b70	Revert "[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract." I got the bug number wrong. This reverts commit `3251593890`.	2020-09-14 16:58:57 -07:00
Philip Reames	e6bc7037d3	[AArch64] Statepoint support for AArch64. Differential Revision: https://reviews.llvm.org/D66012 Patch By: loicottet (with major rebase by me)	2020-09-14 16:43:08 -07:00
Craig Topper	3251593890	[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract. Fixes PR47525	2020-09-14 16:28:37 -07:00
Krzysztof Parzyszek	bb877d1af2	[Hexagon] Widen loads and handle any-/sign-/zero-extensions	2020-09-14 18:10:23 -05:00
Krzysztof Parzyszek	6352381039	[Hexagon] Some HVX DAG combines 1. VINSERTW0 x, undef -> x 2. VROR (VROR x, a), b) -> VROR x, a+b	2020-09-14 18:10:23 -05:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Austin Kerbow	f859c30ecb	[AMDGPU] Add XDL resource to scheduling model Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D87621	2020-09-14 13:48:54 -07:00
Kamau Bridgeman	c0f199e566	[PowerPC] Implement Thread Local Storage Support for Local Exec This patch is the initial support for the Local Exec Thread Local Storage model to produce code sequence and relocations correct to the ABI for the model when using PC relative memory operations. Patch by: Kamau Bridgeman Differential Revision: https://reviews.llvm.org/D83404	2020-09-14 14:16:28 -05:00
Nikita Popov	53f36f06af	[Legalize][ARM][X86] Add float legalization for VECREDUCE This adds SoftenFloatRes, PromoteFloatRes and SoftPromoteHalfRes legalizations for VECREDUCE, to fill the remaining hole in the SDAG legalization. These legalizations simply expand the reduction and let it be recursively legalized. For the PromoteFloatRes case at least it is possible to do better than that, but it's pretty tricky (because we need to consider the interaction of three different vector legalizations and the type promotion) and probably not really worthwhile. I haven't added ExpandFloatRes support, as I am not familiar with ppc_fp128. Differential Revision: https://reviews.llvm.org/D87569	2020-09-14 20:42:09 +02:00
Eric Astor	23a2b03221	[ms] [llvm-ml] Add basic support for SEH, including PROC FRAME Add basic support for SEH, including PROC FRAME Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86948	2020-09-14 14:32:55 -04:00
Eric Astor	20201dc76a	[ms] [llvm-ml] Add support for size queries in MASM Add support for size inference, sizeof, typeof, and lengthof. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86947	2020-09-14 14:27:06 -04:00
jasonliu	9868ea764f	[XCOFF][AIX] Handle TOC entries that could not be reached by positive range in small code model Summary: In small code model, AIX assembler could not deal with labels that could not be reached within the [-0x8000, 0x8000) range from TOC base. So when generating the assembly, we would need to help the assembler by subtracting an offset from the label to keep the actual value within [-0x8000, 0x8000). Reviewed By: hubert.reinterpretcast, Xiangling_L Differential Revision: https://reviews.llvm.org/D86879	2020-09-14 13:41:34 +00:00
Jay Foad	c799f873cb	[AMDGPU] Don't cluster stores Clustering loads has caching benefits, but as far as I know there is no advantage to clustering stores on any AMDGPU subtargets. The disadvantage is that it tends to increase register pressure and restricts scheduling freedom. Differential Revision: https://reviews.llvm.org/D85530	2020-09-14 13:40:17 +01:00
Simon Pilgrim	98eaacd73d	Assert we've found both vector types. NFCI. Fixes clang static analyzer warning about potential null dereferences.	2020-09-14 13:24:17 +01:00
Simon Pilgrim	7109fc9e42	Don't dereference from a dyn_cast<>. NFCI. Use cast<> instead which will assert if it fails and not just return null. Fixes clang static analyzer warning.	2020-09-14 13:05:17 +01:00
Meera Nakrani	dd519bf0b0	[ARM] Selects SSAT/USAT from correct LLVM IR LLVM will canonicalize conditional selectors to a different pattern than the old code that was used. This is updating the function to match the new expected patterns and select SSAT or USAT when successful. Tests have also been updated to use the new patterns. Differential Review: https://reviews.llvm.org/D87379	2020-09-14 10:58:21 +00:00
Sjoerd Meijer	676febc044	[ARM][MVE] Tail-predication: check get.active.lane.mask's TC value This adds additional checks for the original scalar loop tripcount value, i.e. get.active.lane.mask second argument, and perform several sanity checks to see if it is of the form that we expect similarly like we already do for the IV which is the first argument of get.active.lane. Differential Revision: https://reviews.llvm.org/D86074	2020-09-14 11:32:15 +01:00
Petar Avramovic	6e2a86ed5a	AMDGPU/GlobalISel Check for NoNaNsFPMath in isKnownNeverSNaN Check for NoNaNsFPMath function attribute in isKnownNeverSNaN. Function attributes are in held in 'TargetMachine.Options'. Among other things, this allows selection of some patterns imported in D87351 since G_FCANONICALIZE is not generated when isKnownNeverSNaN returns true in lowerFMinNumMaxNum. However we notice some incorrect results since function attributes are not correctly written in TargetMachine.Options when next function is processed. Take a look at @v_test_no_global_nnans_med3_f32_pat0_srcmod0, it has "no-nans-fp-math"="false" but TargetMachine.Options still has it set to true since first function in test file had this attribute set to true. This will be fixed in D87511. Differential Revision: https://reviews.llvm.org/D87456	2020-09-14 12:11:00 +02:00
Petar Avramovic	09b8871f8d	AMDGPU/GlobalISel/Emitter Support for predicate code that uses operands Predicates with 'let PredicateCodeUsesOperands = 1' want to examine matched operands. When we encounter predicate code that uses operands, analyze its named operand arguments and create a map between argument index and name. Later, when leaf node with name is encountered, emit GIM_RecordNamedOperand that will store that operand at its argument index in operand list. This operand list will be an argument to c++ code of the predicate. Differential Revision: https://reviews.llvm.org/D87285	2020-09-14 10:39:56 +02:00
Simon Wallis	4946802c5f	[ARM] Fix so immediates and pc relative checks Treating an SoImm offset as a multiple of 4 between -1020 and 1020 mis-handles the second of a pair of 16-bit constants where the offset is a multiple of 2 but not a multiple of 4, leading to an LLVM ERROR: out of range pc-relative fixup value For 32-bit and larger (64-bit) constants, continue to treat an SoImm offset as a multiple of 4 between -1020 and 1020. For smaller (16-bit) constants, treat an SoImm offset as a multiple of 1 between -255 and 255. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86949	2020-09-14 08:52:59 +01:00
David Blaikie	ce89eeee16	PPCInstrInfo: Fix readability-inconsistent-declaration-parameter-name clang-tidy warning Reduces the chance of confusion when calling the function with autocomplete (will show the more accurate/informative variable name), etc.	2020-09-13 13:08:17 -07:00
Qiu Chaofan	bec81dc67d	Reland "[PowerPC] Implement instruction clustering for stores" Commit `3c0b3250` introduced store fusion for PowerPC target, but it brought failure under UB sanitizer and was reverted. This patch fixes them.	2020-09-13 19:51:01 +08:00
Craig Topper	758732a34e	[X86] Use ISD::PARITY directly instead of emitting CTPOP and AND from combineHorizontalPredicateResult. We have a PARITY ISD node now so might as well use it. It will get re-expanded later.	2020-09-12 20:01:17 -07:00
Krzysztof Parzyszek	9d300bc8d2	[Hexagon] Avoid widening vectors with non-HVX element types	2020-09-12 20:26:54 -05:00
Craig Topper	ad3d6f993d	[SelectionDAG][X86][ARM][AArch64] Add ISD opcode for __builtin_parity. Expand it to shifts and xors. Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop isn't natively supported by the target, this leads to poor codegen due to the expansion of ctpop being more complex than what is needed for parity. This adds a DAG combine to convert the pattern to ISD::PARITY before operation legalization. Type legalization is updated to handled Expanding and Promoting this operation. If after type legalization, CTPOP is supported for this type, LegalizeDAG will turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a series of shifts and xors followed by an AND with 1. I've avoided vectors in this patch to avoid more legalization complexity for this patch. X86 previously had a custom DAG combiner for this. This is now moved to Custom lowering for the new opcode. There is a minor regression in vector-reduce-xor-bool.ll, but a follow up patch can easily fix that. Fixes PR47433 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87209	2020-09-12 11:42:18 -07:00
David Green	74760bb00f	[LV][ARM] Add preferInloopReduction target hook. This allows the backend to tell the vectorizer to produce inloop reductions through a TTI hook. For the moment on ARM under MVE this means allowing integer add reductions of the correct size. In the future this can include integer min/max too, under -Os. Differential Revision: https://reviews.llvm.org/D75512	2020-09-12 17:47:04 +01:00
Simon Pilgrim	3170d54842	[InstCombine][X86] Covert masked load/stores with (sign extended) bool vector masks to generic intrinsics. As detailed on PR11210, if the mask is known to come from a (sign extended) bool vector (e.g. comparisons) then we can represent with a generic masked load/store without losing anything. We already do something similar for BLENDV -> SELECT conversion.	2020-09-12 15:09:28 +01:00
Evgeny Leviant	2e61cd1295	[MachineScheduler] Fix operand scheduling for pre/post-increment loads Differential revision: https://reviews.llvm.org/D87557	2020-09-12 16:53:12 +03:00
David Green	6cfd38d03d	[ARM] Fixup single source mla reductions. This fixes a complication on top of D87276. If we are sign extending around a mul with the two operands that are the same, instcombine will helpfully convert one of the sext to a zext. Reverse that so that we again generate a reduction. Differnetial Revision: https://reviews.llvm.org/D87287	2020-09-12 14:31:26 +01:00
Sanjay Patel	3a8ea8609b	[Intrinsics] define semantics for experimental fmax/fmin vector reductions As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics. No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations. There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions. Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT. We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics. (Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.) x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct. Differential Revision: https://reviews.llvm.org/D87391	2020-09-12 09:10:28 -04:00
Simon Pilgrim	50ee0b99ec	[InstCombine][X86] getNegativeIsTrueBoolVec - use ConstantExpr evaluators. NFCI. Don't do this manually, we can just use the ConstantExpr evaluators to do it more tidily for us.	2020-09-12 13:58:58 +01:00
David Green	c437446d90	[ARM] Recognize "double extend" reduction patterns We can sometimes get code that does: xe = zext i16 x to i32 ye = zext i16 y to i32 m = mul i32 xe, ye me = zext i32 m to i64 r = vecreduce.add(me) This "double extend" can trip up the reduction identification, but should give identical results. This extends the pattern matching to handle them. Differential Revision: https://reviews.llvm.org/D87276	2020-09-12 13:51:42 +01:00
Simon Pilgrim	35dc91aee2	[X86][SSE] lowerShuffleAsDecomposedShuffleBlend - support decomposed unpacks for some vXi8/vXi16 cases Follow up to D86429 to handle the remaining regressions. This patch generalizes lowerShuffleAsDecomposedShuffleBlend to lowerShuffleAsDecomposedShuffleMerge, and attempts to use an UNPCKL shuffle mask instead of a blend for the cases where the inputs are coming from alternating vXi8/vXi16 sources. Technically they don't have to be alternating (just as long as they can fit into a lower lane half for the unpack) but I didn't find as many general cases and it needed a lot more of the function to be altered. For vXi32/vXi64 cases this could still be beneficial but in most cases the existing permute+blend approach was better. Differential Revision: https://reviews.llvm.org/D87405	2020-09-12 13:39:33 +01:00
QingShan Zhang	0680a3d56d	[Power10] Enable the heuristic for Power10 and switch the sched model with P9 Model Enable the pre-ra and post-ra scheduler strategy for Power10 as we want to customize the heuristic later. And switch the scheduler model with P9 model before P10 Model is available. The NoSchedModel is modelled as in-order cpu and the pre-ra scheduler is not bi-directional which will have big impact on the scheduler. Reviewed By: jji Differential Revision: https://reviews.llvm.org/D86865	2020-09-12 02:49:47 +00:00
QingShan Zhang	528554c39b	[PowerPC] Set the mayRaiseFPException for FCMPUS/FCMPUD From ISA, fcmpu will raise the Floating-Point Invalid Operation Exception (SNaN) if either of the operands is a Signaling NaN by setting the bit VXSNAN. But the instruction description didn't set the mayRaiseFPException which might have impact on the scheduling or some backend optimization. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D83937	2020-09-12 02:42:22 +00:00
Sam Clegg	fa2a8acc71	[WebAssembly] Add assembly syntax for mutable globals This adds and optional ", immutable" to the end of a `.globaltype` declaration. I would have prefered to match the `.wat` syntax where immutable is the default and `mut` is the signifier for mutable globals. Sadly changing the default would break backwards compat with existing assembly in the wild so I think its best to stick with this approach. Differential Revision: https://reviews.llvm.org/D87515	2020-09-11 11:11:02 -07:00
Simon Pilgrim	70a05ee288	[X86] Keep variables from getDataLayout/getDebugLoc calls as const reference. NFCI. These are only ever used as references in the called functions, so just pass the original reference instead of copying it.	2020-09-11 10:44:42 +01:00
Martin Storsjö	700fbe591a	[MC] [Win64EH] Canonicalize ARM64 unwind opcodes Convert 2-byte opcodes to equivalent 1-byte ones. Adjust the existing exhaustive testcase to avoid being altered by the simplification rules (to keep that test exercising all individual opcodes). Fix the assembler parser limits for register pairs; for .seh_save_regp and .seh_save_regp_x, we can allow up to x29, for a x29+x30 pair (which gets remapped to the UOP_SaveFPLR(X) opcodes), for .seh_save_fregp and .seh_save_fregpx, allow up to d14+d15. Not creating .seh_save_next for float register pairs, as the actual unwinder implementation in current versions of Windows is buggy for that case. This gives a minimal but measurable size reduction. (For a 6.5 MB DLL with 300 KB .xdata, the .xdata shrinks by 48 bytes. The opcode sequences are padded to a 4 byte boundary, so very small improvements might not end up mattering directly.) Differential Revision: https://reviews.llvm.org/D87367	2020-09-11 10:31:04 +03:00
Zarko Todorovski	035396197a	Remove unused variable introduce in `0448d11a06` causing build failures with -Werror on.	2020-09-10 20:07:11 -04:00
Amara Emerson	0448d11a06	[AArch64][GlobalISel] Don't emit a branch for a fallthrough G_BR at -O0. With optimizations we leave the decision to eliminate fallthrough branches to bock placement, but at -O0 we should do it in the selector to save code size. This regressed -O0 with a recent change to a combiner.	2020-09-10 15:01:26 -07:00
Krzysztof Parzyszek	783e28a508	[Hexagon] Split pair-based masked memops	2020-09-10 14:24:42 -05:00
Dominic Chen	4252f3009b	[WebAssembly] Set unreachable as canonical to permit disassembly Currently, using llvm-objdump to disassemble a function containing unreachable will trigger an assertion while decoding the opcode, since both unreachable and debug_unreachable have the same encoding. To avoid this, set unreachable as the canonical decoding. Differential Revision: https://reviews.llvm.org/D87431	2020-09-10 15:04:16 -04:00
Anna Thomas	46329f6079	[ImplicitNullCheck] Handle instructions that preserve zero value This is the first in a series of patches to make implicit null checks more general. This patch identifies instructions that preserves zero value of a register and considers that as a valid instruction to hoist along with the faulting load. See added testcases. Reviewed-By: reames, dantrushin Differential Revision: https://reviews.llvm.org/D87108	2020-09-10 13:39:50 -04:00
Kit Barton	009cd4e491	[PPC][GlobalISel] Add initial GlobalIsel infrastructure This adds the initial GlobalISel skeleton for PowerPC. It can only run ir-translator and legalizer for `ret void`. This is largely based on the initial GlobalISel patch for RISCV (https://reviews.llvm.org/D65219). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D83100	2020-09-10 11:58:01 -05:00
Owen Anderson	3d9c85e4d8	Mark FMOV constant materialization as being as cheap as a move. This prevents us from doing things like LICM'ing it out of a loop, which is usually a net loss because we end up having to spill a callee-saved FPR to accomodate it. This does perturb instruction scheduling around this instruction, so a number of tests had to be updated to account for it. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D87316	2020-09-10 16:38:59 +00:00
Simon Pilgrim	601557e9f9	Hexagon.h - remove unnecessary includes. NFCI. Replace with forward declarations and move includes to implicit dependent files.	2020-09-10 16:59:43 +01:00
Simon Pilgrim	b585fdae24	[X86] Use Register instead of unsigned. NFCI. Fixes llvm-prefer-register-over-unsigned clang-tidy warnings.	2020-09-10 16:05:33 +01:00
Simon Pilgrim	9f830e0af7	AArch64MachineFunctionInfo.h - remove unnecessary TargetFrameLowering.h include. NFCI.	2020-09-10 16:05:33 +01:00
Simon Pilgrim	de25ebaac6	[CostModel][X86] Add vXi32 division by uniform constant costs (PR47476) Other types can be handled in future patches but their uniform / non-uniform costs are more similar and don't appear to cause many vectorization issues.	2020-09-10 12:17:54 +01:00
Simon Pilgrim	fc49abee56	[X86][SSE] lowerShuffleAsSplitOrBlend always returns a shuffle. lowerShuffleAsSplitOrBlend always returns a target shuffle result (and is the default operation for lowering some shuffle types), so we don't need to check for null.	2020-09-10 11:45:08 +01:00
Simon Pilgrim	e80605e242	[X86] Remove WaitInsert::TTI member. NFCI. This is only ever set/used inside WaitInsert::runOnMachineFunction so don't bother storing it in the class.	2020-09-10 11:45:08 +01:00
Kerry McLaughlin	cd89f5c91b	[SVE][CodeGen] Legalisation of truncate for scalable vectors Truncating from an illegal SVE type to a legal type, e.g. `trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>` fails after PromoteIntOp_CONCAT_VECTORS attempts to create a BUILD_VECTOR. This patch changes the promote function to create a sequence of INSERT_SUBVECTORs if the return type is scalable, and replaces these with UNPK+UZP1 for AArch64. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86548	2020-09-10 11:35:33 +01:00
Sam Tebbs	b81c57d646	[ARM][LowOverheadLoops] Allow tail predication on predicated instructions with unknown lane values The effects of unpredicated vector instruction with unknown lanes cannot be predicted and therefore cannot be tail predicated. This does not apply to predicated vector instructions and so this patch allows tail predication on them. Differential Revision: https://reviews.llvm.org/D87376	2020-09-10 10:34:32 +01:00
Sam Parker	1919b65052	[ARM] Tail predicate VQDMULH and VQRDMULH Mark the family of instructions as valid for tail predication. Differential Revision: https://reviews.llvm.org/D87348	2020-09-10 08:20:07 +01:00
Qiu Chaofan	6afb279100	[PowerPC] [FPEnv] Disable strict FP mutation by default `22a0edd0` introduced a config IsStrictFPEnabled, which controls the strict floating point mutation (transforming some strict-fp operations into non-strict in ISel). This patch disables the mutation by default since we've finished PowerPC strict-fp enablement in backend. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D87222	2020-09-10 13:28:09 +08:00
Jordan Rupprecht	52f0837778	[NFC] Move definition of variable now only used in debug builds	2020-09-09 20:23:59 -07:00
Matt Arsenault	e15215e041	AMDGPU: Hoist check for VGPRs	2020-09-09 19:45:40 -04:00
Matt Arsenault	85490874b2	AMDGPU: Skip all meta instructions in hazard recognizer This was not adding a necessary nop due to thinking the kill counted.	2020-09-09 19:45:40 -04:00
Matt Arsenault	82cbc9330a	AMDGPU: Fix inserting waitcnts before kill uses	2020-09-09 19:45:40 -04:00
dfukalov	c259d3a061	[AMDGPU] Fix for folding v2.16 literals. It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are incorrectly processed so one of two packed values were lost. Introduced new function to check immediate 32-bit operand can be folded. Converted condition about current op_sel flags value to fall-through. Fixes: SWDEV-247595 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D87158	2020-09-10 01:39:25 +03:00
Jessica Paquette	480e7f43a2	[AArch64][GlobalISel] Share address mode selection code for memops We were missing support for the G_ADD_LOW + ADRP folding optimization in the manual selection code for G_LOAD, G_STORE, and G_ZEXTLOAD. As a result, we were missing cases like this: ``` @foo = external hidden global i32* define void @baz(i32* %0) { store i32* %0, i32** @foo ret void } ``` https://godbolt.org/z/16r7ad This functionality already existed in the addressing mode functions for the importer. So, this patch makes the manual selection code use `selectAddrModeIndexed` rather than duplicating work. This is a 0.2% geomean code size improvement for CTMark at -O3. There is one code size increase (0.1% on lencod) which is likely because `selectAddrModeIndexed` doesn't look through constants. Differential Revision: https://reviews.llvm.org/D87397	2020-09-09 15:14:46 -07:00
Amara Emerson	e5784ef8f6	[GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator. We weren't using this before, so none of the MachineFunction CFG edges had the branch probability information added. As a result, block placement later in the pipeline was flying blind. This is enabled only with optimizations enabled like SelectionDAG. Differential Revision: https://reviews.llvm.org/D86824	2020-09-09 14:31:12 -07:00
Amara Emerson	cc76da7ada	[GlobalISel] Rewrite the elide-br-by-swapping-icmp-ops combine to do less. This combine previously tried to take sequences like: %cond = G_ICMP pred, a, b G_BRCOND %cond, %truebb G_BR %falsebb %truebb: ... %falsebb: ... and by inverting the compare predicate and swapping branch targets, delete the G_BR and instead have a single conditional branch to the falsebb. Since in an earlier patch we have a combine to fold not(icmp) into just an inverted icmp, we don't need this combine to do as much. This patch instead generalizes the combine by just looking for: G_BRCOND %cond, %truebb G_BR %falsebb %truebb: ... %falsebb: ... and then inverting the condition using a not (xor). The xor can be folded away in a separate combine. This change also lets us avoid some optimization code in the IRTranslator. I also think that deleting G_BRs in the combiner is unnecessary. That's something that targets can decide to do at selection time and could simplify generic code in future. Differential Revision: https://reviews.llvm.org/D86664	2020-09-09 13:08:16 -07:00
Hiroshi Yamauchi	0ab6a15698	[X86] Add support for using fast short rep mov for memcpy lowering. Disabled by default behind an option. Differential Revision: https://reviews.llvm.org/D86883	2020-09-09 12:46:40 -07:00
Krzysztof Parzyszek	0ee54cf883	[Hexagon] Account for truncating pairs to non-pairs when widening truncates Added missing selection patterns for vpackl.	2020-09-09 14:31:52 -05:00
Simon Pilgrim	6e45b98934	X86CallFrameOptimization.cpp - use const references where possible. NFCI.	2020-09-09 16:35:08 +01:00
Simon Pilgrim	e706116e11	X86FrameLowering::adjustStackWithPops - cleanup auto usage. NFCI. Don't use auto for non-obvious types, and use const references.	2020-09-09 16:15:02 +01:00
Qiu Chaofan	88ff4d2ca1	[PowerPC] Fix STRICT_FRINT/STRICT_FNEARBYINT lowering In standard C library, both rint and nearbyint returns rounding result in current rounding mode. But nearbyint never raises inexact exception. On PowerPC, x(v\|s)r(d\|s)pic may modify FPSCR XX, raising inexact exception. So we can't select constrained fnearbyint into xvrdpic. One exception here is xsrqpi, which will not raise inexact exception, so fnearbyint f128 is okay here. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D87220	2020-09-09 22:40:58 +08:00
Jay Foad	649bde488c	[AMDGPU] Simplify S_SETREG_B32 case in EmitInstrWithCustomInserter NFC.	2020-09-09 15:18:31 +01:00
Dmitry Preobrazhensky	95b7040e43	[AMDGPU][MC] Improved diagnostic messages for invalid registers Corrected parser to issue meaningful error messages for invalid and malformed registers. See bug 41303: https://bugs.llvm.org/show_bug.cgi?id=41303 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D87234	2020-09-09 16:44:03 +03:00
Ronak Chauhan	f078577f31	Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors" This reverts commit `487a805310`. Tests fail on big endian machines.	2020-09-09 18:01:28 +05:30
Mirko Brkusanin	43af2a6faa	[AMDGPU] Workaround for LDS Misalignment bug on GFX10 Add subtarget feature check to avoid using ds_read/write_b96/128 with too low alignment if a bug is present on that specific hardware. Add this "feature" to GFX 10.1.1 as it is also affected. Add global-isel test.	2020-09-09 11:46:09 +02:00
Sam Parker	3ebc755227	[ARM] Try to rematerialize VCTP instructions We really want to try and avoid spilling P0, which can be difficult since there's only one register, so try to rematerialize any VCTP instructions. Differential Revision: https://reviews.llvm.org/D87280	2020-09-09 07:41:22 +01:00
Krzysztof Parzyszek	c2b7b9b642	[Hexagon] Fix order of operands in V6_vdealb4w	2020-09-08 22:09:28 -05:00
Brad Smith	88b368a1c4	[PowerPC] Set setMaxAtomicSizeInBitsSupported appropriately for 32-bit PowerPC in PPCTargetLowering Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D86165	2020-09-08 21:21:14 -04:00
Craig Topper	b1e68f885b	[SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.: This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130. This required adding a SDNodeFlags to SelectionDAG::getSetCC. Now we manage to contant fold some stuff undefs during the initial getNode that we don't do in later DAG combines. Differential Revision: https://reviews.llvm.org/D87200	2020-09-08 15:27:21 -07:00
Krzysztof Parzyszek	d183f47261	[Hexagon] Handle widening of truncation's operand with legal result Failing example: v8i8 = truncate v8i32. v8i8 is legal, but v8i32 was widened to HVX. Make sure that v8i8 does not get altered (even if it's changed to another legal type).	2020-09-08 16:07:39 -05:00
Simon Pilgrim	0dacf3b5ac	RISCVMatInt.h - remove unnecessary includes. NFCI. Add APInt forward declaration and move include to RISCVMatInt.cpp	2020-09-08 18:25:24 +01:00
Heejin Ahn	d25c17f317	[WebAssembly] Fix fixEndsAtEndOfFunction for try-catch When the function return type is non-void and `end` instructions are at the very end of a function, CFGStackify's `fixEndsAtEndOfFunction` function fixes the corresponding block/loop/try's type to match the function's return type. This is applied to consecutive `end` markers at the end of a function. For example, when the function return type is `i32`, ``` block i32 ;; return type is fixed to i32 ... loop i32 ;; return type is fixed to i32 ... end_loop end_block end_function ``` But try-catch is a little different, because it consists of two parts: a try part and a catch part, and both parts' return type should satisfy the function's return type. Which means, ``` try i32 ;; return type is fixed to i32 ... block i32 ;; this should be changed i32 too! ... end_block catch ... end_try end_function ``` As you can see in this example, it is not sufficient to only `end` instructions at the end of a function; in case of `try`, we should check instructions before `catch`es, in case their corresponding `try`'s type has been fixed. This changes `fixEndsAtEndOfFunction`'s algorithm to use a worklist that contains a reverse iterator, each of which is a starting point for a new backward `end` instruction search. Fixes https://bugs.llvm.org/show_bug.cgi?id=47413. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D87207	2020-09-08 09:27:40 -07:00
Ronak Chauhan	487a805310	[AMDGPU] Support disassembly for AMDGPU kernel descriptors Decode AMDGPU Kernel descriptors as assembler directives. Reviewed By: scott.linder, jhenderson, kzhuravl Differential Revision: https://reviews.llvm.org/D80713	2020-09-08 21:26:11 +05:30
Simon Pilgrim	fcff2c32c0	X86CallLowering.cpp - improve auto const/pointer/reference qualifiers. NFCI. Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.	2020-09-08 13:01:23 +01:00
Simon Pilgrim	0729ae367a	X86DomainReassignment.cpp - improve auto const/pointer/reference qualifiers. NFCI. Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.	2020-09-08 13:01:23 +01:00
Sam Tebbs	7aabb6ad77	[ARM][LowOverheadLoops] Remove modifications to the correct element count register After my patch at D86087, code that now uses the mov operand rather than the vctp operand will no longer remove modifications to the vctp operand as they should. This patch fixes that by explicitly removing modifications to the vctp operand rather than the register used as the element count.	2020-09-08 10:30:05 +01:00
Qiu Chaofan	8d9c13f37d	Revert "[PowerPC] Implement instruction clustering for stores" This reverts commit `3c0b325023`, (along with `ea795304` and `bb39eb9e`) since it breaks test with UB sanitizer.	2020-09-08 17:24:08 +08:00
Qiu Chaofan	bb39eb9e7f	[PowerPC] Fix getMemOperandWithOffsetWidth Commit `3c0b3250` introduced memory cluster under pwr10 target, but a check for operands was unexpectedly removed. This adds it back to avoid regression.	2020-09-08 15:35:25 +08:00
Simon Wallis	8ee1419ab6	[AARCH64][RegisterCoalescer] clang miscompiles zero-extension to long long Implement AArch64 variant of shouldCoalesce() to detect a known failing case and prevent the coalescing of a 32-bit copy into a 64-bit sign-extending load. Do not coalesce in the following case: COPY where source is bottom 32 bits of a 64-register, and destination is a 32-bit subregister of a 64-bit register, ie it causes the rest of the register to be implicitly set to zero. A mir test has been added. In the test case, the 32-bit copy implements a 32 to 64 bit zero extension and relies on the upper 32 bits being zeroed. Coalescing to the result of the 64-bit load meant overwriting the upper 32 bits incorrectly when the loaded byte was negative. Reviewed By: john.brawn Differential Revision: https://reviews.llvm.org/D85956	2020-09-08 08:04:52 +01:00
Mikael Holmen	ea795304ec	[PowerPC] Add parentheses to silence gcc warning Without gcc 7.4 warns with ../lib/Target/PowerPC/PPCInstrInfo.cpp:2284:25: warning: suggest parentheses around '&&' within '\|\|' [-Wparentheses] BaseOp1.isFI() && ~~~~~~~~~~~~~~~^~ "Only base registers and frame indices are supported."); ~	2020-09-08 08:39:57 +02:00
Qiu Chaofan	3c0b325023	[PowerPC] Implement instruction clustering for stores On Power10, it's profitable to schedule some stores with adjacent target address together. This patch implements this feature. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D86754	2020-09-08 11:03:09 +08:00
Roman Lebedev	bb7d3af113	Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline This was reverted in `503deec218` because it caused gigantic increase (3x) in branch mispredictions in certain benchmarks on certain CPU's, see https://reviews.llvm.org/D84108#2227365. It has since been investigated and here are the results: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html > It's an amazingly severe regression, but it's also all due to branch > mispredicts (about 3x without this). The code layout looks ok so there's > probably something else to deal with. I'm not sure there's anything we can > reasonably do so we'll just have to take the hit for now and wait for > another code reorganization to make the branch predictor a bit more happy :) > > Thanks for giving us some time to investigate and feel free to recommit > whenever you'd like. > > -eric So let's just reland this. Original commit message: I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike it's friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprizingly help, more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, that may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late. As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprizingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal for when optimizing for code size, so one might want to change tune pipeline by enabling sinking/hoisting when optimizing for size. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108 This reverts commit `503deec218`.	2020-09-08 00:24:03 +03:00
Craig Topper	da79b1eecc	[SelectionDAG][X86][ARM] Teach ExpandIntRes_ABS to use sra+add+xor expansion when ADDCARRY is supported. Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and XORs to expand ABS. This is the multi-part version of the sequence we use in LegalizeDAG. It's also the same as the Custom sequence uses for i64 on 32-bit and i128 on 64-bit. So we can remove the X86 customization. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D87215	2020-09-07 13:15:26 -07:00
Craig Topper	01b3e16757	[X86] Use the same sequence for i128 ISD::ABS on 64-bit targets as we use for i64 on 32-bit targets. Differential Revision: https://reviews.llvm.org/D87214	2020-09-07 11:14:05 -07:00
Simon Pilgrim	4e89a0ab02	MipsISelLowering.h - remove CCState/CCValAssign forward declarations. NFCI. These are already defined in the CallingConvLower.h include.	2020-09-07 18:15:26 +01:00
Simon Pilgrim	95ca3aacf0	BTFDebug.h - reduce MachineInstr.h include to forward declaration. NFCI.	2020-09-07 17:51:13 +01:00
Simon Pilgrim	dfc333050b	LeonPasses.h - remove unnecessary includes. NFCI. Reduce to forward declarations and move includes to LeonPasses.cpp where necessary.	2020-09-07 17:51:12 +01:00
Simon Pilgrim	1c34ac03a2	LeonPasses.h - remove orphan function declarations. NFCI. The implementations no longer exist.	2020-09-07 17:51:12 +01:00
alex-t	2480a31e5d	[AMDGPU] SILowerControlFlow::optimizeEndCF should remove empty basic block optimizeEndCF removes EXEC restoring instruction case this instruction is the only one except the branch to the single successor and that successor contains EXEC mask restoring instruction that was lowered from END_CF belonging to IF_ELSE. As a result of such optimization we get the basic block with the only one instruction that is a branch to the single successor. In case the control flow can reach such an empty block from S_CBRANCH_EXEZ/EXECNZ it might happen that spill/reload instructions that were inserted later by register allocator are placed under exec == 0 condition and never execute. Removing empty block solves the problem. This change require further work to re-implement LIS updates. Recently, LIS is always nullptr in this pass. To enable it we need another patch to fix many places across the codegen. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D86634	2020-09-07 19:37:27 +03:00
Simon Pilgrim	9de0a3da6a	[X86][SSE] Don't use LowerVSETCCWithSUBUS for unsigned compare with +ve operands (PR47448) We already simplify the unsigned comparisons if we've found the operands are non-negative, but we were still calling LowerVSETCCWithSUBUS which resulted in the PR47448 regressions.	2020-09-07 16:11:40 +01:00
Simon Pilgrim	5bb27e735d	X86AvoidStoreForwardingBlocks.cpp - use unsigned for Opcode values. NFCI. Fixes clang-tidy cppcoreguidelines-narrowing-conversions warnings.	2020-09-07 12:56:27 +01:00
Simon Pilgrim	9b645ebfff	[X86][AVX] Use lowerShuffleWithPERMV in shuffle combining to support non-VLX targets lowerShuffleWithPERMV allows us to use the ZMM variants for 128/256-bit variable shuffles on non-VLX AVX512 targets. This is another step towards shuffle combining through between vector widths - we still end up with an annoying regression (combine_vpermilvar_vperm2f128_zero_8f32) but we're going in the right direction....	2020-09-07 12:50:50 +01:00
Benjamin Kramer	7ba0f81934	[X86] Unbreak the build after `22fa6b20d9`	2020-09-07 12:24:30 +02:00
Simon Pilgrim	71dfdbe2c7	[X86] getFauxShuffleMask - handle insert_subvector(zero, sub, C) Directly use SM_SentinelZero elements if we're (widening)inserting into a zero vector.	2020-09-07 11:10:40 +01:00
Simon Pilgrim	9ad261540d	[X86] Use Register instead of unsigned. NFCI. Fixes llvm-prefer-register-over-unsigned clang-tidy warnings.	2020-09-07 10:49:29 +01:00
Simon Pilgrim	22fa6b20d9	[X86] Use Register instead of unsigned. NFCI. Fixes llvm-prefer-register-over-unsigned clang-tidy warnings.	2020-09-07 10:38:09 +01:00
Simon Pilgrim	0dbe2504af	[X86] Use Register instead of unsigned. NFCI. Fixes llvm-prefer-register-over-unsigned clang-tidy warning.	2020-09-07 10:38:08 +01:00
Sam Parker	0af4147804	[ARM][CostModel] CodeSize costs for i1 arith ops When optimising for size, make the cost of i1 logical operations relatively expensive so that optimisations don't try to combine predicates. Differential Revision: https://reviews.llvm.org/D86525	2020-09-07 09:27:18 +01:00
Thomas Lively	caee15a0ed	[WebAssembly] Fix incorrect assumption of simple value types Fixes PR47375, in which an assertion was triggering because WebAssemblyTargetLowering::isVectorLoadExtDesirable was improperly assuming the use of simple value types. Differential Revision: https://reviews.llvm.org/D87110	2020-09-06 15:42:21 -07:00
Amy Kwan	efa57f9a7a	[PowerPC] Implement Vector Expand Mask builtins in LLVM/Clang This patch implements the vec_expandm function prototypes in altivec.h in order to utilize the vector expand with mask instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D82727	2020-09-06 17:13:21 -05:00
Simon Pilgrim	ecac5c2808	[X86][AVX] lowerShuffleWithPERMV - adjust binary shuffle masks to account for widening on non-VLX targets rGabd33bf5eff2 enabled us to pad 128/256-bit shuffles to 512-bit on non-VLX targets, but wasn't updating binary shuffles to account for the new vector width.	2020-09-06 14:52:25 +01:00
vnalamot	aff94ec0f4	[AMDGPU] Remove the dead spill slots while spilling FP/BP to memory During the PEI pass, the dead TargetStackID::SGPRSpill spill slots are not being removed while spilling the FP/BP to memory. Fixes: SWDEV-250393 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87032	2020-09-06 07:04:25 +05:30
Krzysztof Parzyszek	62f89a89f3	[Hexagon] Add assertions about V6_pred_scalar2	2020-09-05 18:20:23 -05:00
Krzysztof Parzyszek	9518f032e4	[Hexagon] When widening truncate result, also widen operand if necessary	2020-09-05 18:19:32 -05:00
Krzysztof Parzyszek	8789f2bbde	[Hexagon] Resize the mem operand when widening loads and stores	2020-09-05 18:17:48 -05:00
Krzysztof Parzyszek	1387f96ab3	[Hexagon] Handle widening of vector truncate	2020-09-05 15:07:38 -05:00

1 2 3 4 5 ...

59397 Commits