llvm-project

Commit Graph

Author	SHA1	Message	Date
Nick Lewycky	244d9d6e41	Verify the LLVMContext that an Attribute belongs to. Attributes don't know their parent Context, adding this would make Attribute larger. Instead, we add hasParentContext that answers whether this Attribute belongs to a particular LLVMContext by checking for itself inside the context's FoldingSet. Same with AttributeSet and AttributeList. The Verifier checks them with the Module context. Differential Revision: https://reviews.llvm.org/D99362	2021-04-16 09:44:38 -07:00
Malhar Jajoo	093f1828e5	[ARM] Prevent phi-node-elimination from generating copy above t2WhileLoopStartLR This patch prevents phi-node-elimination from generating a COPY operation for the register defined by t2WhileLoopStartLR, as it is a terminator that defines a value. This happens because of the presence of phi-nodes in the loop body (the Preheader of which is the block containing the t2WhileLoopStartLR). If this is not done, the COPY is generated above/before the terminator (t2WhileLoopStartLR here), and since it uses the value defined by t2WhileLoopStartLR, MachineVerifier throws a 'use before define' error. This essentially adds on to the change in differential D91887/D97729. Differential Revision: https://reviews.llvm.org/D100376	2021-04-16 16:45:07 +01:00
Mats Petersson	517c3aee4d	[OpenMP IRBuilder, MLIR] Add support for OpenMP do schedule dynamic The implementation supports static schedule for Fortran do loops. This implements the dynamic variant of the same concept. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D97393	2021-04-16 16:09:49 +01:00
Jonathan Crowther	e71994a239	[SystemZ][z/OS] Add IsText Argument to GetFile and GetFileOrSTDIN Add the `IsText` argument to `GetFile` and `GetFileOrSTDIN` which will help z/OS distinguish between text and binary correctly. This is an extension to [this patch](https://reviews.llvm.org/D97785) Reviewed By: abhina.sreeskantharajan, amccarth Differential Revision: https://reviews.llvm.org/D100488	2021-04-16 10:08:36 -04:00
Sanjay Patel	bb907b26e2	[ValueTracking] don't recursively compute known bits using multiple llvm.assumes This is an alternative to D99759 to avoid the compile-time explosion seen in: https://llvm.org/PR49785 Another potential solution would make the exclusion logic stronger to avoid blowing up, but note that we reduced the complexity of the exclusion mechanism in D16204 because it was too costly. So I'm questioning the need for recursion/exclusion entirely - what is the optimization value vs. cost of recursively computing known bits based on assumptions? This was built into the implementation from the start with `60db058`, and we have kept adding code/cost to deal with that capability. By clearing the query's AssumptionCache inside computeKnownBitsFromAssume(), this patch retains all existing assume functionality except refining known bits based on even more assumptions. We have 1 regression test that shows a difference in optimization power. Differential Revision: https://reviews.llvm.org/D100573	2021-04-16 08:43:35 -04:00
Roman Lebedev	b06c55a698	[X86][CostModel] Fix cost model for non-power-of-two vector load/stores Sometimes LV has to produce really wide vectors, and sometimes they end up being not powers of two. As it can be seen from the diff, the cost computation is currently completely non-sensical in those cases. Instead of just scalarizing everything, split/factorize the wide vector into a number of subvectors, each one having a power-of-two elements, recurse to get the cost of op on this subvector. Also, check how we'd legalize this subvector, and if the legalized type is scalar, also account for the scalarization cost. Note that for sub-vector loads, we might be able to do better, when the vectors are properly aligned. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100099	2021-04-16 15:30:57 +03:00
Abhina Sreeskantharajan	3be2ba0ba3	[SystemZ][z/OS][Windows] Add new functions that set Text/Binary mode for Stdin and Stdout based on OpenFlags On Windows, we want to open a file in Binary mode if OF_CRLF bit is not set. On z/OS, we want to open a file in Binary mode if the OF_Text bit is not set. This patch creates two new functions called ChangeStdinMode and ChangeStdoutMode which will take OpenFlags as an arg to determine which mode to set stdin and stdout to. This will enable patches like https://reviews.llvm.org/D100056 to not affect Windows when setting the OF_Text flag for raw_fd_streams. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D100130	2021-04-16 08:09:19 -04:00
David Green	00a6045473	[ARM] Combine sub 0, csinc X, Y, CC -> csinv -X, Y, CC Combine sub 0, csinc X, Y, CC to csinv -X, Y, CC providing that the negation of X is cheap, currently just handling constants. This comes up during the splat of an i1 to a predicate, where we now generate csetm, as opposed to cset; rsb. Differential Revision: https://reviews.llvm.org/D99940	2021-04-16 11:52:31 +01:00
Nick Desaulniers	bb7016f8f5	[Aarch64] handle "o" inline asm memory constraints This Linux kernel is making use of this inline asm constraint which is causing an ICE. PR49956 Link: https://github.com/ClangBuiltLinux/linux/issues/1348 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D100412	2021-04-15 23:36:21 -07:00
Jim Lin	2893570e86	[RISCV] Don't emit save-restore call if function is a interrupt handler It has to save all caller-saved registers before a call in the handler. So don't emit a call that save/restore registers. Reviewed By: simoncook, luismarques, asb Differential Revision: https://reviews.llvm.org/D100532	2021-04-16 12:54:47 +08:00
hsmahesha	099dcb68a6	[AMDGPU] Refactor ds_read/ds_write related select code for better readability. Part of the code related to ds_read/ds_write ISel is refactored, and the corresponding comment is re-written for better readability, which would help while implementing any future ds_read/ds_write ISel related modifications. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100300	2021-04-16 08:24:00 +05:30
Mircea Trofin	0d06b14f59	[MLGO] Fix use of AM.invalidate post D100519 The ML inline advisors more aggressively invalidate certain analyses after each call site inlining, to more accurately capture the problem state.	2021-04-15 18:45:39 -07:00
Marcythm	f8cf3b9931	[LICM][NFC] Fix typo fixed some typos which may lead to misunderstandings in LICM.cpp Reviewed By: nikic, asbirlea Differential Revision: https://reviews.llvm.org/D100470	2021-04-16 09:42:00 +08:00
Arthur Eubanks	9c776c2fa2	[NFC][NewPM] Remove some AnalysisManager invalidate methods These were misleading, they're more of a "clear" than an "invalidate". We shouldn't be individually clearing analysis results. Either we clear all analyses when some IR becomes invalid, or we properly go through invalidation. There was only one use of this, which can be simulated with AM.invalidate(F, PA). Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D100519	2021-04-15 16:51:26 -07:00
Florian Hahn	3e7ee5428d	[InferAttrs] Do not mark first argument of str(n)cat as writeonly. str(n)cat appends a copy of the second argument to the end of the first argument. To find the end of the first argument, str(n)cat has to read from it until it finds the terminating 0. So it should not be marked as writeonly. I think this means the argument should not be marked as writeonly. (This is causing a mis-compile with legacy DSE, before it got removed) Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D100601	2021-04-15 23:00:21 +01:00
Momchil Velikov	f9d932e673	[clang][AArch64] Correctly align HFA arguments when passed on the stack When we pass a AArch64 Homogeneous Floating-Point Aggregate (HFA) argument with increased alignment requirements, for example struct S { __attribute__ ((__aligned__(16))) double v[4]; }; Clang uses `[4 x double]` for the parameter, which is passed on the stack at alignment 8, whereas it should be at alignment 16, following Rule C.4 in AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules) Currently we don't have a way to express in LLVM IR the alignment requirements of the function arguments. The align attribute is applicable to pointers only, and only for some special ways of passing arguments (e..g byval). When implementing AAPCS32/AAPCS64, clang resorts to dubious hacks of coercing to types, which naturally have the needed alignment. We don't have enough types to cover all the cases, though. This patch introduces a new use of the stackalign attribute to control stack slot alignment, when and if an argument is passed in memory. The attribute align is left as an optimizer hint - it still applies to pointer types only and pertains to the content of the pointer, whereas the alignment of the pointer itself is determined by the stackalign attribute. For byval arguments, the stackalign attribute assumes the role, previously perfomed by align, falling back to align if stackalign` is absent. On the clang side, when passing arguments using the "direct" style (cf. `ABIArgInfo::Kind`), now we can optionally specify an alignment, which is emitted as the new `stackalign` attribute. Patch by Momchil Velikov and Lucas Prates. Differential Revision: https://reviews.llvm.org/D98794	2021-04-15 22:58:14 +01:00
Stanislav Mekhanoshin	13015ebd6f	[AMDGPU] Factor out predicate FmaakFmamkF32Insts Differential Revision: https://reviews.llvm.org/D100409	2021-04-15 12:29:16 -07:00
Florian Hahn	49999d4364	[VPlan] Replace a few unnecessary includes with forward decls.	2021-04-15 20:08:31 +01:00
Stanislav Mekhanoshin	d4385e483d	[AMDGPU] Add new EmitDstSel field to VOPPofile. NFC. Differential Revision: https://reviews.llvm.org/D100589	2021-04-15 12:07:08 -07:00
hsmahesha	82787eb228	[AMDGPU] Move LDS lowering related utility functions to a separate utils file. Move some utility functions which are used within LDS lowering pass to a separate utils file so that other LDS related passes can make use of them when required. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D100526	2021-04-16 00:15:48 +05:30
Krzysztof Parzyszek	280678122d	[Hexagon] Avoid infinite loops in type legalization when lowering SETCC Only widen SETCC if the operands can be widened. Not checking that caused infinite widen-split loops in legalization.	2021-04-15 13:34:37 -05:00
Craig Topper	1656df13da	[RISCV] Share RVInstIShift and RVInstIShiftW instruction format classes with the B extension. This generalizes RVInstIShift/RVInstIShiftW to take the upper 5 or 7 bits of the immediate as an input instead of only bit 30. Then we can share them. For RVInstIShift I left a hardcoded 0 at bit 26 where RV128 gets a 7th bit for the shift amount. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100424	2021-04-15 11:08:28 -07:00
cchen	e0c2125d1d	[OpenMP] Added codegen for masked directive Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D100514	2021-04-15 12:55:07 -05:00
Danilo C. Grael	55487079a9	[LoopUnrollAndJam] Avoid repeated instructions for UAJ analysis Avoid visiting repeated instructions for processHeaderPhiOperands as it can cause a scenario of endless loop. Test case is attached and can be ran with `opt -basic-aa -tbaa -loop-unroll-and-jam -allow-unroll-and-jam -unroll-and-jam-count=4`. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D97407	2021-04-15 12:59:42 -04:00
Arthur Eubanks	c8f0a7c215	[NewPM] Cleanup IR printing instrumentation Being lazy with printing the banner seems hard to reason with, we should print it unconditionally first (it could also lead to duplicate banners if we have multiple functions in -filter-print-funcs). The printIR() functions were doing too many things. I separated out the call from PrintPassInstrumentation since we were essentially doing two completely separate things in printIR() from different callers. There were multiple ways to generate the name of some IR. That's all been moved to getIRName(). The printing of the IR name was also inconsistent, now it's always "IR Dump on $foo" where "$foo" is the name. For a function, it's the function name. For a loop, it's what's printed by Loop::print(), which is more detailed. For an SCC, it's the list of functions in parentheses. For a module it's "[module]", to differentiate between a possible SCC with a function called "module". To preserve D74814, we have to check if we're going to print anything at all first. This is unfortunate, but I would consider this a special case that shouldn't be handled in the core logic. Reviewed By: jamieschmeiser Differential Revision: https://reviews.llvm.org/D100231	2021-04-15 09:50:55 -07:00
Mark Johnston	f511dc75e4	[asan] Add an offset for the kernel address sanitizer on FreeBSD This is based on a port of the sanitizer runtime to the FreeBSD kernel that has been commited as https://cgit.freebsd.org/src/commit/?id=38da497a4dfcf1979c8c2b0e9f3fa0564035c147 and the following commits. Reviewed By: emaste, dim Differential Revision: https://reviews.llvm.org/D98285	2021-04-15 17:49:00 +01:00
Stefan Pintilie	f28cb01be0	[PowerPC] Add ROP Protection Instructions for PowerPC There are four new PowerPC instructions that are introduced in Power 10. They are hashst, hashchk, hashstp, hashchkp. These instructions will be used for ROP Protection. This patch adds the four instructions. Reviewed By: nemanjai, amyk, #powerpc Differential Revision: https://reviews.llvm.org/D99375	2021-04-15 11:38:38 -05:00
Stelios Ioannou	bf147c4653	[LSR] Fix for pre-indexed generated constant offset This patch changed the isLegalUse check to ensure that LSRInstance::GenerateConstantOffsetsImpl generates an offset that results in a legal addressing mode and formula. The check is changed to look similar to the assert check used for illegal formulas. Differential Revision: https://reviews.llvm.org/D100383 Change-Id: Iffb9e32d59df96b8f072c00f6c339108159a009a	2021-04-15 16:44:42 +01:00
OCHyams	17cec07184	Revert "[DebugInfo] Replace debug uses in replaceUsesOutsideBlock" This reverts commit `96a1e6b7cf`. Failing build bots e.g. https://lab.llvm.org/buildbot/#/builders/161/builds/163	2021-04-15 16:35:45 +01:00
OCHyams	96a1e6b7cf	[DebugInfo] Replace debug uses in replaceUsesOutsideBlock Value::replaceUsesOutsideBlock doesn't replace debug uses which leads to an unnecessary reduction in variable location coverage. Fix this, add a unittest for it, and add a regression test demonstrating the change through instcombine's replacedSelectWithOperand. Reviewed By: djtodoro Differential Revision: https://reviews.llvm.org/D99169	2021-04-15 16:19:36 +01:00
LemonBoy	24185541ca	[yaml2obj/obj2yaml/llvm-readobj] Support printing and parsing AVR-specific e_flags The `e_flags` contains a mixture of bitfields and regular ones, ensure all of them can be serialized and deserialized. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D100250	2021-04-15 15:54:28 +02:00
Sebastian Neubauer	7842e1725e	[AMDGPU] Fix large return values with amdgpu_gfx Returning in memory is not supported, so fall back to sret. Also, extend i1 and i16 to i32. Otherwise, they would be passed through memory. Differential Revision: https://reviews.llvm.org/D100543	2021-04-15 14:57:56 +02:00
Simon Pilgrim	9d57a77b81	[X86] combineCMP - fold cmpEQ/NE(TRUNC(X),0) -> cmpEQ/NE(X,0) If we are truncating from a i32 source before comparing the result against zero, then see if we can directly compare the source value against zero. If the upper (truncated) bits are known to be zero then we can compare against that, hopefully increasing the chances of us folding the compare into a EFLAG result of the source's operation. Fixes PR49028. Differential Revision: https://reviews.llvm.org/D100491	2021-04-15 13:55:51 +01:00
Bradley Smith	22c017f0f9	[AArch64][NEON] Match (or (and -a b) (and (a+1) b)) => bit select With this patch vbslq_f32(vnegq_s32(a), b, c) lowers to a BIT instruction. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D100304	2021-04-15 13:52:47 +01:00
Alex Orlov	49cbf4cd85	Fix bug in .eh_frame/.debug_frame PC offset calculation for DW_EH_PE_pcrel This fixes the following bugs: https://bugs.llvm.org/show_bug.cgi?id=27249 https://bugs.llvm.org/show_bug.cgi?id=46414 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D100328	2021-04-15 15:06:20 +04:00
Florian Hahn	6adebe3fd2	[VPlan] Add VPRecipeBase::mayHaveSideEffects. Add an initial version of a helper to determine whether a recipe may have side-effects. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100259	2021-04-15 11:49:40 +01:00
Jun Ma	7e1422c1e4	[DAGCombiner] Fold step_vector with add/mul/shl This patch implements some DAG combines for STEP_VECTOR: add step_vector(C1), step_vector(C2) -> step_vector(C1+C2) add (add X step_vector(C1)), step_vector(C2) -> add X step_vector(C1+C2) mul step_vector(C1), C2 -> step_vector(C1*C2) shl step_vector(C1), C2 -> step_vector(C1<<C2) TestPlan: check-llvm Differential Revision: https://reviews.llvm.org/D100088	2021-04-15 18:06:35 +08:00
David Sherwood	ea14df695e	[SVE][LoopVectorize] Fix crash in InnerLoopVectorizer::widenPHIInstruction There were a few places in widenPHIInstruction where calculations of offsets were failing to take the runtime calculation of VF into account for scalable vectors. I've fixed those cases in this patch as well as adding an assert that we should not be scalarising for scalable vectors. Tests are added here: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D99254	2021-04-15 10:51:49 +01:00
dfukalov	ce1626f34a	[AA] Updates for D95543. Addressing latter comments in D95543: - `AliasResult::Result` renamed to `AliasResult::Kind` - Offset printing added for `PartialAlias` case in `-aa-eval` - Removed VisitedPhiBBs check from BasicAA' Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D100454	2021-04-15 12:22:03 +03:00
Florian Hahn	acd9cc7495	[AArch64] Use type-legalization cost for code size memop cost. At the moment, getMemoryOpCost returns 1 for all inputs if CostKind is CodeSize or SizeAndLatency. This fools LoopUnroll into thinking memory operations on large vectors have a cost of one, even if they will get expanded to a large number of memory operations in the backend. This patch updates getMemoryOpCost to return the cost for the type legalization for both CodeSize and SizeAndLatency. This should more accurately reflect the number of memory operations required. I am not sure how latency should properly be included in SizeAndLatency from the description, but returning the size cost should be clearly more accurate. This does not cause any binary changes when building MultiSource/SPEC2000/SPEC2006 with -O3 -flto for AArch64, likely because large vector memops are not really formed by code emitted from Clang. But using the C/C++ matrix extension can easily result in code with very large vector operations directly from Clang, e.g. https://clang.godbolt.org/z/6xzxcTGvb Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D100291	2021-04-15 10:11:05 +01:00
David Sherwood	7120f89f7d	[NFC][LoopVectorize] Remove unnecessary VF.isScalable asserts There are a few places in LoopVectorize.cpp where we have been too cautious in adding VF.isScalable() asserts and it can be confusing. It also makes it more difficult to see the genuine places where work needs doing to improve scalable vectorization support. This patch changes getMemInstScalarizationCost to return an invalid cost instead of firing an assert for scalable vectors. Also, vectorizeInterleaveGroup had multiple asserts all for the same thing. I have removed all but one assert near the start of the function, and added a new assert that we aren't dealing with masks for scalable vectors. Differential Revision: https://reviews.llvm.org/D99727	2021-04-15 09:41:03 +01:00
Martin Storsjö	5144f730a8	[AArch64] Fix windows vararg functions with floats in the fixed args On Windows, float arguments are normally passed in float registers in the calling convention for regular functions. For variable argument functions, floats are passed in integer registers. This already was done correctly since many years. However, the surprising bit was that floats among the fixed arguments also are supposed to be passed in integer registers, contrary to regular functions. (This also seems to be the behaviour on ARM though, both on Windows, but also on e.g. hardfloat linux.) In the calling convention, don't promote shorter floats to f64, but convert them to integers of the same length. (Floats passed as part of the actual variable arguments are promoted to double already on the C/Clang level; the LLVM vararg calling convention doesn't do any extra promotion of f32 to f64 - this matches how it works on X86 too.) Technically, this is an ABI break compared to older LLVM versions, but it fixes compatibility with the official platform ABI. (In practice, floats among the fixed arguments in variable argument functions is a pretty rare construct.) Differential Revision: https://reviews.llvm.org/D100365	2021-04-15 11:02:14 +03:00
Nikita Popov	a1ed025d0e	Revert "[SCEV] Don't walk uses of phis without SCEV expression when forgetting" This reverts commit `faf9f11589`. Issues with this patch have been reported in https://reviews.llvm.org/D100264#2689917 and https://bugs.llvm.org/show_bug.cgi?id=49967.	2021-04-15 09:43:52 +02:00
Florian Hahn	5a3ff24b12	[NewGVN] Add phi-of-ops operands if no real PHI is created. If the PHI-of-ops simplifies to an existing value, no real PHI is created, which means the dependencies between the PHI-of-ops and its operands is not materialized in IR. At the moment, we fail to create a real PHI node for the PHI-of-ops, because the PHI-of-ops root instruction is not re-visited if one of the PHI-of-ops operands changes. We need to add the operands as additional users in this case. Even with this patch, there are still some dependencies missing. I will continue tackling the outstanding reporeted crashes in this area. Fixes PR36501, PR42422, PR42557. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D66924	2021-04-15 08:25:10 +01:00
Craig Topper	c3f1271464	[RISCV] Add a PatFrag to shorten repeated (XLenVT (VLOp GPR:$vl)) in V extension patterns. Reduces the amount of changes needed in D100288.	2021-04-14 22:36:35 -07:00
hsmahesha	4973b0c4e7	[AMDGPU] Disable forceful inline of non-kernel functions which use LDS. Now since LDS uses within non-kernel functions are being handled in the pass - LowerModuleLDS, we NO need to forcefully inline non-kernel functions just because they use LDS. Do forceful inlining only when the pass - LowerModuleLDS is not enabled. It is enabled by default. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D100481	2021-04-15 09:12:56 +05:30
Alexander Yermolovich	b7459a10da	[DWARF] Fix crash for DWARFDie::dump. When DIE is extracted manually, the DieArray is empty. When dump is invoked on aforementioned DIE it tries to extract child, even if Dump options say otherwise. Resulting in crash. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D99698	2021-04-14 18:46:34 -07:00
Philip Reames	dd985551c2	Reapply "[InferAttributes] Materialize all infered attributes for declaration"" and follow on patches. This reverts commit `ab98f2c712` and `98eea392cd`. It includes a fix for the clang test which triggered the revert. I failed to notice this one because there was another AMDGPU llvm test with a similiar name and the exact same text in the error message. Odd. Since only one build bot reported the clang test, I didn't notice that one.	2021-04-14 16:38:07 -07:00
Nico Weber	ab98f2c712	Revert "[InferAttributes] Materialize all infered attributes for declaration" Breaks check-clang, see comments on D100400 Also revert follow-up "[NFC] Move a recently added utility into a location to enable reuse" This reverts commit `3ce61fb6d6`. This reverts commit `61a85da882`.	2021-04-14 18:41:20 -04:00
Philip Reames	3ce61fb6d6	[NFC] Move a recently added utility into a location to enable reuse About to refresh a patch that uses this in FunctionAtrrs, doing the move seperately to control build times.	2021-04-14 15:05:16 -07:00
Philip Reames	61a85da882	[InferAttributes] Materialize all infered attributes for declaration We have some cases today where attributes can be inferred from another on access, but the result is not explicitly materialized in IR. This change is a step towards changing that. Why? Two main reasons: * Human clarity. It's really confusing trying to figure out why a transform is triggering when the IR doesn't appear to have the required attributes. * This avoids the need to special case declarations in e.g. functionattrs. Since we can assume the attribute is present, we can work directly from attributes (and only attributes) without also needing to query accessors on Function to avoid missing cases due to unannotated (but infered on use) declarations. (This piece will appear must easier to follow once D100226 also lands.) Differential Revision: https://reviews.llvm.org/D100400	2021-04-14 14:45:24 -07:00
Thomas Lively	6a18cc23ef	[WebAssembly] Codegen for i64x2.extend_{low,high}_i32x4_{s,u} Removes the builtins and intrinsics used to opt in to using these instructions and replaces them with normal ISel patterns now that they are no longer prototypes. Differential Revision: https://reviews.llvm.org/D100402	2021-04-14 13:43:09 -07:00
Mehrnoosh Heidarpour	29f189f90d	[InstCombine] Conditionally emit nowrap flags when combining two adds Currently, the InstCombineCompare is combining two add operations into a single add operation which always has a nsw flag, without checking the conditions to see if this flag should be present according to the original two add operations or not. This patch will change the InstCombineCompare to emit the nsw or nuw only when these flags are allowed to be generated according to the original add operations and remove the possibility of applying wrong optimization with passes that will perform on the IR later in the pipeline. To confirm that the current results are buggy and the results after proposed patch are the correct IR the following examples from Alive2 are attached; the same results can be seen in the case of nuw flag and nsw is just used as an example. The following link shows that the generated IR with current LLVM is a buggy IR when none of the original add operations have nsw flag. https://alive2.llvm.org/ce/z/WGaDrm The following link proves that the generated IR after the patch in the former case is the correct IR. https://alive2.llvm.org/ce/z/wQ7G_e Differential Revision: https://reviews.llvm.org/D100095	2021-04-14 20:53:06 +02:00
William S. Moses	d3e2b4c0a2	[SROA][TBAA] Handle shift of regular TBAA nodes SROA shifts TBAA nodes in a way that may present a problem for !tbaa but not !tbaa.struct nodes. Differential Revision: https://reviews.llvm.org/D99851	2021-04-14 14:35:20 -04:00
Thomas Lively	af7925b4dd	[WebAssembly] Codegen for f64x2.convert_low_i32x4_{s,u} Add a custom DAG combine and ISD opcode for detecting patterns like (uint_to_fp (extract_subvector ...)) before the extract_subvector is expanded to ensure that they will ultimately lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions are no longer prototypes and can now be produced via standard IR, this commit also removes the target intrinsics and builtins that had been used to prototype the instructions. Differential Revision: https://reviews.llvm.org/D100425	2021-04-14 10:42:45 -07:00
Nikita Popov	0d91075f77	[ValueTracking] Don't require strictly positive for mul nsw recurrence Just like in the mul nuw case, it's sufficient that the step is non-zero. If the step is negative, then the values will jump between positive and negative, "crossing" zero, but the value of the recurrence is never actually zero.	2021-04-14 19:39:59 +02:00
Nikita Popov	5c0fb026c9	[ValueTracking] Don't require non-zero step for add nuw It's okay if the step is zero, we'll just stay at the same non-zero value in that case. The valuable part of this is that the step doesn't even need to be a constant anymore.	2021-04-14 19:06:18 +02:00
Stanislav Mekhanoshin	b7ebb25e53	[AMDGPU] Factor out SelectSAddrFI() This is a service function generally useful for selection of a FI in an SADDR. NFC for now, needed for future patch. Differential Revision: https://reviews.llvm.org/D100406	2021-04-14 09:40:02 -07:00
Sander de Smalen	4f42d873c2	[TTI] NFC: Change getArithmeticInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100317	2021-04-14 17:20:36 +01:00
Sander de Smalen	d84bd951a8	[TTI] NFC: Change getFPOpCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D100316	2021-04-14 17:20:36 +01:00
Sander de Smalen	1af35e77f4	[TTI] NFC: Change getVectorInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100315	2021-04-14 17:20:35 +01:00
Sander de Smalen	174e8f6c5e	[TTI] NFC: Change getShuffleCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100314	2021-04-14 17:20:35 +01:00
Sander de Smalen	14b934f8a6	[TTI] NFC: Change getCFInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D100313	2021-04-14 17:20:34 +01:00
Sander de Smalen	596f669cfb	[TTI] NFC: Change getCallInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D100312	2021-04-14 17:20:34 +01:00
Thomas Lively	af7ab81ce3	[WebAssembly] Use standard intrinsics for f32x4 and f64x2 ops Now that these instructions are no longer prototypes, we do not need to be careful about keeping them opt-in and can use the standard LLVM infrastructure for them. This commit removes the bespoke intrinsics we were using to represent these operations in favor of the corresponding target-independent intrinsics. The clang builtins are preserved because there is no standard way to easily represent these operations in C/C++. For consistency with the scalar codegen in the Wasm backend, the intrinsic used to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though @llvm.roundeven better captures the semantics of the underlying Wasm instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is left to a potential future patch. Differential Revision: https://reviews.llvm.org/D100411	2021-04-14 09:19:27 -07:00
Sjoerd Meijer	39d29817f3	[SCCP] Follow up of rGbbab9f986c6d. NFC. This addresses the linter messages, mainly the inconsistent capitalisation of member functions.	2021-04-14 17:14:46 +01:00
Benjamin Kramer	cf4161673c	[Instcombine] Disable memcpy of alloca bypass for instruction sources This transformation is fundamentally broken when it comes to dominance, it just happened to work when the source of the memcpy can be moved into the place of the alloca. The bug shows up a lot more often since `077bff39d4` allows the source to be a switch. It would be possible to check dominance of the source and all its operands, but that seems very heavy for instcombine.	2021-04-14 16:52:09 +02:00
hsmahesha	e3070db0f7	[AMDGPU] Rename "LDS lowering" pass name. Rename the name of "LDS lowering" pass from `amdgpu-disable-lower-module-lds` to `amdgpu-enable-lower-module-lds` as later is consistent and reads better. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D100441	2021-04-14 20:19:53 +05:30
Simon Pilgrim	4fbe761572	[X86][SSE] canonicalizeShuffleWithBinOps - check for more combos of merge-able binary shuffles. In the fold SHUFFLE(BINOP(X,Y),BINOP(Z,W)) -> BINOP(SHUFFLE(X,Z),SHUFFLE(Y,W)), check if both X/Z AND Y/W have at least one merge-able shuffle in which case the total number of shuffle should still fall. Helps with instruction count regressions we saw while fixing PR48823	2021-04-14 15:24:41 +01:00
Simon Pilgrim	b49c41afba	[SLP] createOp - fix null dereference warning. NFCI. Only attempt to propagateIRFlags if we have both SelectInst - afaict we shouldn't have matched a min/max reduction without both SelectInst, but static analyzer doesn't know that.	2021-04-14 15:24:41 +01:00
Pablo Barrio	cca40aa8d8	[AArch64][v8.5A] Add BTI to all function starts The existing BTI placement pass avoids inserting "BTI c" when the function has local linkage and is only directly called. However, even in this case, there is a (small) chance that the linker later adds a hunk with an indirect call to the function, e.g. if the function is placed in a separate section and moved far away from its callers. Make sure to add BTI for these functions too. Differential Revision: https://reviews.llvm.org/D99417	2021-04-14 15:24:01 +01:00
Sjoerd Meijer	bbab9f986c	[SCCP] Create SCCP Solver This refactors SCCP and creates a SCCPSolver interface and class so that it can be used by other passes and transformations. We will use this in D93838, which adds a function specialisation pass. This is based on an early version by Vinay Madhusudan. Differential Revision: https://reviews.llvm.org/D93762	2021-04-14 14:58:03 +01:00
Sanjay Patel	7ef2c68a3d	[InstSimplify] improve efficiency for detecting non-zero value Stepping through callstacks in the example from D99759 reveals this potential compile-time improvement. The savings come from avoiding ValueTracking's computing known bits if we have already dealt with special-case patterns. Further improvements in this direction seem possible. This makes a degenerate test based on PR49785 about 40x faster (25 sec -> 0.6 sec), but it does not address the larger question of how to limit computeKnownBitsFromAssume(). Ie, the original test there is still infinite-time for all practical purposes. Differential Revision: https://reviews.llvm.org/D100408	2021-04-14 09:04:15 -04:00
Sanjay Patel	5ae5d25e38	[ValueTracking] match negative-stepping non-zero recurrence This is pulled out of D100408. This avoids a regression that would be exposed by making the calling code from InstSimplify more efficient.	2021-04-14 08:57:53 -04:00
Sebastian Neubauer	929edd4375	[AMDGPU] Mark scavenged SGPR as used Otherwise it reuses the same register for storing the stack slot offset if the stack slot offset is big. Differential Revision: https://reviews.llvm.org/D100461	2021-04-14 14:55:01 +02:00
Sanjay Patel	4919365397	[ValueTracking] reduce code duplication; NFC The start value can't be null for something to be a non-zero recurrence, so hoist that common check out of the switch. Subsequent checks may be incomplete or over-specified as noted in: D100408	2021-04-14 08:32:42 -04:00
Zarko Todorovski	6b7838b68c	[AIX] Allow safe for 32bit P8 VSX pattern matching Pull some of the safe for 32bit pattern matching for Pwr8 and above. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D97909	2021-04-14 08:12:48 -04:00
Tim Northover	6401b78ab3	SDAG: constant fold bf16 -> i16 casts This direction is particularly useful because i16 constants are much more likely to be legal than bf16.	2021-04-14 11:27:46 +01:00
Roman Lebedev	2fea5d5d4a	[InstCombine] tmp alloca bypass: ensure that the replacement dominates all alloca uses After `077bff39d4`, isDereferenceableForAllocaSize() can recurse into selects, which is causing a problem for the new test case, reduced from https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20210412/904154.html because the replacement (the select) is defined after the first use of an alloca, so we'd end up with a verifier error. Now, this new check is too restrictive. We likely can handle some cases, by trying to sink all uses of an alloca to after the the def.	2021-04-14 13:04:12 +03:00
Simon Pilgrim	73737fe990	[X86] Fold cmpeq/ne(trunc(x),0) --> cmpeq/ne(x,0) Relax the fold from rGbaadbe04bf75 to compare any op, not just logic ops, now that the movmsk regressions have been handled.	2021-04-14 11:02:02 +01:00
Simon Pilgrim	016ceb8382	[X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in MOVMSK(SHUFFLE(X,u)) -> MOVMSK(X) fold Extension to rG74f98391a7a4, we can also include any of the upper (known zero) bits in the comparison in the shuffle removal fold, just as long as we demand all the elements of the movmsk source vector.	2021-04-14 11:02:01 +01:00
Nemanja Ivanovic	8be3181df6	[PowerPC] Fix incorrect subreg typo from `0148bf53f0`	2021-04-14 05:01:12 -05:00
Martin Storsjö	3b32dc4b84	[ARM] [COFF] Properly produce cross-section relative relocations Differential Revision: https://reviews.llvm.org/D99574	2021-04-14 12:31:28 +03:00
Martin Storsjö	d5c5cf5ce8	[AArch64] [COFF] Properly produce cross-section relative relocations This fixes breakage on Windows/ARM64 after D94355. Modelled after the corresponding code for X86; not entirely familiar with those aspects of that layer otherwise. Differential Revision: https://reviews.llvm.org/D99572	2021-04-14 12:31:26 +03:00
Serguei Katkov	cf0d3477aa	[GreedyRA ORE] Separate Folder Reloads and Zero Cost Folder Reloads Patchpoint instructions have operands which is actually zero cost (or the same as register) to use the value from the stack. In terms of statistic it makes same to separate them. Move from computation instructions related to stack spill/reload to number of stack slot referenced. Reviewers: reames, MatzeB, anemet, thegameg Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D100016	2021-04-14 14:25:28 +07:00
Bogdan Graur	0acf4e5005	[NFC] Fix unused warning. Differential Revision: https://reviews.llvm.org/D100449	2021-04-14 09:09:20 +02:00
Serguei Katkov	02265ed7ad	[Live Intervals] Teach Greedy RA to recognize special case live-through Statepoint instruction has a deopt section which is actually live-through the call. Currently this is handled by special post pass after RA - fixup-statepoint-caller-saved. This change teaches Greedy RA that if segment of live interval is ended with statepoint instruction and its reg is used in deopt bundle then this live interval interferes regmask of this statepoint and as a result caller-saved register cannot be assigned to this live interval. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D100296	2021-04-14 13:26:49 +07:00
Min-Yih Hsu	91b6ef64db	[M68k] Put M68kInfo as the direct library dependency for AsmParser M68kAsmParser uses `llvm::getTheM68kTarget` from M68kInfo, therefore we should put M68kInfo as its direct dependency. Otherwise the build will fail when building LLVM libraries as shared objects (building LLVM libraries statically won't have this problem though).	2021-04-13 21:21:02 -07:00
Serguei Katkov	d5ed0d4816	[Live Intervals] Factor-out unionBitMask. NFC. For further re-usage in other place.	2021-04-14 10:39:01 +07:00
Wang, Pengfei	a3b52a9d13	[X86][AMX] Refactor for PostRA ldtilecfg pass. This is a follow up of D99010. We didn't consider the live range of shape registers when hoist ldtilecfg. There maybe risks, e.g. we happen to insert it to an invalid range of some registers and get unexpected error. This patch fixes this problem by storing the value to corresponding stack place of ldtilecfg after all its definition immediately. This patch also fix a problem in previous code: If we don't have a ldtilecfg which dominates all AMX instructions, we cannot initialize shapes for other ldtilecfg. There're still some optimization points left. E.g. eliminate unused mov instructions, break the def-use dependency before RA etc. Reviewed By: LuoYuanke, xiangzhangllvm Differential Revision: https://reviews.llvm.org/D99966	2021-04-14 10:08:23 +08:00
Philip Reames	00c8be3f93	fix whitespace type	2021-04-13 19:02:41 -07:00
ShihPo Hung	d5e962f1f2	[RISCV] Implement COPY for Zvlsseg registers When copying Zvlsseg register tuples, we split the COPY to NF whole register moves as below: $v10m2_v12m2 = COPY $v4m2_v6m2 # NF = 2 => $v10m2 = PseudoVMV2R_V $v4m2 $v12m2 = PseudoVMV2R_V $v6m2 This patch copies forwardCopyWillClobberTuple from AArch64 to check register overlapping. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100280	2021-04-13 18:55:51 -07:00
Nemanja Ivanovic	0148bf53f0	[PowerPC] Use correct node to get a super register from a subreg The VSX tablegen file has some rather eggregious uses of COPY_TO_REGCLASS even in situations where it needs to use SUBREG_TO_REG. While this produces correct code, it often doesn't allow the register coalescer to coalesce copies and the resulting code ends up being suboptimal. This patch just changes over patterns that should use SUBREG_TO_REG.	2021-04-13 19:52:21 -05:00
Sterling Augustine	32e264921b	Revert "[GlobalOpt] Revert valgrind hacks" This reverts commit `dbc16ed199`.	2021-04-13 17:47:07 -07:00
root	645ce31c20	Title: [RISCV] Add missing part of instruction vmsge {u}. VX Review By: craig.topper Differential Revision : https://reviews.llvm.org/D100115	2021-04-14 06:41:59 +08:00
Daniel Sanders	be50657c6a	[TableGen] Resolve concrete but not complete field access initializers This fixes the resolution of Rec10.Zero in ListSlices.td. As part of this, correct the definition of complete for ListInit such that it's complete iff all the elements in the list are complete rather than always being complete regardless of the elements. This is the reason Rec10.TwoFive from ListSlices.td previously resolved despite being incomplete like Rec10.Zero was Depends on D100247 Reviewed By: Paul-C-Anagnostopoulos Differential Revision: https://reviews.llvm.org/D100253	2021-04-13 15:14:56 -07:00
Anirudh Prasad	6ddd8c28b7	[AsmParser][SystemZ][z/OS] Add support to AsmLexer to accept HLASM style integers - Add support for HLASM style integers. These are the decimal integers [0-9]. - HLASM does not support the additional prefixed integers like, `0b`, `0x`, octal integers and Masm style integers. - To achieve this, a field `LexHLASMStyleIntegers` (similar to the `LexMasmStyleIntegers` field) is introduced in `MCAsmLexer.h` as well as a corresponding setter. Note: This field could also go into MCAsmInfo.h. I used the previous precedent set by the `LexMasmIntegers` field. Depends on https://reviews.llvm.org/D99286 Reviewed By: epastor Differential Revision: https://reviews.llvm.org/D99374	2021-04-13 15:29:37 -04:00
Craig Topper	6aa6f748ae	[RISCV] Add a generic PatGprImm class and use it to simplify patterns in RISCVInstrInfoB.td. NFC	2021-04-13 12:07:24 -07:00
Craig Topper	cb073f1bc0	[RISCV] Make use of PatGprGpr and PatGpr in RISCVInstrInfoB.td. NFC	2021-04-13 12:06:58 -07:00
Yonghong Song	a285bdb56f	BPF: remove default .extern data section Currently, for any extern variable, if it doesn't have section attribution, it will be put into a default ".extern" btf DataSec. The initial design is to put every extern variable in a DataSec so libbpf can use it. But later on, libbpf actually requires extern variables to put into special sections, e.g., ".kconfig", ".ksyms", etc. so they can be used properly based on section name. Andrii mentioned since ".extern" variables are not actually used, it makes sense to remove it from the compiler so libbpf does not need to deal with it, esp. for static linking. The BTF for these extern variables is still generated. With this patch, I tested kernel selftests/bpf and all tests passed. Indeed, removing ".extern" DataSec seems having no impact. Differential Revision: https://reviews.llvm.org/D100392	2021-04-13 11:35:52 -07:00
Nikita Popov	faf9f11589	[SCEV] Don't walk uses of phis without SCEV expression when forgetting I've run into some cases where a large fraction of compile-time is spent invalidating SCEV. One of the causes is forgetLoop(), which walks all values that are def-use reachable from the loop header phis. When invalidating a topmost loop, that might be close to all values in a function. Additionally, it's fairly common for there to not actually be anything to invalidate, but we'll still be performing this walk again and again. My first thought was that we don't need to continue walking the uses if the current value doesn't have a SCEV expression. However, this isn't quite right, because SCEV construction can skip over values (e.g. for a chain of adds, we might only create a SCEV expression for the final value). What this patch does instead is to only walk the (full) def-use chain of loop phis that have a SCEV expression. If there's no expression for a phi, then we also don't have any dependent expressions to invalidate. Differential Revision: https://reviews.llvm.org/D100264	2021-04-13 20:28:17 +02:00
Craig Topper	1afdfc6169	[RISCV] Rename RISCVISD::GREVI(W)/GORCI(W) to RISCVISD::GREV(W)/GORC(W). Don't require second operand to be a constant. Prep work for adding intrinsics for these instructions in the future.	2021-04-13 11:04:28 -07:00
Jessica Paquette	516d09387b	[AArch64][GlobalISel] Mark G_CTPOP as legal for v16s8 and v8s8 G_CTPOP can be directly selected to CNT in these cases. Differential Revision: https://reviews.llvm.org/D100349	2021-04-13 11:03:39 -07:00
Simon Pilgrim	74f98391a7	[X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in CMP(MOVMSK(PACKSS())) -> CMP(MOVMSK()) fold We already allow the comparison of the upper bits of 'IsAllOf' (allbits) patterns, but we can safely compare the known zero bits for 'IsAnyOf' (zerobits) patterns as well. This fixes an issues where we are comparing a type wide than the number of vector elements, which avoids a regression mentioned in rGbaadbe04bf75.	2021-04-13 17:37:24 +01:00
Anirudh Prasad	7da22dfcd0	[SystemZ][z/OS] Introduce dialect querying helper functions - In the SystemZAsmParser, there will be a few queries to the type of dialect it is (AD_ATT, AD_HLASM) in future patches. - It would be nice to have two small helper functions `isParsingATT()` and `isParsingHLASM()` - Putting this as a separate smaller patch allows us to remove its definitions from other dependent patches. Reviewed By: uweigand, abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D99891	2021-04-13 12:14:34 -04:00
Evgeny Leviant	dbc16ed199	[GlobalOpt] Revert valgrind hacks Differential revision: https://reviews.llvm.org/D69428	2021-04-13 19:11:10 +03:00
Yonghong Song	968292cb93	BPF: generate proper BTF for globals with WeakODRLinkage For a global weak symbol defined as below: char g __attribute__((weak)) = 2; LLVM generates an allocated global with WeakAnyLinkage, for which BPF backend generates proper BTF info. For the above example, if a modifier "const" is added like const char g __attribute__((weak)) = 2; LLVM generates an allocated global with WeakODRLinkage, for which BPF backend didn't generate any BTF as it didn't handle WeakODRLinkage. This patch addes support for WeakODRLinkage and proper BTF info can be generated for weak symbol defined with "const" modifier. Differential Revision: https://reviews.llvm.org/D100362	2021-04-13 08:54:05 -07:00
Anirudh Prasad	f7eec83932	[AsmParser][SystemZ][z/OS] Add in support to allow use of additional comment strings. - Currently, MCAsmInfo provides a CommentString attribute, that various targets can set, so that the AsmLexer can appropriately lex a string as a comment based on the set value of the attribute. - However, AsmLexer also supports a few additional comment syntaxes, in addition to what's specified as a CommentString attribute. This includes regular C-style block comments (/* ... /), regular C-style line comments (// .... ) and #. While I'm not sure as to why this behaviour exists, I am assuming it does to maintain backward compatibility with GNU AS (see https://sourceware.org/binutils/docs/as/Comments.html#Comments for reference) For example: Consider a target which sets the CommentString attribute to ''. The following strings are all lexed as comments. ``` "# abc" -> comment "// abc" -> comment "/* abc / -> comment " abc" -> comment ``` - In HLASM however, only "*" is accepted as a comment string, and nothing else. - To achieve this, an additional attribute (`AllowAdditionalComments`) has been added to MCAsmInfo. If this attribute is set to false, then only the string specified by the CommentString attribute is used as a possible comment string to be lexed by the AsmLexer. The regular C-style block comments, line comments and "#" are disabled. As a final note, "#" will still be treated as a comment, if the CommentString attribute is set to "#". Depends on https://reviews.llvm.org/D99277 Reviewed By: abhina.sreeskantharajan, myiwanch Differential Revision: https://reviews.llvm.org/D99286	2021-04-13 11:15:09 -04:00
Tim Northover	5e3d9fcc3a	StackProtector: ensure protection does not interfere with tail call frame. The IR stack protector pass must insert stack checks before the call instead of between it and the return. Similarly, SDAG one should recognize that ADJCALLFRAME instructions could be part of the terminal sequence of a tail call. In this case because such call frames cannot be nested in LLVM the stack protection code must skip over the whole sequence (or risk clobbering argument registers).	2021-04-13 15:14:57 +01:00
Sander de Smalen	03f47bdcb1	[TTI] NFC: Change get[Interleaved]MemoryOpCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100205	2021-04-13 14:21:02 +01:00
Sander de Smalen	d676b5749d	[TTI] NFC: Change getMaskedMemoryOpCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100204	2021-04-13 14:21:01 +01:00
Sander de Smalen	db134e2428	[TTI] NFC: Change getCmpSelInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100203	2021-04-13 14:21:01 +01:00
Sander de Smalen	2285dfb73f	[TTI] NFC: Change getMinMaxReductionCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100202	2021-04-13 14:21:00 +01:00
Sander de Smalen	bd86824d98	[TTI] NFC: Change getArithmeticReductionCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html This patch is practically NFC, with the exception of an AArch64 SVE related cost-model change, where we can now return an Invalid cost instead of some bogus number. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100201	2021-04-13 14:20:59 +01:00
Sander de Smalen	fd1f8a5462	[TTI] NFC: Change getGatherScatterOpCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100200	2021-04-13 14:20:59 +01:00
Sander de Smalen	92d8421f49	[TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100199	2021-04-13 14:20:58 +01:00
madhur13490	5682ae2fc6	[AMDGPU] Set implicit arg attributes for indirect calls This patch adds attributes corresponding to implicits to functions/kernels if 1. it has an indirect call OR 2. it's address is taken. Once such attributes are set, rest of the codegen would work out-of-box for indirect calls. This patch eliminates the potential overhead -fixed-abi imposes even though indirect functions calls are not used. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D99347	2021-04-13 13:15:13 +00:00
Martin Storsjö	45f8946a75	[CodeView] Fix the ARM64 CPUType enum The old, incorrect one seems to have been added in `d41ac895bb`, with a similarly placed entry added in EnumTables.cpp in `eb4d6142dc`. This matches the value documented at https://docs.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/cv-cpu-type-e?view=vs-2019. This fixes running obj2yaml on an object file generated by MSVC. Differential Revision: https://reviews.llvm.org/D100306	2021-04-13 12:54:22 +03:00
Florian Hahn	467b1f1cd2	[SimplifyCFG] Allow hoisting terminators only with HoistCommonInsts=false. As a side-effect of the change to default HoistCommonInsts to false early in the pipeline, we fail to convert conditional branch & phis to selects early on, which prevents vectorization for loops that contain conditional branches that effectively are selects (or if the loop gets vectorized, it will get vectorized very inefficiently). This patch updates SimplifyCFG to perform hoisting if the only instruction in both BBs is an equal branch. In this case, the only additional instructions are selects for phis, which should be cheap. Even though we perform hoisting, the benefits of this kind of hoisting should by far outweigh the negatives. For example, the loop in the code below will not get vectorized on AArch64 with the current default, but will with the patch. This is a fundamental pattern we should definitely vectorize. Besides that, I think the select variants should be easier to use for reasoning across other passes as well. https://clang.godbolt.org/z/sbjd8Wshx ``` double clamp(double v) { if (v < 0.0) return 0.0; if (v > 6.0) return 6.0; return v; } void loop(double* X, double *Y) { for (unsigned i = 0; i < 20000; i++) { X[i] = clamp(Y[i]); } } ``` Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D100329	2021-04-13 10:33:35 +01:00
Ricky Taylor	6e098e133d	[M68k] Implement AsmParser This is a work-in-progress implementation of an assembler for M68k. Outstanding work: - Updating existing tests assembly syntax - Writing new tests for the assembler (and disassembler) I've left those until there's consensus that this approach is okay (I hope that's okay!). Questions I'm aware of: - Should this use Motorola or gas syntax? (At the moment it uses Motorola syntax.) - The disassembler produces a table at runtime for disassembly generated from the code beads. Is this okay? (This is less than ideal but as I mentioned in my llvm-dev post, it's quite complicated to write a table-gen parser for code beads.) Depends on D98519 Depends on D98532 Depends on D98534 Depends on D98535 Depends on D98536 Differential Revision: https://reviews.llvm.org/D98537	2021-04-13 09:25:34 +01:00
Craig Topper	7c9bbbf735	[RISCV] Rename RISCVISD::SHFLI to RISCVISD::SHFL and don't require the second operand to be an immediate. Prep work for adding intrinsics in the future. Left an assert that the input is constant in ReplaceNodeResults, as the intrinsic shouldn't go through that path.	2021-04-12 23:46:50 -07:00
Amy Huang	dad5caa59e	Revert "Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"" This change causes an assert / segmentation fault in LTO builds. This reverts commit `f2e4f3eff3`.	2021-04-12 20:10:17 -07:00
Serguei Katkov	c362179b0a	[GreedyRA ORE] Add debug location for function level report Reviewers: reames, MatzeB, anemet, thegameg Reviewed By: thegameg Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D100168	2021-04-13 09:49:12 +07:00
Chen Zheng	80aa9b0f7b	[PowerPC] stop reverse mem op generation for some cases. We should consider the feeder user number when we do reverse memory operation transformation. Otherwise, we may get negative impact. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D100166	2021-04-12 22:41:28 -04:00
Evgeniy Brevnov	e50aa1af2d	[NARY][NFC] Use hasNUsesOrMore instead of getNumUses since it's more efficient.	2021-04-13 09:29:49 +07:00
Freddy Ye	3fc1fe8db8	[X86] Support -march=rocketlake Reviewed By: skan, craig.topper, MaskRay Differential Revision: https://reviews.llvm.org/D100085	2021-04-13 09:48:13 +08:00
Gulfem Savrun Yeniceri	e96df3e531	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-04-13 01:29:41 +00:00
Nick Desaulniers	237d4ee835	[JumpThreading] merge debug info when merging select+br Jump threading can replace select then unconditional branch with conditional branch, but when doing so loses debug info. This destructive transform is eventually leading to a failed Verifier run during full LTO builds of the Linux kernel with CFI and KCOV enabled, as reported in PR39531. ModuleSanitizerCoveragePass will insert calls to __sanitizer_cov_trace_pc, and sometimes split critical edges, using whatever debug info may or may not exist for the branch for the added libcall. Since we can inline calls to __sanitizer_cov_trace_pc due to LTO, this can lead to the error observed in PR39531 when the debug info isn't propagated to the libcall, because of prior destructive transforms that failed to retain debug info. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D100137	2021-04-12 17:51:21 -07:00
Arthur Eubanks	a8ab1f98d2	[Evaluator] Look through invariant.group intrinsics Turning on -fstrict-vtable-pointers in Chrome caused an extra global initializer. Turns out that a llvm.strip.invariant.group intrinsic was causing GlobalOpt to fail to step through some simple code. We can treat .invariant.group uses as simply their operand. Value::stripPointerCastsForAliasAnalysis() does exactly this. This should be safe because the Evaluator does not skip memory accesses due to invariants or alias analysis. However, we don't want to leak that we've stripped arbitrary pointer casts to users of Evaluator, so we bail out if we evaluate a function to any constant, since we may have looked through .invariant.group calls and aliasing pointers cannot be arbitrarily substituted. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D98843	2021-04-12 16:12:15 -07:00
Nick Desaulniers	4914c98367	[SantizerCoverage] handle missing DBG MD when inserting libcalls Instruction::getDebugLoc can return an invalid DebugLoc. For such cases where metadata was accidentally removed from the libcall insertion point, simply insert a DILocation with line 0 scoped to the caller. When we can inline the libcall, such as during LTO, then we won't fail a Verifier check that all calls to functions with debug metadata themselves must have debug metadata. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D100158	2021-04-12 15:55:58 -07:00
Yuanfang Chen	c5fda0e662	Reland "Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom"" This reverts commit `a3fabc79ae` (relands `f4d682d6ce` with fix for the compile-time regression issue).	2021-04-12 14:50:54 -07:00
Fangrui Song	0a614fff4f	[ARM] Fix -Wmissing-field-initializers	2021-04-12 14:28:23 -07:00
Jian Cai	ed1734931a	Fix up build failures after `cfce5b26a8` Build log: https://lab.llvm.org/buildbot/#/builders/37/builds/3538 Differential Revision: https://reviews.llvm.org/D98916	2021-04-12 14:09:15 -07:00
Nikita Popov	a3fabc79ae	Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom" This reverts commit `f4d682d6ce`. This caused a significant compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=4b7bad9eaea2233521a94f6b096aaa88dc584e23&to=f4d682d6ce6c5b3a41a0acf297507c82f5c21eef&stat=instructions Possibly this is due to overeager parsing of target triples.	2021-04-12 22:55:59 +02:00
Sanjay Patel	5354a213a0	[InstCombine] fold shift+trunc signbit check https://alive2.llvm.org/ce/z/6vQvrP This solves: https://llvm.org/PR49866	2021-04-12 16:19:43 -04:00
Jian Cai	cfce5b26a8	[ARM] support symbolic expression as immediate in memory instructions Currently the ARM backend only accpets constant expressions as the immediate operand in load and store instructions. This allows the result of symbolic expressions to be used in memory instructions. For example, 0: .space 2048 strb r2, [r0, #(.-0b)] would be assembled into the following instructions. strb r2, [r0, #2048] This only adds support to ldr, ldrb, str, and strb in arm mode to address the build failure of Linux kernel for now, but should facilitate adding support to similar instructions in the future if the need arises. Link: https://github.com/ClangBuiltLinux/linux/issues/1329 Reviewed By: peter.smith, nickdesaulniers Differential Revision: https://reviews.llvm.org/D98916	2021-04-12 12:13:55 -07:00
Sanjay Patel	661cc71a1c	[PassManager][PhaseOrdering] lower expects before running simplifyCFG Retry of `330619a3a6` that includes a clang test update. Original commit message: If we run passes before lowering llvm.expect intrinsics to metadata, then those passes have no way to act on the hints provided by llvm.expect. SimplifyCFG is the known offender, and we made it smarter about profile metadata in D98898 <https://reviews.llvm.org/D98898>. In the motivating example from https://llvm.org/PR49336 , this means we were ignoring the recommended method for a programmer to tell the compiler that a compare+branch is expensive. This change appears to solve that case - the metadata survives to the backend, the compare order is as expected in IR, and the backend does not do anything to reverse it. We make the same change to the old pass manager to keep things synchronized. Differential Revision: https://reviews.llvm.org/D100213	2021-04-12 15:07:53 -04:00
Arthur Eubanks	be00edfee5	[NewPM] Fix -print-changed when a -filter-print-funcs function is removed -filter-print-funcs -print-changed was crashing after the filter func was removed by a pass with Assertion failed: After.find("*** IR Dump") == 0 && "Unexpected banner format." We weren't printing the banner because when we have -filter-print-funcs, we print each function separately, letting the print function filter out unwanted functions. Reviewed By: jamieschmeiser Differential Revision: https://reviews.llvm.org/D100237	2021-04-12 11:55:17 -07:00
Sanjay Patel	23ac9d1e6e	Revert "[PassManager][PhaseOrdering] lower expects before running simplifyCFG" This reverts commit `330619a3a6`. There are clang tests that also need to be updated.	2021-04-12 13:58:54 -04:00
Arthur Eubanks	269b335bd7	[Inliner] Propagate SROA analysis through invariant group intrinsics SROA can handle invariant group intrinsics, let the inliner know that for better heuristics when the intrinsics are present. This fixes size issues in a couple files when turning on -fstrict-vtable-pointers in Chrome. Reviewed By: rnk, mtrofin Differential Revision: https://reviews.llvm.org/D100249	2021-04-12 10:54:22 -07:00
Hamza Sood	0a92aff721	Replace uses of std::iterator with explicit using This patch removes all uses of `std::iterator`, which was deprecated in C++17. While this isn't currently an issue while compiling LLVM, it's useful for those using LLVM as a library. For some reason there're a few places that were seemingly able to use `std` functions unqualified, which no longer works after this patch. I've updated those places, but I'm not really sure why it worked in the first place. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D67586	2021-04-12 10:47:14 -07:00
Fraser Cormack	d737c47137	[RISCV] Support vector SET[U]LT and SET[U]GE with splatted immediates This patch adds more optimized codegen for the above SETCC forms, by matching the '.vi' vector forms when the immediate is a 5-bit signed immediate plus 1. The immediate can be decremented and the corresponding SET[U]LE or SET[U]GT forms can be matched. This work was left as a TODO from D94168. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100096	2021-04-12 18:36:45 +01:00
Yuanfang Chen	f4d682d6ce	[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom D24453 enabled libcalls simplication for ARM PCS. This may cause caller/callee calling conventions mismatch in some situations such as LTO. This patch makes instcombine aware that the compatible calling conventions differences are benign (not emitting undef idom). Differential Revision: https://reviews.llvm.org/D99773	2021-04-12 09:32:23 -07:00
Sanjay Patel	330619a3a6	[PassManager][PhaseOrdering] lower expects before running simplifyCFG If we run passes before lowering llvm.expect intrinsics to metadata, then those passes have no way to act on the hints provided by llvm.expect. SimplifyCFG is the known offender, and we made it smarter about profile metadata in D98898. In the motivating example from https://llvm.org/PR49336 , this means we were ignoring the recommended method for a programmer to tell the compiler that a compare+branch is expensive. This change appears to solve that case - the metadata survives to the backend, the compare order is as expected in IR, and the backend does not do anything to reverse it. We make the same change to the old pass manager to keep things synchronized. Differential Revision: https://reviews.llvm.org/D100213	2021-04-12 12:23:31 -04:00
David Green	dd31b2c6e5	[ARM] Add a number of intrinsics for MVE lane interleaving Add a number of intrinsics which natively lower to MVE operations to the lane interleaving pass, allowing it to efficiently interleave the lanes of chucks of operations containing these intrinsics. Differential Revision: https://reviews.llvm.org/D97293	2021-04-12 17:23:02 +01:00
Stephen Tozer	f2e4f3eff3	Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" The causes of the previous build errors have been fixed in revisions `aa3e78a59f`, and `140757bfaa` This reverts commit `f40976bd01`.	2021-04-12 16:57:29 +01:00
Simon Pilgrim	baadbe04bf	[X86] Fold cmpeq/ne(trunc(logic(x)),0) --> cmpeq/ne(logic(x),0) Fixes the issues noted in PR48768, where the and/or/xor instruction had been promoted to avoid i8/i16 partial-dependencies, but the test against zero had not. We can almost certainly relax this fold to work for any truncation, although it breaks a number of existing folds (notable movmsk folds which tend to rely on the truncate to determine the demanded bits/elts in the source vector). There is a reverse combine in TargetLowering.SimplifySetCC so we must wait until after legalization before attempting this.	2021-04-12 16:05:34 +01:00
Wang, Pengfei	4cbaaf4a24	[X86][AMX] Hoist ldtilecfg The previous code calculated the first ldtilecfg by dominating all AMX registers' def. This may result in the ldtilecfg being inserted into a loop. This patch try to calculate the nearest point where all shapes of AMX registers are reachable. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D99010	2021-04-12 22:36:41 +08:00
David Green	6c0a1ed3a9	[ARM] Add FP handling for MVE lane interleaving FP16 to FP32 converts can be handled in MVE lane interleaving, much like the sext/zext lowering we do. This expands the pass with fpext and fptrunc handling, and basic fp operations allowing more efficient lowering of fp vectors. Differential Revision: https://reviews.llvm.org/D97292	2021-04-12 15:28:13 +01:00
Malhar Jajoo	58f3201a20	[ARM] Updates to arm-block-placement pass The patch makes two updates to the arm-block-placement pass: - Handle arbitrarily nested loops - Extends the search (for t2WhileLoopStartLR) to the predecessor of the preHeader. Differential Revision: https://reviews.llvm.org/D99649	2021-04-12 14:46:23 +01:00
Paul C. Anagnostopoulos	489cdedd11	[TableGen] Fix bug in recent change to ListInit::convertInitListSlice() Differential Revision: https://reviews.llvm.org/D100247	2021-04-12 09:44:39 -04:00
Andrew Savonichev	f037b07b5c	Revert "[AArch64] Add Machine InstCombiner patterns for FMUL indexed variant" This reverts commit `cca9b5985c`. Buildbot reported an error for CodeGen/AArch64/machine-combiner-fmul-dup.mir: * Bad machine code: Virtual register killed in block, but needed live out. * - function: indexed_2s - basic block: %bb.0 entry (0x640fee8) Virtual register %7 is used after the block. * Bad machine code: Virtual register defs don't dominate all uses. * - function: indexed_2s - v. register: %7 LLVM ERROR: Found 2 machine code errors.	2021-04-12 16:28:49 +03:00
Andrew Savonichev	cca9b5985c	[AArch64] Add Machine InstCombiner patterns for FMUL indexed variant This patch adds DUP+FMUL => FMUL_indexed pattern to InstCombiner. FMUL_indexed is normally selected during instruction selection, but it does not work in cases when VDUP and VMUL are in different basic blocks. Differential Revision: https://reviews.llvm.org/D99662	2021-04-12 16:08:39 +03:00
Sebastian Neubauer	6cc91adf1e	[AMDGPU] Kill temporary register after restoring Not a correctness issue, but the temporary register is not used afterwards and should be dead. Differential Revision: https://reviews.llvm.org/D100295	2021-04-12 14:20:03 +02:00
Bradley Smith	f2593a0bd1	[AArch64][SVE] Remove redundant PTEST of MATCH/NMATCH results Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D99584	2021-04-12 12:55:00 +01:00
Stephen Tozer	aa3e78a59f	Reapply "[DebugInfo] Correctly track SDNode dependencies for list debug values" Fixed memory leak error by using BumpAllocator for SDDbgValue arrays. This reverts commit `1b589172bd`.	2021-04-12 12:51:29 +01:00
Dmitry Preobrazhensky	67b39661c8	[AMDGPU][MC][NFC] Removed extra spaces Fixed bugs 49646, 49647. Differential Revision: https://reviews.llvm.org/D100173	2021-04-12 13:33:19 +03:00
Sebastian Neubauer	7a8e65dd3d	[AMDGPU] Fix ubsan error The RegScavenger can be null sometimes, so a pointer is needed. Fixes UBSan error introduced in `f9a8c6a0e5`.	2021-04-12 12:14:00 +02:00
Sebastian Neubauer	b76c2a6c2b	[AMDGPU] Fix saving fp and bp Spilling the fp or bp to scratch could overwrite VGPRs of inactive lanes. Fix that by using only the active lanes of the scavenged VGPR. This builds on the assumptions that 1. a function is never called with exec=0 2. lanes do not die in a function, i.e. exec!=0 in the function epilog 3. no new lanes are active when exiting the function, i.e. exec in the epilog is a subset of exec in the prolog. Differential Revision: https://reviews.llvm.org/D96869	2021-04-12 11:52:55 +02:00
Sebastian Neubauer	32bc9a9bc3	[AMDGPU] Unify spill code Instead of reimplementing spilling in prolog and epilog, reuse buildSpillLoadStore. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D99269	2021-04-12 11:19:08 +02:00
Sebastian Neubauer	f9a8c6a0e5	[AMDGPU] Save VGPR of whole wave when spilling Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot determine if a VGPR is used in other lanes or not, so we need to save all lanes of the VGPR. We even need to save the VGPR if it is marked as dead. The generated code depends on two things: - Can we scavenge an SGPR to save EXEC? - And can we scavenge a VGPR? If we can scavenge an SGPR, we - save EXEC into the SGPR - set the needed lane mask - save the temporary VGPR - write the spilled SGPR into VGPR lanes - save the VGPR again to the target stack slot - restore the VGPR - restore EXEC If we were not able to scavenge an SGPR, we do the same operations, but everytime the temporary VGPR is written to memory, we - write VGPR to memory - flip exec (s_not exec, exec) - write VGPR again (previously inactive lanes) Surprisingly often, we are able to scavenge an SGPR, even though we are at the brink of running out of SGPRs. Scavenging a VGPR does not have a great effect (saves three instructions if no SGPR was scavenged), but we need to know if the VGPR we use is live before or not, otherwise the machine verifier complains. Differential Revision: https://reviews.llvm.org/D96336	2021-04-12 11:01:38 +02:00
Stelios Ioannou	a655f250fe	[AArch64] Adds memory operands for indexed loads. This patch adds the memory operands for indexed loads so that certain optimizations can take place. Differential Revision: https://reviews.llvm.org/D100215/ Change-Id: I539fcf046ca4ad1e7df1d893f57d751419d8364d	2021-04-12 09:11:37 +01:00
Zhang Qing Shan	d69c236e1d	[NFC][Debug] Fix unnecessary deep-copy for vector to save compiling time We saw some big compiling time impact after enabling the debug entry value feature for X86 platform(D73534). Compiling time goes from 900s->1600s with our testcase. It is caused by allocating/freeing the memory busily. 'using FwdRegWorklist = MapVector<unsigned, SmallVector<FwdRegParamInfo, 2>>;' The value for this map is vector, and we miss the reference when access the element. The same happens for `auto CalleesMap = MF->getCallSitesInfo();` which is a DenseMap. Reviewed by: djtodoro, flychen50 Differential Revision: https://reviews.llvm.org/D100162	2021-04-12 14:55:03 +08:00
Bing1 Yu	747111ea71	[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D99244	2021-04-12 13:58:14 +08:00
Evgeniy Brevnov	36b932d6a3	[NARY] Don't optimize min/max if there are side uses Say we have %1=min(%a,%b) %2=min(%b,%c) %3=min(%2,%a) The optimization will try to reassociate the later one so that we can rewrite it to %3=min(%1, %c) and remove %2. But if %2 has another uses outside of %3 then we can't remove %2 and end up with: %1=min(%a,%b) %2=min(%b,%c) %3=min(%1, %c) This doesn't harm by itself except it is not profitable and changes IR for no good reason. What is bad it triggers next iteration which finds out that optimization is applicable to %2 and %3 and generates: %1=min(%a,%b) %2=min(%b,%c) %3=min(%1,%c) %4=min(%2,%a) and so on... The solution is to prevent optimization in the first place if intermediate result (%2) has side uses and known to be not removed. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D100170	2021-04-12 12:43:54 +07:00
Freddy Ye	5cb47be410	[X86] Remove FeatureCLWB from FeaturesICLClient Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100279	2021-04-12 12:08:59 +08:00
Chen Zheng	bb346146a5	[Debug-Info] make fortran CHARACTER(1) type as valid unsigned type This resolves https://bugs.llvm.org/show_bug.cgi?id=49872 Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D100015	2021-04-11 23:17:01 -04:00
Qiu Chaofan	ece7345859	[PowerPC] Lower f128 SETCC/SELECT_CC as libcall if p9vector disabled XSCMPUQP is not available for pre-P9 subtargets. This patch will lower them into libcall for correct behavior on power7/power8. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92083	2021-04-12 10:33:32 +08:00
Jim Lin	a3bfddbb6a	[RISCV][NFC] Remove unneeded explict XLenVT type on codegen patterns Customized SDNode has been specified the explict XLenVT type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100190	2021-04-12 10:16:06 +08:00
Craig Topper	cb4c793e46	[RISCV] Update computeKnownBitsForTargetNode to treat READ_VLENB as being 16 byte aligned. According to the 0.10 spec, VLEN is at least 128 bits and is a power of 2.	2021-04-11 17:54:23 -07:00
Craig Topper	ff902080a9	[RISCV] Use SLLI/SRLI instead of SLLIW/SRLIW for (srl (and X, 0xffff), C) custom isel on RV64. We don't need the sign extending behavior here and SLLI/SRLI are able to compress to C.SLLI/C.SRLI.	2021-04-11 13:59:51 -07:00
Roman Lebedev	8fc8c745cf	[NFCI][SimplifyCFG] PerformValueComparisonIntoPredecessorFolding(): improve Dominator Tree updating Same as with previous patches.	2021-04-11 23:56:23 +03:00
Roman Lebedev	13fca9d816	[NFCI][SimplifyCFG] mergeEmptyReturnBlocks(): improve Dominator Tree updating Same as with previous patches.	2021-04-11 23:56:23 +03:00
Roman Lebedev	0699da1569	[NFCI][Local] MergeBasicBlockIntoOnlyPred(): improve Dominator Tree updating Same as with TryToSimplifyUncondBranchFromEmptyBlock()/MergeBlockIntoPredecessor() patch.	2021-04-11 23:56:23 +03:00
Roman Lebedev	e5692a564a	[NFCI][BasicBlockUtils] MergeBlockIntoPredecessor(): improve Dominator Tree updating Same as with TryToSimplifyUncondBranchFromEmptyBlock() patch.	2021-04-11 23:56:23 +03:00
Roman Lebedev	2def9c3d8e	[NFCI][Local] TryToSimplifyUncondBranchFromEmptyBlock(): improve Dominator Tree updating First, we don't need vector-ness for the predecessor lists. Secondly, like elsewhere, do insertions before deletions. Lastly, the check that we actually need to insert an edge, that it doesn't exist already, is backwards. Instead of looking at successors of every single 'PredOfBB', just always look at predecessors of the 'Succ'. The result is always the same, but we avoid really inefficient code.	2021-04-11 23:56:22 +03:00
Roman Lebedev	6d44b3c56d	[NFCI][DomTreeUpdater] applyUpdates(): reserve space for updates first While, indeed, we may end up pushing less updates that we'd reserve space for, self-dominating updates aren't often enough for that to matter. But this should matter for normal updates.	2021-04-11 23:56:22 +03:00
Simon Pilgrim	231b87618b	[X86][AVX512] Fold not(kmov(x)) -> kmov(not(x)) and not(widen_subvector(x)) -> widen_subvector(not(x)) Improve AVX512 mask inversion, rG38c799bce801 exposed some missing opportunities to move scalar not() back onto the boolvector types for folding with setcc etc.	2021-04-11 20:07:09 +01:00
Thomas Lively	ea8dd3ee2e	[WebAssembly] Update v128.any_true In the final SIMD spec, there is only a single v128.any_true instruction, rather than one for each lane interpretation because the semantics do not depend on the lane interpretation. Differential Revision: https://reviews.llvm.org/D100241	2021-04-11 11:13:16 -07:00
Simon Pilgrim	13bdac5709	[X86] combineXor - Pull out repeated getOperand() calls. NFCI.	2021-04-11 19:01:59 +01:00
Simon Pilgrim	38c799bce8	[X86] Fold cmpeq/ne(and(X,Y),Y) --> cmpeq/ne(and(~X,Y),0) Followup to D100177, handle an similar (demorgan inverse style) case from PR47797 as well The AVX512 test cases could be further improved if we folded not(iX bitcast(vXi1)) -> (iX bitcast(not(vXi1))) Alive2: https://alive2.llvm.org/ce/z/AnA_-W	2021-04-11 18:42:01 +01:00
Craig Topper	3ae71226ef	[RISCV] Drop earlyclobber constraint from vwadd(u).wx, vwsub(u).wx, vfwadd.wf and vfwsub.wf. The first source has the same EEW as the destination and the other source is a scalar so the overlap constraints don't apply to the unmasked version. For the masked version we have a constraint that the destination can't be V0 so that covers the only overlap issue there. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D100217	2021-04-11 10:19:45 -07:00
Craig Topper	bc0e052730	[RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported. Similar to what we do for zext.w. Disable the (srl (and X, 0xffff), C) custom isel when zext.h is available.	2021-04-11 10:03:35 -07:00
Roman Lebedev	91248e2db9	[InstCombine] Improve "get low bit mask upto and including bit X" pattern https://alive2.llvm.org/ce/z/3u-48R	2021-04-11 18:08:08 +03:00
Roman Lebedev	a36bb7fd76	[InstCombine] (X \| Op01C) + Op1C --> X + (Op01C + Op1C) iff the or is actually an add https://alive2.llvm.org/ce/z/Coc5yf	2021-04-11 18:08:08 +03:00
Roman Lebedev	005881e96e	[LoopIdiom] left-shift-until-bittest: set all allowed no-wrap flags on add/sub I've checked each one of these with alive2, and this is both correct and precise.	2021-04-11 18:08:07 +03:00
Arthur Eubanks	c88b87f9ce	Revert "Remove "Rewrite Symbols" from codegen pipeline" This reverts commit `6210261ecb`. addr-label.ll crashes on armv7.	2021-04-10 23:28:16 -07:00
Arthur Eubanks	6210261ecb	Remove "Rewrite Symbols" from codegen pipeline It breaks up the function pass manager in the codegen pipeline. With empty parameters, it looks at the -mllvm flag -rewrite-map-file. This is likely not in use. Add a check that we only have one function pass manager in the codegen pipeline. This required reverting commit 9583a3f2625818b78c0cf6d473cdedb9f23ad82c: "[AsmPrinter] Delete dead takeDeletedSymbsForFunction()". This was not NFC as initially thought. By coalescing two function psas managers, this exposed the reverted code as necessary. addr-label.ll was crashing due to an emitted blockaddress's block being removed but the label not emitted. Some tests relied on the fact that we had a module pass somewhere in the codegen pipeline. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99707	2021-04-10 22:38:44 -07:00
Roman Lebedev	9829f5e6b1	[CVP] @llvm.[us]{min,max}() intrinsics handling If we can tell that either one of the arguments is taken, bypass the intrinsic. Notably, we are indeed fine with non-strict predicate: * UL: https://alive2.llvm.org/ce/z/69qVW9 https://alive2.llvm.org/ce/z/kNFTKf https://alive2.llvm.org/ce/z/AvaPw2 https://alive2.llvm.org/ce/z/oxo53i * UG: https://alive2.llvm.org/ce/z/wxHeGH https://alive2.llvm.org/ce/z/Lf76qx * SL: https://alive2.llvm.org/ce/z/hkeTGS https://alive2.llvm.org/ce/z/eR_b-W * SG: https://alive2.llvm.org/ce/z/wEqRm7 https://alive2.llvm.org/ce/z/FpAsVr Much like with all other comparison handling in CVP, while we could sort-of handle two Value's, at least for plain ICmpInst it does not appear to be worthwhile. This only fires 78 times on test-suite + dt + rs, but we don't canonicalize to these yet. (only SCEV produces them)	2021-04-11 00:33:47 +03:00
Nikita Popov	8de2f1ff79	[IVUsers] Check LoopSimplify cache earlier (NFC) Check the cache before calling isLoopSimplifyForm(). Otherwise we'd always perform the check for the innermost loop and only skip it for dominating loops.	2021-04-10 22:58:13 +02:00
Wenlei He	00ef28ef21	[CSSPGO] Fix dangling context strings and improve profile order consistency and error handling This patch fixed the following issues along side with some refactoring: 1. Fix bugs where StringRef for context string out live the underlying std::string. We now keep string table in profile generator to hold std::strings. We also do the same for bracketed context strings in profile writer. 2. Make sure profile output strictly follow (total sample, name) order. Previously, there's inconsistency between ProfileMap's key and FunctionSamples's name, leading to inconsistent ordering. This is now fixed by introducing context profile canonicalization. Assertions are also added to make sure ProfileMap's key and FunctionSamples's name are always consistent. 3. Enhanced error handling for profile writing to make sure we bubble up errors properly for both llvm-profgen and llvm-profdata when string table is not populated correctly for extended binary profile. 4. Keep all internal context representation bracket free. This avoids creating new strings for context trimming, merging and preinline. getNameWithContext API is now simplied accordingly. 5. Factor out the code for context trimming and merging into SampleContextTrimmer in SampleProf.cpp. This enables llvm-profdata to use the trimmer when merging profiles. Changes in llvm-profgen will be in separate patch. Differential Revision: https://reviews.llvm.org/D100090	2021-04-10 12:39:10 -07:00
Roman Lebedev	f041757e9c	[NFC][JumpThreading] Increment 'NumFolds' statistic all places terminator becomes uncond	2021-04-10 21:24:29 +03:00
Roman Lebedev	a407738def	[NFC][CVP] Add statistic for function pointer argument non-null-ness deduction	2021-04-10 21:23:20 +03:00
Roman Lebedev	fe7b3ad8d5	[CVP] LVI: Use in-block values when checking value signedness domain This has a huge positive impact on all the folds that use these helpers, as it can be seen on vanilla test-suite + rawspeed + darktable: correlated-value-propagation.NumSRems +75.68% (+ 28) correlated-value-propagation.NumAShrs +63.87% (+198) correlated-value-propagation.NumSDivs +49.42% (+127) correlated-value-propagation.NumSExt + 8.85% (+593) correlated-value-propagation.NumUDivURemsNarrowed + 8.65% (+34) ... while having pretty minimal compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=e8c7f43e2c2c6f3581ec1c6489ec21ad9f98958a&to=4cd197711e58ee1b2faeee0c35eea54540185569&stat=instructions	2021-04-10 21:10:59 +03:00
Roman Lebedev	257eda0794	[NFC][LVI] getPredicateAt(): drop default value for UseBlockValue The default is likely wrong. Out of all the callees, only a single one needs to pass-in false (JumpThread), everything else either already passes true, or should pass true. Until the default is flipped, at least make it harder to unintentionally add new callees with UseBlockValue=false.	2021-04-10 20:46:01 +03:00
Roman Lebedev	e8c7f43e2c	[NFC][ConstantRange] Add 'icmp' helper method "Does the predicate hold between two ranges?" Not very surprisingly, some places were already doing this check, without explicitly naming the algorithm, cleanup them all.	2021-04-10 19:38:55 +03:00
Roman Lebedev	7b12c8c59d	Revert "[NFC][ConstantRange] Add 'icmp' helper method" This reverts commit `17cf2c9423`.	2021-04-10 19:37:53 +03:00
Roman Lebedev	17cf2c9423	[NFC][ConstantRange] Add 'icmp' helper method "Does the predicate hold between two ranges?" Not very surprisingly, some places were already doing this check, without explicitly naming the algorithm, cleanup them all.	2021-04-10 19:09:52 +03:00
Roman Lebedev	c329a47d9e	[CVP] @llvm.abs() handling Iff we know the sigdness domain of the argument, we can either skip @llvm.abs, or do negation directly. Notably, INT_MIN can belong to either domain: * X u<= INT_MIN --> X is always fine https://alive2.llvm.org/ce/z/QB8j-C https://alive2.llvm.org/ce/z/7sFKpS * X s<= 0 --> -X is always fine https://alive2.llvm.org/ce/z/QbGSyq https://alive2.llvm.org/ce/z/APsN84 If all else fails, try to inferr NSW flag: https://alive2.llvm.org/ce/z/qCJfYm	2021-04-10 16:47:31 +03:00
dfukalov	8f4b7e94a2	[AMDGPU][CostModel] Refine cost model for control-flow instructions. Added cost estimation for switch instruction, updated costs of branches, fixed phi cost. Had to increase `-amdgpu-unroll-threshold-if` default value since conditional branch cost (size) was corrected to higher value. Test renamed to "control-flow.ll". Removed redundant code in `X86TTIImpl::getCFInstrCost()` and `PPCTTIImpl::getCFInstrCost()`. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D96805	2021-04-10 09:20:24 +03:00

... 2 3 4 5 6 ...

146169 Commits