llvm-project

Commit Graph

Author	SHA1	Message	Date
Amaury Séchet	b30bbd181b	Small formating nit in DAGCombiner. NFC	2022-09-26 13:36:11 +00:00
Yeting Kuo	43c5fbdd3a	[VP][RISCV] Add vp.sqrt intrinsic and RISC-V support. The patch modeled vp.fabs patch D132793. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D133690	2022-09-26 10:47:40 +08:00
Yi Kong	32994b7357	Make MLIR model URLs cache variables This allows us to directly use the models published on Github. Differential Revision: https://reviews.llvm.org/D134566	2022-09-23 15:21:53 -07:00
Afanasyev Ivan	d2e434c378	[RegisterCoalescer] fix dst subreg replacement during remat copy trick Instructions might use definition register as its "undef" operand. It happens on architectures with predicated executon: ``` %0:subreg = instruction op_1, ..., op_N, undef %0:subreg, op_N+2, ... ``` RegisterCoalescer should take into account all remat instruction operands during destination subregister fixup. ``` ; remat result before fix: %1 = instruction op_1, ..., op_N, undef %1:subreg, op_N+2, ... ; remat result after fix (correct): %1 = instruction op_1, ..., op_N, undef %1, op_N+2, ... ``` Differential Revision: https://reviews.llvm.org/D125657	2022-09-23 18:52:29 +00:00
Philip Reames	b9c4733079	[DAG] Move one-use add of splat to base of scatter/gather This extends the uniform base transform used with scatter/gather to support one-use vector adds-of-splats with a non-zero base. This has the effect of essentially reassociating an add from vector to scalar domain. The motivation is to improve the lowering of scatter/gather operations fed by complex geps. Differential Revision: https://reviews.llvm.org/D134472	2022-09-22 18:45:12 -07:00
Eric Wang	83c53d346f	[NFC][MLGO] Introduce logRewardIfNeeded method This patch introduces a logRewardIfNeeded method to reuse regallocscoring. Differential Revision: https://reviews.llvm.org/D134232	2022-09-22 19:22:32 -05:00
Philip Reames	60c91fd364	[RISCV] Disallow scale for scatter/gather RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely. This has two effects: * First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to. * Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.) The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics. Differential Revision: https://reviews.llvm.org/D134382	2022-09-22 15:31:26 -07:00
Craig Topper	9d236d4dab	[SelectionDAGBuilder] Simplify how VTs is created for constrained intrinsics. NFC All constrained intrinsics return a single value. We can directly convert it to an EVT instead of going through ComputeValueTypes.	2022-09-22 14:21:22 -07:00
Philip Reames	46525fee81	[DAGCombine] Check both forms of a commutative transform The transform to fold an add into the base of a scatter/gather was only checking to see if the LHS was a splat. Included test change indicates that splats are not canonicalized to LHS, and that we need to check both sides.	2022-09-22 12:21:47 -07:00
Jonathan Camilleri	08288052ae	[DebugInfo] Emit access specifiers for typedefs The accessibility level of a typedef or using declaration in a struct or class was being lost when producing debug information. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D134339	2022-09-22 17:08:41 +00:00
Amara Emerson	885a87033c	[GlobalISel] Enforce G_ASSERT_ALIGN to have a valid alignment > 0.	2022-09-22 16:05:07 +01:00
Matt Arsenault	94ebd7d9ff	MachineVerifier: Verify REG_SEQUENCE Somehow there was no verification of this, other than an ad-hoc assertion in TwoAddressInstructions.	2022-09-22 09:51:15 -04:00
Christudasan Devadasan	32a8260ccc	-dot-machine-cfg for printing MachineFunction to a dot file This pass allows a user to dump a MIR function to a dot file and view it as a graph. It is targeted to provide a similar functionality as -dot-cfg pass on LLVM-IR. As of now the pass also support below flags: -dot-mcfg-only [optional][won't print instructions in the graph just block name] -mcfg-dot-filename-prefix [optional][prefix to add to output dot file] -mcfg-func-name [optional] [specify function name or it's substring, handy if mir file contains multiple functions and you need to see graph of just one] More flags and details can be introduced as per the requirements in future. This pass is inspired from -dot-cfg IR pass and APIs are written in almost identical format. Patch by Yashwant Singh <Yashwant.Singh@amd.com> (yassingh) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D133709	2022-09-22 12:48:33 +05:30
Fangrui Song	2d975f1efe	[GlobalISel] Fix std::max after D134380	2022-09-21 14:09:04 -07:00
Amara Emerson	85cd376f70	[GlobalISel] Fix known bits for G_ASSERT_ALIGN. I don't know what was going on originally with these tests. It seems reasonable to have the immediate be the same byte alignment unit as the IR, in which case we need to take the log2 in order to set the right number of low bits. This fixes a miscompile in chromium. Differential Revision: https://reviews.llvm.org/D134380	2022-09-21 21:34:05 +01:00
Guozhi Wei	c39311eb40	[RegisterCoalescer] Use LiveRangeEdit to handle rematerialization This patch uses the API provided by LiveRangeEdit to handle rematerialization. It will make future maintenance and improvement more easier. No functional change. Differential Revision: https://reviews.llvm.org/D133610	2022-09-21 17:51:07 +00:00
Philip Reames	143f3bf8f4	[SDAG] Split handling of VPLoad/VPGather and VPStore/VPScatter [nfc] The merged routines are not-idiomatic, and the code sharing that results is prettty minimal. The confusion factor is not justified.	2022-09-21 09:06:02 -07:00
Thomas Symalla	c98a46fee6	[ISel] Enable generating more fma instructions. This patch changes a FADD / FMUL => FMA ISel pattern implemented in D80801 so that it peeks through more than one FMA. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D132837	2022-09-21 12:03:11 +02:00
Matt Arsenault	b9a371f6d1	AtomicExpand: Use correct pointer size for integer This was using the default address space.	2022-09-20 16:51:05 -04:00
Amara Emerson	78833a43e8	[GlobalISel][Legalizer] Fix lowerSelect() not sign-extending the mask value. I'm not sure why the SEXT_INREG was gated on a bitwidth check of the mask vs element size. This fixes a miscompile in chromium's skia library. Differential Revision: https://reviews.llvm.org/D134236	2022-09-20 16:40:34 +01:00
Matt Arsenault	34fb7803f8	GlobalISel: Pass through AssumptionCache	2022-09-19 19:10:51 -04:00
Matt Arsenault	bcb931c484	SelectionDAG: Add AssumptionCache analysis dependency Fixes compile time regression after `bb70b5d406`	2022-09-19 19:10:51 -04:00
Matt Arsenault	0d8ffcc532	Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer This does not try to pass it through from the end users.	2022-09-19 18:57:33 -04:00
Simon Pilgrim	8206044183	[DAG] SimplifyDemandedVectorElts - add MULHS/MULHU handling to existing MUL/AND handling Allows to determine known zero elements, which particularly helps simplification of DIV/REM by constant patterns	2022-09-19 12:44:43 +01:00
Simon Pilgrim	47cfe71027	[DAG] MatchRotate - reuse existing LHSShiftArg/RHSShiftArg variables. NFC.	2022-09-18 14:35:10 +01:00
Kai Nacke	ae35188f97	[GISel] Fix match tree emitter. The following changes are necessasy to get the generated tree matcher to compile: - In CodeExpansions::declare(), the assert() prevents connecting two instructions. E.g. the match code (match (MUL $t, $s1, $s2), (SUB $d, $t, $s3)), results in two declarations of $t, one for the def and one for the use. Removing the assertion allows this construct. If $t is later used, it is one of the operands, which should be perfectly fine. - The code emitted in GIMatchTreeVRegDefPartitioner::generatePartitionSelectorCode() is not compilable: - The value of NewInstrID should be emitted, not the name - Both calls involving getOperand() end with one parenthesis too many - Swaps generated condition for the partition code in the latter function It also changes the rules i2p_to_p2i, fabs_fabs_fold, and fneg_fneg_fold to use the tree matcher for a linear match. These rules are tested by: CodeGen/AArch64/GlobalISel/combine-fabs.mir CodeGen/AArch64/GlobalISel/combine-fneg.mir CodeGen/AArch64/GlobalISel/combine-ptrtoint.mir CodeGen/AMDGPU/GlobalISel/combine-add-nullptr.mir Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D133257	2022-09-18 00:00:15 +00:00
Aiden Grossman	e5e3dccd07	[mlgo] Add in-development instruction based features for regalloc advisor This patch adds in instruction based features to the regalloc advisor gated behind a flag so a user can decide at runtime whether or not they want to enable the feature. The features are only enabled when LLVM is compiled in MLGO develpment mode (LLVM_HAVE_TF_API) is set to true. To extract the instruction features, I'm taking a list of segments from each LiveInterval and noting the start and end SlotIndices. This list is then sorted based on the start SlotIndex and I iterate through each SlotIndex to grab instructions, making sure to check for overlaps. This results in a vector of opcodes and binary mapping matrix that maps live ranges to the opcodes of the instructions within that LR. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D131930	2022-09-17 19:54:45 +00:00
Kazu Hirata	20d764aff0	[llvm] Don't including SetVector.h (NFC) llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without including SetVector.h, so this patch adds an appropriate #include there.	2022-09-17 12:36:43 -07:00
Vladislav Dzhidzhoev	6cf11f4462	[GlobalISel][DebugInfo] Salvage trivially dead instructions Use salvageDebugInfo for instructions erased as trivially dead in GlobalISel. It would be helpful to implement support of G_PTR_ADD and G_FRAME_INDEX in salvageDebugInfo in future in order to preserve more variable location. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D133986	2022-09-17 03:54:55 +03:00
Sotiris Apostolakis	b827e7c600	[SelectOpti] Restrict load sinking This is a follow-up to D133777, which resolved a use-after-free case but did not cover all possible memory bugs due to misplacement of loads. In short, the overall problem was that sinked loads could be moved after state-modifying instructions leading to memory bugs. The solution is to restrict load sinking unless it is found to be sound. i) Within a basic block (to-be-sinked load and select-user are in the same BB), loads can be sinked only if there is no intervening state-modifying instruction. This is a conservative approach to avoid resorting to alias analysis to detect potential memory overlap. ii) Across basic blocks, sinking of loads is avoided. This is because going over multiple basic blocks looking for memory conflicts could be computationally expensive and also unlikely to allow loads to sink. Further, experiments showed that not sinking these loads has a slight positive performance effect. Maybe for some of these loads, having some separation allows enough time for the load to be executed in time for its user. This is not the case for floating point operations that benefit more from sinking. The solution in D133777 was essentially undone in this patch, since the latter is a complete solution to the observed problem. Overall, the performance impact of this patch is minimal. Tested on two internal Google workloads with instrPGO. Search application showed <0.05% perf difference, while the database one showed a slight improvement, but not statistically significant. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D133999	2022-09-16 20:50:46 +00:00
Jessica Paquette	1076b31da8	[GlobalISel] Combine select + fcmp to fminnum/fmaxnum/fminimum/fmaximum This is a partial port of the code used by the SelectionDAGBuilder to translate selects. In particular, see matchSelectPattern in ValueTracking.cpp. This is a GISel-equivalent of the portion which handles fminnum/fmaxnum/fminimum/fmaximum. I tried to set it up so it'd be easy to add the non-FP cases. Those are simpler. On the AArch64-end, it seems like the FP cases are more important for perf right now, so I bit the bullet and went at the more complicated problem. :) I elected to do this as a post-legalize combine rather than in the IRTranslator because Deciding which fmax/fmin to use can depend on legalization rules Philosophically-speaking (TM), putting it in a combine just feels cleaner Being able to enable/disable the combine is handy Another option would be to use the ValueTracking code in the IRTranslator and match what SelectionDAGBuilder::visitSelect does. I think that may be somewhat annoying since we'd need to write lowerings back into the selects in the legalizer. I'm not strongly opposed to the approach. We'd also want to be careful with vector selects once that's implemented, which explicitly check if a vector select is legal on the target. That'd probably need a hook. From what I can tell, doing this as a combine is probably a cleaner option long-term. Differential Revision: https://reviews.llvm.org/D116702	2022-09-16 13:35:46 -07:00
Pengxuan Zheng	59365f33e2	[MachineCSE] Add a threshold to avoid spending too much time in isProfitableToCSE Currently, it can become extremely costly to compute MayIncreasePressure if the size of CSUses turns out to be very large. In that case, it's no longer cost effective to keep computing MayIncreasePressure. Therefore, to limit the amount of time spent in isProfitableToCSE, we simply conservatively assume MayIncreasePressure if the size of CSUses is too large. This can reduce overall compile time by 30% for some benchmarks. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134003	2022-09-16 13:35:17 -07:00
Craig Topper	1121eca685	[VP][VE] Default VP_SREM/UREM to Expand and add generic expansion using VP_SDIV/UDIV+VP_MUL+VP_SUB. I want to default all VP operations to Expand. These 2 were blocking because VE doesn't support them and the tests were expecting them to fail a specific way. Using Expand caused them to fail differently. Seemed better to emulate them using operations that are supported. @simoll mentioned on Discord that VE has some expansion downstream. Not sure if its done like this or in the VE target. Reviewed By: frasercrmck, efocht Differential Revision: https://reviews.llvm.org/D133514	2022-09-16 13:19:02 -07:00
David Majnemer	8a868d8859	Revert "Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"" This reverts commit `cd20a18286` and adds a "let Heading" to NoStackProtectorDocs.	2022-09-16 19:39:48 +00:00
Stanislav Mekhanoshin	a0c8f5fefa	[SDAG] Print divergence in SDNode::dump If target does not support divergence the field is set to false and not printed. Differential Revision: https://reviews.llvm.org/D133984	2022-09-16 11:43:34 -07:00
Juan Manuel MARTINEZ CAAMAÑO	e438f2d821	[DAGCombine] Do not fold SRA/SRL of MUL into MULH when MUL's LSB are used, and MUL_LOHI is available Folding into a sra(mul) / srl(mul) into a mulh introduces an extra multiplication to compute the high half of the multiplication, while it is more profitable to compute the high and lower halfs with a single mul_lohi. Differential Revision: https://reviews.llvm.org/D133768	2022-09-16 15:48:36 +00:00
Liqiang Tao	2e37557fde	StackProtector: ensure stack checks are inserted before the tail call The IR stack protector pass should insert stack checks before the tail calls not only the musttail calls. So that the attributes `ssqreq` and `tail call`, which are emited by llvm-opt, could be both enabled by llvm-llc. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D133860	2022-09-16 22:24:46 +08:00
Florian Hahn	6b86b481e3	[AArch64] Use tbl for truncating vector FPtoUI conversions. On AArch64, doing the vector truncate separately after the fptoui conversion can be lowered more efficiently using tbl.4, building on D133495. https://alive2.llvm.org/ce/z/T538CC Depends on D133495 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133496	2022-09-16 14:57:43 +01:00
Florian Hahn	8491d01cc3	[AArch64] Lower vector trunc using tbl. Similar to using tbl to lower vector ZExts, tbl4 can be used to lower vector truncates. The initial version support i32->i8 conversions. Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133495	2022-09-16 12:42:49 +01:00
Nikita Popov	b4309800e9	[CodeGen] Don't zero callee-save registers with zero-call-used-regs (PR57692) Callee save registers must be preserved, so -fzero-call-used-regs should not be zeroing them. The previous implementation only did not zero callee save registers that were saved&restored inside the function, but we need preserve all of them. Fixes https://github.com/llvm/llvm-project/issues/57692. Differential Revision: https://reviews.llvm.org/D133946	2022-09-16 11:52:29 +02:00
Florian Hahn	5871f18827	[AArch64] Lower extending uitofp using tbl. On AArch64, doing the zero-extend separately first can be lowered more efficiently using tbl, building on D120571. https://alive2.llvm.org/ce/z/8Je595 Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133494	2022-09-16 10:20:25 +01:00
Yuta Mukai	116838b151	[MachinePipeliner] Fix the interpretation of the scheduling model The method of counting resource consumption is modified to be based on "Cycles" value when DFA is not used. The calculation of ResMII is modified to total "Cycles" and divide it by the number of units for each resource. Previously, ResMII was excessive because it was assumed that resources were consumed for the cycles of "Latency" value. The method of resource reservation is modified similarly. When a value of "Cycles" is larger than 1, the resource is considered to be consumed by 1 for cycles of its length from the scheduled cycle. To realize this, ResourceManager maintains a resource table for all slots. Previously, resource consumption was always 1 for 1 cycle regardless of the value of "Cycles" or "Latency". In addition, the number of micro operations per cycle is modified to be constrained by "IssueWidth". To disable the constraint, --pipeliner-force-issue-width=100 can be used. For the case of using DFA, the scheduling results are unchanged. Reviewed By: dpenry Differential Revision: https://reviews.llvm.org/D133572	2022-09-16 09:51:48 +09:00
Alexander Timofeev	fbdea5a2e9	[AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D133593	2022-09-15 22:03:56 +02:00
Florian Hahn	81a11da762	[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl. This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops using a wide shuffle creating a v64i8 vector, selecting groups of 3 zero elements and an element from the input. This is profitable on AArch64 where such shuffles can be lowered to tbl instructions, but only in loops, because it requires materializing 4 masks, which can be done in the loop preheader. This is the only reason the transform is part of CGP. If there's a better alternative I missed, please let me know. The same goes for the shouldReplaceZExtWithShuffle hook which guards this. I am not sure if this transform will be beneficial on other targets, but it seems like there is no way other convenient way. This improves the generated code for loops like the one below in combination with D96522. int foo(uint8_t p, int N) { unsigned long long sum = 0; for (int i = 0; i < N ; i++, p++) { unsigned int v = p; sum += (v < 127) ? v : 256 - v; } return sum; } https://clang.godbolt.org/z/Wco866MjY Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D120571	2022-09-15 19:18:13 +01:00
Sergei Barannikov	c6acb4eb0f	[SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s All in-tree targets pass pointer-sized ConstantSDNodes to the method. This overload reduced amount of boilerplate code a bit. This also makes getCALLSEQ_END consistent with getCALLSEQ_START, which already takes uint64_ts.	2022-09-15 14:02:12 -04:00
Matt Arsenault	63d1d37d35	RegAllocGreedy: Avoid overflowing priority bitfields The class priority is expected to be at most 5 bits before it starts clobbering bits used for other fields. Also clamp the instruction distance in case we have millions of instructions. AMDGPU was accidentally overflowing into the global priority bit in some cases. I think in principal we would have wanted this, but in the cases I've looked at, it had the counter intuitive effect and de-prioritized the large register tuple. Avoid using weird bit hack PPC uses for global priority. The AllocationPriority field is really 5 bits, and PPC was relying on overflowing this to 6-bits to forcibly set the global priority bit. Split this out as a separate flag to avoid having magic behavior for values above 31.	2022-09-15 10:38:40 -04:00
Stanislav Mekhanoshin	ef4b9c33f5	Fix crash while printing MMO target flags MachineMemOperand::print can dereference a NULL pointer if TII is not passed from the printMemOperand. This does not happen while dumping the DAG/MIR from llc but crashes the debugger if a dump() method is called from gdb. Differential Revision: https://reviews.llvm.org/D133903	2022-09-14 17:29:48 -07:00
Roland Froese	207228c1d6	[DAGCombiner] More load-store forwarding for big-endian Get some load-store forwarding cases for big-endian where a larger store covers a smaller load, and the offset would be 0 and handled on little-endian but on big-endian the offset is adjusted to be non-zero. The idea is just to shift the data to make it look like the offset 0 case. Differential Revision: https://reviews.llvm.org/D130115	2022-09-14 15:36:35 -04:00
Marco Elver	4627a30acf	[MIR] Support printing and parsing pcsections Adds support for printing and parsing PC sections metadata in MIR. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D133785	2022-09-14 10:30:25 +02:00
YongKang Zhu	5fa6b24354	Address feedback in https://reviews.llvm.org/D133637 https://reviews.llvm.org/D133637 fixes the problem where we should hash raw content of register mask instead of the pointer to it. Fix the same issue in `llvm::hash_value()`. Remove the added API `MachineOperand::getRegMaskSize()` to avoid potential confusion. Add an assert to emphasize that we probably should hash a machine operand iff it has associated machine function, but keep the fallback logic in the original change. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D133747	2022-09-13 16:12:41 -07:00
Matt Arsenault	d00b0aab31	RegisterCoalescer: Fix verifier error when merging copy of undef There's no real read of the register, so the copy introduced a new live value. Make sure we introduce a replacement implicit_def instead of just erasing the copy. Found from llvm-reduce since it tries to set undef on everything.	2022-09-13 18:40:28 -04:00
Sotiris Apostolakis	eda61fb656	[SelectOpti] Fix lifetime intrinsic bug When a select is converted to a branch and load instructions are sinked to the true/false blocks, lifetime intrinsics (if present) could be made unsound if not moved. This conservatively moves all lifetime intrinsics in a transformed BB to the end block to ensure preserved lifetime semantics. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D133777	2022-09-13 19:00:18 +00:00
Craig Topper	efd5acf120	[LegalizeTypes][NVPTX] Remove extra compare from fallback code for ISD::ADD in ExpandIntRes_ADDSUB. This is the ultimate fallback code if UADDO isn't supported. If the target uses 0/1 we used one compare, but if the target doesn't use 0/1 we emitted two compares. Regardless of boolean constants we should only need to check that the Result is less than one of the original operands. So we only need one compare. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D133708	2022-09-13 09:07:56 -07:00
Hendrik Greving	393a17b5d1	[ValueTypes] Define MVTs for v256i2/v128i4. Adds MVT::v256i2, MVT::v128i4. Differential Revision: https://reviews.llvm.org/D133603	2022-09-13 09:02:23 -07:00
Matt Arsenault	740f920a1f	LiveRegUnits: Break register loop when a clobber is encountered	2022-09-13 10:15:08 -04:00
Matt Arsenault	b7dae832e6	DeadMachineInstructionElim: Don't repeat per-function init This was happening for every iteration but only needs to be done once.	2022-09-13 08:19:54 -04:00
Matt Arsenault	243632c63e	LiveRegUnits: Cleanup isReg checks This is the common case and should be checked first. Provides a very marginal compile time improvement on the example I'm looking at.	2022-09-13 08:19:53 -04:00
Pavel Samolysov	02aaf8e3d6	[NFC][ScheduleDAGInstrs] Use structure bindings and emplace_back Some uses of std::make_pair and the std::pair's first/second members in the ScheduleDAGInstrs.[cpp\|h] files were replaced with using of the vector's emplace_back along with structure bindings from C++17.	2022-09-13 12:49:04 +03:00
Sylvestre Ledru	cd20a18286	Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView" Causing: https://github.com/llvm/llvm-project/issues/57709 This reverts commit `ab56719acd`.	2022-09-13 10:53:59 +02:00
David Green	f124e59b2e	[TypePromotionPass] Don't treat phi's as ToPromote This attempts to stop the type promotion pass transforming where it is not profitable, by not marking PhiNodes as ToPromote and being more aggressive about pulling extends out of loops. Differential Revision: https://reviews.llvm.org/D133203	2022-09-13 08:57:15 +01:00
Matt Arsenault	920b2e65fc	DeadMachineInstructionElim: Fix typo	2022-09-12 20:10:33 -04:00
Aiden Grossman	eec183c171	[nfc] Refactor SlotIndex::getInstrDistance to better reflect actual functionality This patch refactors SlotIndex::getInstrDistance to SlotIndex::getApproxInstrDistance to better describe the actual functionality of this function. This patch also adds in some additional comments better documenting the assumptions that this function makes to increase clarity. Based on discussion on the LLVM Discourse: https://discourse.llvm.org/t/odd-behavior-in-slotindex-getinstrdistance/64934/5 Reviewed By: mtrofin, foad Differential Revision: https://reviews.llvm.org/D133386	2022-09-12 23:33:35 +00:00
Amara Emerson	f24f469223	[GlobalISel] Fix crash when lowering G_SELECT of pointer vectors. The bit masking lowering only works for vectors of scalars, so for pointer element types we need to add some casting. Differential Revision: https://reviews.llvm.org/D133672	2022-09-13 00:01:37 +01:00
Matt Arsenault	d90f7cb559	LiveRegUnits: Do not use phys_regs_and_masks Somehow DeadMachineInstructionElim is about 3x slower when using it. Hopefully this reverses the compile time regression reported for `b5041527c7`.	2022-09-12 17:21:24 -04:00
David Majnemer	ab56719acd	[clang, llvm] Add __declspec(safebuffers), support it in CodeView __declspec(safebuffers) is equivalent to __attribute__((no_stack_protector)). This information is recorded in CodeView. While we are here, add support for strict_gs_check.	2022-09-12 21:15:34 +00:00
Kazu Hirata	9606608474	[llvm] Use x.empty() instead of llvm::empty(x) (NFC) I'm planning to deprecate and eventually remove llvm::empty. I thought about replacing llvm::empty(x) with std::empty(x), but it turns out that all uses can be converted to x.empty(). That is, no use requires the ability of std::empty to accept C arrays and std::initializer_list. Differential Revision: https://reviews.llvm.org/D133677	2022-09-12 13:34:35 -07:00
YongKang Zhu	481a32f587	Bug fix on stable hash calculation for machine operands RegisterMask and RegisterLiveOut MachineOperand::getRegMask() returns a pointer to register mask. We should hash the raw content of register mask instead of its pointer. Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D133637	2022-09-12 13:25:04 -07:00
Craig Topper	38ffa2bb96	[LegalizeTypes] Improve splitting for urem/udiv by constant for some constants. For remainder: If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer type, this urem will be expand by DAGCombiner using multiply by magic constant. We do have to take into account that adding high and low together can produce a carry, making it a (BitWidth / 2)+1 bit number. So we need to also add back in the carry from the first addition. For division: We can use the above trick to compute the remainder, subtract that remainder from the dividend, then multiply by the multiplicative inverse of the Divisor modulo (1 << BitWidth). This is based on the section "Remainder by Summing Digits" in Hacker's delight. The remainder trick is similar to a trick you may have learned for determining if a decimal number is divisible by 3. You can add all the digits together and see if the sum is divisible by 3. If you're not sure if the sum is divisible by 3, you can add its digits together. This can be repeated until you have a single decimal digit. If that digit is 3, 6, or 9, then the original number is divisible by 3. This works because 10 % 3 == 1. gcc already does this same trick. There are additional tricks gcc does urem as well as srem, udiv, and sdiv that I plan to add in future patches. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130862	2022-09-12 10:34:52 -07:00
Matthias Gehre	c1502425ba	Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth Also remove new-pass-manager version of ExpandLargeDivRem because there is no way yet to access TargetLowering in the new pass manager. Differential Revision: https://reviews.llvm.org/D133691	2022-09-12 17:06:16 +01:00
Jay Foad	210e6a993d	[GlobalISel] Simplify extended add/sub to add/sub with carry Simplify extended add/sub (with carry-in and carry-out) to add/sub with carry (with carry-out only) if carry-in is known to be zero. Differential Revision: https://reviews.llvm.org/D133702	2022-09-12 17:05:44 +01:00
Matt Arsenault	e30271169f	RegAllocGreedy: Try local instruction splitting with subranges This was only trying this to relax register class constraints, but this can also help if there are subranges involved. This solves a compilation failure for AMDGPU when there is high pressure created by large register tuples. If one virtual register is using most of the available budget, we need to be able to evict subranges. This solves the immediate failure, but this solution leaves a lot to be desired. In the relevant testcases, we have 32-element tuples but most of the uses are operations on 1 element subranges of it. What we're now getting is a spill and restore of the full 1024 bits and an extract of the used 32-bits. It would be far better if we introduced a copy to a new virtual register with a smaller register class and used narrower spills. Furthermore, we could probably do a better job if the allocator were to introduce new subranges where none previously existed in the highest pressure scenarios. The block and region splits should also try to split specific subranges out. The mve-vst3.ll test changes looks like noise to me, but instruction count increased by one. mve-vst4.ll looks like a solid improvement with several 16-byte spills eliminated. splitkit-copy-live-lanes.mir also shows a solid reduction in total spill count. This could use more tests but it's pretty tiring to come up with cases that fail on this.	2022-09-12 09:03:55 -04:00
Pavel Samolysov	f045d0c392	[NFC][ScheduleDAG] Use a reference to iterate over NodeSuccs/ChainSuccs	2022-09-12 15:54:48 +03:00
Pavel Samolysov	05946c144d	[NFC][ScheduleDAG] Use structure bindings and emplace_back Some uses of std::make_pair and the std::pair's first/second members in the ScheduleDAGRRList.cpp file were replaced with using of the vector's emplace_back along with structure bindings from C++17.	2022-09-12 15:54:48 +03:00
Matt Arsenault	c34679b2e8	DAG: Sink some getter code closer to uses	2022-09-12 08:38:35 -04:00
Matt Arsenault	bb70b5d406	CodeGen: Set MODereferenceable from isDereferenceableAndAlignedPointer Previously this was assuming piontsToConstantMemory implies dereferenceable.	2022-09-12 08:38:35 -04:00
Pavel Samolysov	354a3d9c02	[NFC][ScheduleDAG] Use Register and MCPhysReg instead of unsigned	2022-09-12 15:18:11 +03:00
Matt Arsenault	b5041527c7	DeadMachineInstructionElim: Switch to using LiveRegUnits Theoretically improves compile time for targets with many overlapping registers	2022-09-12 07:55:14 -04:00
Matt Arsenault	6c44a7179f	RegAlloc: Use SmallSet instead of std::set There shouldn't be more than a small handful of hints at most.	2022-09-12 07:55:10 -04:00
David Spickett	739b69e655	[LLVM][AArch64] Explain that X19 is used as the frame base pointer register Fixes #50098 LLVM uses X19 as the frame base pointer, if it needs to. Meaning you can get warnings if you clobber that with inline asm. However, it doesn't explain why. The frame base register is not part of the ABI so it's pretty confusing why you get that warning out of the blue. This adds a method to explain a reserved register with X19 as the first one. The logic is the same as getReservedRegs. I could have added a return parameter to isASMClobberable and friends but found that there's a lot of things that call isReservedReg in various ways. So while one more method on the pile isn't great design, it is simpler right now to do it this way and only pay the cost if you are actually using a reserved register. Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D133213	2022-09-12 09:18:09 +00:00
Kazu Hirata	af91e2b9db	[GlobalISel] Use std::initializer_list::size (NFC)	2022-09-11 12:19:37 -07:00
Manuel Brito	b51c6130ef	Use PoisonValue instead of UndefValue when RAUWing unreachable code [NFC] Replacing the following instances of UndefValue with PoisonValue, where the UndefValue is used as an arbitrary value: - llvm/lib/CodeGen/WinEHPrepare.cpp `demotePHIsOnFunclets`: RAUW arbitrary value for lingering uses of removed PHI nodes - llvm/lib/Transforms/Utils/BasicBlockUtils.cpp `FoldSingleEntryPHINodes`: Removes a self-referential single entry phi node. - llvm/lib/Transforms/Utils/CallGraphUpdater.cpp `finalize`: Remove all references to removed functions. - llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp `cleanup`: the result is not used then the inserted instructions are removed. - llvm/tools/bugpoint/CrashDebugger.cpp `TestInts`: the program is cloned and instructions are removed to narrow down source of crash. Differential Revision: https://reviews.llvm.org/D133640	2022-09-10 14:28:01 +01:00
Craig Topper	545affbf79	[DAGCombiner] Use HandleSDNode to keep node alive across call to getNegatedExpression. getNegatedExpression can delete nodes. If the first call to getNegatedExpression produced a node that the second call also manages to create, it might get deleted. Use a HandleSDNode to ensure it has a use to prevent it from being deleted. Fixes PR57658. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133602	2022-09-09 22:02:41 -07:00
Sebastian Neubauer	c7750c522e	Add helper func to get first non-alloca position The LLVM performance tips suggest that allocas should be placed at the beginning of the entry block. So far, llvm doesn’t provide any helper to find that position. Add BasicBlock::getFirstNonPHIOrDbgOrAlloca and IRBuilder::SetInsertPointPastAllocas(Function*) that get an insert position after the (static) allocas at the start of a function and use it in ShadowStackGCLowering. Differential Revision: https://reviews.llvm.org/D132554	2022-09-09 15:39:53 +02:00
Craig Topper	aa83bdd198	[DAGCombiner][X86] Fold (sub (subcarry X, 0, Carry), Y) -> (subcarry X, Y, Carry) Fixes PR57576. Differential Revision: https://reviews.llvm.org/D133471	2022-09-08 22:56:46 -07:00
Joe Loser	5e96cea1db	[llvm] Use std::size instead of llvm::array_lengthof LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays. Change call sites to use `std::size` instead. Differential Revision: https://reviews.llvm.org/D133429	2022-09-08 09:01:53 -06:00
Eric Wang	d8a2d3f7d4	[NFC][Regalloc] Introduce the RegAllocPriorityAdvisorAnalysis This patch introduces the priority analysis and the priority advisor, the default implementation, and the scaffolding for introducing the other implementations of the advisor. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D132835	2022-09-08 07:50:03 -07:00
Arthur Eubanks	7e0a52e8e9	[NFC][MachineFunctionPass] Only lookup pass name if we request printing Should report the small compile time regression reported in D133055.	2022-09-07 21:38:00 -07:00
Marco Elver	343700358f	[AsmPrinter] Emit PCs into requested PCSections Interpret MD_pcsections in AsmPrinter emitting the requested metadata to the associated sections. Functions and normal instructions are handled. Differential Revision: https://reviews.llvm.org/D130879	2022-09-07 11:36:02 +02:00
Marco Elver	31a548021b	[GlobalISel] Propagate PCSections metadata to MachineInstr Propagate (most) PC sections metadata to MachineInstr when GlobalISel is doing instruction selection. This change results in support for architectures using GlobalISel (such as -O0 with AArch64). Not all instructions may be supported yet, and requires further target-specific handling (such as done for AArch64 pseudo-atomics). Expanding supported instructions is planned on a case-by-case basis and new use cases for PC sections metadata. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130886	2022-09-07 11:36:02 +02:00
Marco Elver	f0d6709e4a	[AtomicExpandPass] Always copy pcsections Metadata to expanded atomics When expanding IR atomics to target-specific atomics, copy all !pcsections Metadata to expanded atomics automatically. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130885	2022-09-07 11:36:01 +02:00
Marco Elver	0ba8886af5	[FastISel] Propagate PCSections metadata to MachineInstr Propagate PC sections metadata to MachineInstr when FastISel is doing instruction selection. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130884	2022-09-07 11:36:01 +02:00
Marco Elver	4c58b00801	[SelectionDAG] Propagate PCSections through SDNodes Add a new entry to SDNodeExtraInfo to propagate PCSections through SelectionDAG. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130882	2022-09-07 11:22:50 +02:00
Xiang1 Zhang	16743c9534	[CodeGen] Limit building time in CodeGenPrepare for huge function Details: Currently CodeGenPrepare is very time consuming in handling big functions. Old Algorithm : It iterate each BB in function, and go on handle very instructions in BB. Due to some instruction optimizations may affect the BBs' dominate tree. The old logic will re-iterate and try optimize for each BB. Suppose we have a big function with 20000 BBs, If we handled the last BB with fine tuning the dominate tree. We need totally re-iterate and try optimize the 20000 BBs from the beginning. The Complex is near N! And we really encounter somes big tests (> 20000 BBs) that cost more than 30 mins in this pass. (Debug version compiler will cost 2 hours here) What this patch do for huge function ? It mainly changes the iteration way for optimization. 1 We do optimizeBlock for each BB (that is same with old way). And, in the meaning time, If BB is changed/updated in the optimization, it will be put into FreshBBs (try do optimizeBlock again). The new created BB at previous iteration will also put into FreshBBs. 2 For the BBs which not updated at previous iteration, we directly skip it. Strictly speaking, here may miss some opportunity, but the probability is very small. 3 For Instructions in single BB, we do optimizeInst for each instruction. If optimizeInst change the instruction dominator in this BB, rather than break and go back to optimize the first BB (the old way), we directly iterate instructions (to do optimizeInst) in this updated BB again (the new way). What this patch do for small/normal (not huge) function ? It is same with the Old Algorithm. (NFC) Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D129352	2022-09-07 10:05:40 +08:00
Markus Böck	f049b2c3fc	[MC] Emit Stackmaps before debug info This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment. The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections. This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionaly not a problem however, as emitting an empty `StackMaps` is a no-op. Differential Revision: https://reviews.llvm.org/D132708	2022-09-06 20:20:56 +02:00
Amara Emerson	fe7c3b87ce	Add parantheses to silence warning.	2022-09-06 15:36:19 +01:00
Marco Elver	7d63983c65	[SelectionDAG] Properly copy ExtraInfo on RAUW During SelectionDAG legalization SDNodes with associated extra info may be replaced with a new SDNode. Preserve associated extra info on ReplaceAllUsesWith and remove entries in DeallocateNode. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130881	2022-09-06 16:32:50 +02:00
Marco Elver	cc3faf4226	[SelectionDAG] Rename CallSiteDbgInfo to NodeExtraInfo For information infrequently attached to SDNodes, it is useful to provide a way to add this information out-of-line. This is already done for call-site specific information. Rename CallSiteDbgInfo to NodeExtraInfo in preparation of adding additional information not necessarily related to call sites only. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130880	2022-09-06 16:32:50 +02:00
Matthias Gehre	2090e85fee	[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64 This adds the ExpandLargeDivRem to the default pass pipeline. The limit at which it expands div/rem instructions is configured via a new TargetTransformInfo hook (default: no expansion) X86, Arm and AArch64 backends implement this hook to expand div/rem instructions with more than 128 bits. Differential Revision: https://reviews.llvm.org/D130076	2022-09-06 15:32:04 +01:00
Marco Elver	42836e283f	[MachineInstr] Allow setting PCSections in ExtraInfo Provide MachineInstr::setPCSection(), to propagate relevant metadata through the backend. Use ExtraInfo to store the metadata. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130876	2022-09-06 15:52:44 +02:00
Amara Emerson	3dd861818a	[GlobalISel] Combine G_INSERT/EXTRACT_VECTOR_ELT with out of bounds indices to undef. Differential Revision: https://reviews.llvm.org/D133309	2022-09-06 13:45:04 +01:00
Benjamin Kramer	c349d7f4ff	[SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path The main difference is that this preserves intermediate rounding steps, which the other route doesn't. This aligns bfloat16 more with half floats, which use this path on most targets. I didn't understand what the difference was between these softening approaches when I first added bfloat lowerings, would be nice if we only had one of them. Based on @pengfei 's D131502 Differential Revision: https://reviews.llvm.org/D133207	2022-09-06 11:54:34 +02:00
Daniil Fukalov	51d33afcbe	[RegisterCoalescer] Fix crash on early clobbered subreg operands. The issue was with processing two subregs of the same reg are used in the same instruction (e.g. inline asm): "def early-clobber" and other just "def". Register coalescer ran in bad recursion if the early clobbered subreg is second in the following sequence of COPYs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127136	2022-09-06 08:42:37 +03:00
Daniil Fukalov	99d364d1f4	[MachineVerifier] Fix crash on early clobbered subreg operands. MachineVerifier tried to checkLivenessAtDef() ignoring it is actually a subreg. The issue was with processing two subregs of the same reg are used in the same instruction (e.g. inline asm): "def early-clobber" and other just "def". Reviewed By: foad Differential Revision: https://reviews.llvm.org/D126661	2022-09-05 17:08:21 +03:00
David Sherwood	ffa6267300	[CodeGen] Support extracting fixed-length vectors from illegal scalable vectors For some indices we can simply extract the fixed-length subvector from the low half of the scalable vector, for example when the index is less than the minimum number of elements in the low half. For all other cases we can expand the operation through the stack by storing out the vector and reloading the fixed-length part we need. Fixes https://github.com/llvm/llvm-project/issues/55412 Tests added here: CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll Differential Revision: https://reviews.llvm.org/D117499	2022-09-05 15:05:14 +01:00
Amara Emerson	fb60e50c78	[GlobalISel] Fix a combine crash due to a negative G_INSERT_VECTOR_ELT idx. These should really be folded away to undef but we shouldn't crash in any case.	2022-09-05 12:10:17 +01:00
Simon Pilgrim	4e6783f866	[DAG] getFreeze()/getNode() - account for operand depth when calling isGuaranteedNotToBeUndefOrPoison (PR57554) Similar to #57402 - we were calling isGuaranteedNotToBeUndefOrPoison on the freeze operand (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1 Fixes #57554	2022-09-05 11:46:46 +01:00
Craig Topper	e529c0a2a0	[TargetLowering] Use ComputeMaxSignificantBits instead of ComputeNumSignBits in expandMUL_LOHI. NFC The way ComputeNumSignBits was being used was only correct if OuterBitSize is exactly 2x InnerBitSize. Which is always true, but not obviously so. Comparing ComputeMaxSignificantBits to InnerBitSize feels more correct.	2022-09-04 22:35:16 -07:00
Craig Topper	45e2809f71	[TargetLowering] Use getShiftAmountConstant. NFC	2022-09-04 20:22:32 -07:00
Kazu Hirata	32aa35b504	Drop empty string literals from static_assert (NFC) Identified with modernize-unary-static-assert.	2022-09-03 11:17:47 -07:00
Kazu Hirata	fedc59734a	[llvm] Use range-based for loops (NFC)	2022-09-03 11:17:40 -07:00
Kazu Hirata	89f1433225	Use llvm::lower_bound (NFC)	2022-09-03 11:17:37 -07:00
Kazu Hirata	bc96b36a41	[CodeGen] Use std::lcm (NFC)	2022-09-03 11:17:33 -07:00
Simon Pilgrim	62cdfdab4d	[DAG] canCreateUndefOrPoison - add freeze(insert_subvector(x,y,c)) -> insert_subvector(freeze(x),freeze(y),c) support We already have plenty of assertions in place to ensure that the insertion index is constant and inrange	2022-09-03 13:41:33 +01:00
Simon Pilgrim	e2d140e9c3	[TTI] Add isExpensiveToSpeculativelyExecute wrapper CGP uses a raw `getInstructionCost(I, TargetTransformInfo::TCK_SizeAndLatency) >= TCC_Expensive` check to see if its better to move an expensive instruction used in a select behind a branch instead. This is causing issues with upcoming improvements to TCK_SizeAndLatency costs on X86 as we need to use TCK_SizeAndLatency as an uop count (so its compatible with various target-specific buffer sizes - see D132288), but we can have instructions that have a low TCK_SizeAndLatency value but should still be treated as 'expensive' (FDIV for example) - by adding a isExpensiveToSpeculativelyExecute wrapper we can keep the current behaviour but still add an x86 override in a future patch when the cost tables are updated to compensate.	2022-09-03 13:12:22 +01:00
Daniil Fukalov	b4e1b0e00d	[LiveIntervals] Split live intervals on any dead def Each dead def of the same virtual register is required to be split into multiple virtual registers with separate live intervals to avoid MachineVerifier error. Partially fixes https://github.com/llvm/llvm-project/issues/56050 and https://github.com/llvm/llvm-project/issues/56051 Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D130477	2022-09-02 20:00:22 +03:00
David Green	5073499b69	[TypePromotionPass] Rename variable to avoid name conflict. NFC	2022-09-02 12:35:15 +01:00
Fangrui Song	8d95fd7e56	[MachineFunctionPass] Support -filter-passes for -print-changed [MachineFunctionPass] Support -filter-passes for -print-changed -filter-passes specifies a `PassID` (a lower-case dashed-separated pass name, also used by -print-after, -stop-after, etc) instead of a CamelCasePass. `-filter-passes=CamelCaseNewPMPass` seems like a workaround for new PM passes before we can use lower-case dashed-separated pass names (as used by `-passes=`). Example: ``` # getPassName() is "IRTranslator". PassID is "irtranslator" llc -mtriple=aarch64 -print-changed -filter-passes=irtranslator < print-changed-machine.ll ``` Close https://github.com/llvm/llvm-project/issues/57453 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D133055	2022-09-01 11:06:06 -07:00
Nikita Popov	5134bd432f	[DwarfEhPrepare] Assign dummy debug location for inserted _Unwind_Resume calls (PR57469) DwarfEhPrepare inserts calls to _Unwind_Resume into landing pads. If _Unwind_Resume happens to be defined in the same module and debug info is used, then this leads to a verifier error: inlinable function call in a function with debug info must have a !dbg location call void @_Unwind_Resume(ptr %exn.obj) #0 Fix this by assigning a dummy location to the call. (As this happens in the backend, inlining is not actually relevant here.) Fixes https://github.com/llvm/llvm-project/issues/57469. Differential Revision: https://reviews.llvm.org/D133095	2022-09-01 16:35:49 +02:00
Nikita Popov	c635ea5c50	[CombinerHelper] Avoid deprecated method (NFC)	2022-09-01 16:09:05 +02:00
Stephen Tozer	211efaa1ce	Reapply "[DebugInfo] Extend the InstrRef LDV to support DbgValues with many Ops" Re-landing with an erroneous assert removed. This reverts commit `58d104b352`.	2022-09-01 14:20:24 +01:00
Amara Emerson	4cf3db41da	[GlobalISel] Add sdiv exact (X, constant) -> mul combine. This port of the SDAG optimization is only for exact sdiv case. Differential Revision: https://reviews.llvm.org/D130517	2022-09-01 13:34:00 +01:00
Craig Topper	77dbc5200b	[MachineCSE] Use TargetInstrInfo::isAsCheapAsAMove in isPRECandidate. Some targets like RISC-V require operands to be inspected to determine if an instruction is similar to a move. Spotted while investigating code differences between using an ADDI vs an ADDIW. RISC-V has the isAsCheapAsAMove flag for ADDI, but the TII hook checks the immediate is 0 or the register is X0. ADDIW is never generated with X0 or with an immediate of 0 so it doesn't have the isAsCheapAsAMove flag. I don't know enough about the PRE code to write a test for this yet. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132981	2022-08-31 15:39:41 -07:00
Sam Clegg	c5c4ba37b1	[WebAssembly][MC] Avoid the need for .size directives for functions Warn if `.size` is specified for a function symbol. The size of a function symbol is determined solely by its content. I noticed this simplification was possible while debugging #57427, but this change doesn't fix that specific issue. Differential Revision: https://reviews.llvm.org/D132929	2022-08-31 14:28:56 -07:00
Nick Desaulniers	d7474bef77	[llvm][TailDuplicator] don't taildup isInlineAsmBrIndirectTargets This fixes a crash observed after https://reviews.llvm.org/D129997. Similar to D88823. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130127	2022-08-31 13:07:10 -07:00
Simon Pilgrim	eaede4b5b7	[DAG] extractShiftForRotate - replace assertion for shift opcode with an early-out We feed the result from the first extractShiftForRotate call into the second, and that result might no longer be a shift op (usually due to constant folding). NOTE: We REALLY need to stop creating nodes on the fly inside extractShiftForRotate! Fixes Issue #57474	2022-08-31 15:50:48 +01:00
Simon Pilgrim	9d22800275	[DAG] visitFreeze - account for operand depth when calling isGuaranteedNotToBeUndefOrPoison (PR57402) We were calling isGuaranteedNotToBeUndefOrPoison on operands (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1 Fixes #57402	2022-08-31 12:20:30 +01:00
Kai Luo	ad2f7fd286	[AtomicExpand] Make floating point conversion happens before fence insertion IIUC, the conversion part is not part of atomic operations and fences should be put around converted atomic operations. This also fixes atomic load of floating point values which requires fence on PowerPC. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D127609	2022-08-31 09:54:58 +08:00
Markus Böck	2fdf963daf	[GlobalISel] Explicitly fail trying to translate `gc.statepoint` and related intrinsics The provided testcase would previously fail with an assertion due to later down below trying to allocate registers for `token` return types and arguments. This is especially problematic as the process would then exit instead of falling back to using FastIsel. This patch fixes that by simply explicitly failing translation if either of these intrinsics are encountered. Fixes https://github.com/llvm/llvm-project/issues/57349 Differential Revision: https://reviews.llvm.org/D132974	2022-08-31 00:47:17 +02:00
David Penry	9aca7b0217	[ModuloScheduler] Fix missing LLVM_DEBUG Guard a debug message with LLVM_DEBUG Differential Revision: https://reviews.llvm.org/D132895	2022-08-30 09:20:37 -07:00
Tomas Matheson	9a390d6692	[AArch64][GISel] fix G_ADD/G_SUB legalization widenScalarDst updates the insert point to after MI, so widenScalarSrc must be called before widenScalarDst. Otherwise The updated Src values will appear after MI and break SSA. e.g.: %14:_(s64), %15:_(s1) = G_UADDE %9:_, %11:_, %13:_ becomes %14:_(s64), %16:_(s32) = G_UADDE %9:_, %11:_, %17:_ %15:_(s1) = G_TRUNC %16:_(s32) %17:_(s32) = G_ZEXT %13:_(s1) Differential Revision: https://reviews.llvm.org/D132547 Change-Id: Ie3458747a6879433f4d5ab9939d2bd102dd0f2db	2022-08-30 10:59:32 +01:00
Xiang1 Zhang	a808ac2e42	[NFC] Clang-format for CodeGenPrepare.cpp	2022-08-30 13:42:36 +08:00
wanglian	e2bb9774b1	[LegalizeTypes] Support widen result for VECTOR_REVERSE. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D132359	2022-08-30 10:01:26 +08:00
Craig Topper	2f811a6c7f	[VP][RISCV] Add vp.fabs intrinsic and RISC-V support. Mostly just modeled after vp.fneg except there is a "functional instruction" for fneg while fabs is always an intrinsic. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D132793	2022-08-29 09:32:06 -07:00
Kazu Hirata	267f21a21b	Use std::gcd (NFC) This patch replaces calls to greatestCommonDivisor with std::gcd where two arguments are of the same type. This means that std::common_type_t of the argument type is the same as the argument type. We could drop calls to std::abs in some cases, but that's left for another patch.	2022-08-28 10:41:51 -07:00
Kazu Hirata	d1688e9ddf	[llvm] Use std::gcd (NFC) This patch replaces calls to greatestCommonDivisor with std::gcd where both arguments are known to be of unsigned. This means that std::common_type_t of the two argument types should just be the wider one of the two.	2022-08-27 23:54:29 -07:00
Kazu Hirata	9d6ab7230b	[GlobalISel] Use std::lcm (NFC) This patch replaces getLCMSize with std::lcm, a C++17 feature. Note that all the arguments are of unsigned with no implicit type conversion as they are passed to getLCMSize.	2022-08-27 09:53:16 -07:00
Kazu Hirata	21de2888a4	Use llvm::is_contained (NFC)	2022-08-27 09:53:11 -07:00
Matthias Gehre	3e39b27101	[llvm/CodeGen] Add ExpandLargeDivRem pass Adds a pass ExpandLargeDivRem to expand div/rem instructions with more than 128 bits into a loop computing that value. As discussed on https://reviews.llvm.org/D120327, this approach has the advantage that it is independent of the runtime library. This also helps the clang driver, which otherwise would need to understand enough about the runtime library to know whether to allow _BitInts with more than 128 bits. Targets are still free to disable this pass and instead provide a faster implementation in a runtime library. Fixes https://github.com/llvm/llvm-project/issues/44994 Differential Revision: https://reviews.llvm.org/D126644	2022-08-26 11:55:15 +01:00
Simon Pilgrim	88c7b16bed	[DAG] Strip poison generating flags in freeze(op()) -> op(freeze()) fold This patch follows the InstCombine approach of stripping poison generating flags (nsw/nuw from add/sub etc.) to allow us to push a freeze() through the op. Unlike InstCombine it doesn't retain any flags, but we have plenty of DAG folds that do the same thing already. We assert that the newly generated op isGuaranteedNotToBeUndefOrPoison. Similar to the ValueTracking approach, isGuaranteedNotToBeUndefOrPoison has been updated to confirm that if an op can't create undef/poison and its operands are guaranteed not to be undef/poison - then its not undef/poison. This is just for the generic opcodes - target specific opcodes will need to do this manually just in case they have some special cases. Differential Revision: https://reviews.llvm.org/D132333	2022-08-26 11:47:51 +01:00
Matthias Gehre	6d13b80fcb	Revert "[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers" This reverts https://reviews.llvm.org/D120329. I abandoned the PR [0] to add __divei4 functions to compiler-rt in favor of adding a pass to transform div/rem [1]. This removes the backend code that was supposed to emit calls to the __divei4 functions. [0] https://reviews.llvm.org/D120327 [1] https://reviews.llvm.org/D130076 Differential Revision: https://reviews.llvm.org/D130079	2022-08-26 10:52:56 +01:00
Alex Richardson	0483b00875	Mark the $local function begin symbol as a function While this does not matter for most targets, when building for Arm Morello, we have to mark the symbol as a function and add size information, so that LLD can correctly evaluate relocations against the local symbol. Since Morello is an out-of-tree target, I tried to reproduce this with in-tree backends and with the previous reviews applied this results in a noticeable difference when targeting Thumb. Background: Morello uses a method similar Thumb where the encoding mode is specified in the LSB of the symbol. If we don't mark the target as a function, the relocation will not have the LSB set and calls will end up using the wrong encoding mode (which will almost certainly crash). Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D131429	2022-08-26 09:34:04 +00:00
wanglian	2887d7786f	[DAGCombiner] Use FoldConstantArithmetic instead of dyn_cast in visitFP_ROUND. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D132439	2022-08-25 11:29:05 +08:00
Matthias Braun	5364f49407	Fix CSR update check D132080 introduced a bug leading to `RegisterClassInfo` caches not getting invalidated when there was exactly one more CSR register added. Differential Revision: https://reviews.llvm.org/D132606	2022-08-24 18:09:49 -07:00
Sami Tolvanen	cff5bef948	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Relands `67504c9549` with a fix for 32-bit builds. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 22:41:38 +00:00
Sami Tolvanen	a79060e275	Revert "KCFI sanitizer" This reverts commit `67504c9549` as using PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.	2022-08-24 19:30:13 +00:00
Sami Tolvanen	67504c9549	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 18:52:42 +00:00
spupyrev	8d5b694da1	extending code layout alg The diff modifies ext-tsp code layout algorithm in the following ways: (i) fixes merging of cold block chains (this is a port of D129397); (ii) adjusts the cost model utilized for optimization; (iii) adjusts some APIs so that the implementation can be used in BOLT; this is a prerequisite for D129895. The only non-trivial change is (ii). Here we introduce different weights for conditional and unconditional branches in the cost model. Based on the new model it is slightly more important to increase the number of "fall-through unconditional" jumps, which makes sense, as placing two blocks with an unconditional jump next to each other reduces the number of jump instructions in the generated code. Experimentally, this makes a mild impact on the performance; I've seen up to 0.2%-0.3% perf win on some benchmarks. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D129893	2022-08-24 09:40:25 -07:00
Simon Pilgrim	f9de13232f	[X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis. For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling. Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU. Differential Revision: https://reviews.llvm.org/D132520	2022-08-24 17:28:18 +01:00
Stephen Tozer	58d104b352	Revert "[DebugInfo] Extend the InstrRef LDV to support DbgValues with many Ops" Reverting due to reported errors when running Linux kernel builds with KMSAN -gdwarf-4. This reverts commit `2cb9e1ac42`.	2022-08-24 15:24:32 +01:00
Simon Pilgrim	5377abcde2	[DAG] matchRotateHalf - constify SelectionDAG arg. NFC. Based off Issue #57283 - we need to try harder to ensure we're not creating nodes on-the-fly - so make sure we're just using SelectionDAG for analysis where possible	2022-08-24 10:57:38 +01:00
Simon Pilgrim	e624f8a3bb	[DAG] MatchRotate - bail if we fail to match a shl/srl pair extractShiftForRotate may fail to return canonicalized shifts due to constant folding or other simplification that can occur in getNode() Fixes Issue #57283	2022-08-24 03:05:07 +01:00
Sanjay Patel	f8dfbea324	[SDAG] expand more is-power-of-2 patterns that use popcount (ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0) Adjust the legality check to avoid the poor codegen on AArch64. We probably only want to use popcount on this pattern when it is a single instruction. fixes #57225 Differential Revision: https://reviews.llvm.org/D132237	2022-08-23 17:53:53 -04:00
Stephen Tozer	2cb9e1ac42	[DebugInfo] Extend the InstrRef LDV to support DbgValues with many Ops This patch builds on prior support patches to enable support for variadic debug values in InstrRefLDV, allowing DBG_VALUE_LISTs to have their ranges extended. Differential Revision: https://reviews.llvm.org/D128212	2022-08-23 20:17:09 +01:00
Arthur Eubanks	d6cc7a5b46	[FastISel] Respect musttail over "disable-tail-calls" musttail should be honored even in the presence of attributes like "disable-tail-calls". SelectionDAG properly handles this. Update LangRef to explicitly mention that this is the semantics of musttail. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D132193	2022-08-23 08:55:40 -07:00
Jakub Kuderski	6fa87ec10f	[ADT] Deprecate is_splat and replace all uses with all_equal See the discussion thread for more details: https://discourse.llvm.org/t/adt-is-splat-and-empty-ranges/64692 Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D132335	2022-08-23 11:36:27 -04:00
Stephen Tozer	89d0cc99ec	[DebugInfo][InstrRef] Handle transfers of variadic debug values in LDV This patch adds the last of the changes required to enable DBG_VALUE_LIST handling in InstrRefLDV, handling variadic debug values during the transfer tracking step. Most of the changes are fairly straightforward, and based around tracking multiple locations per variable in TransferTracker::VLocTracker. Differential Revision: https://reviews.llvm.org/D128211	2022-08-23 15:01:28 +01:00
Thomas Symalla	562accddaa	[NFC] Fix typo in dbg message in RegisterCoalescer. funcion => function	2022-08-23 15:14:45 +02:00
Stephen Tozer	b12e5c884f	[DebugInfo][InstrRef][NFC] Emit variadic debug values from InstrRefLDV In preparation for supporting DBG_VALUE_LIST in InstrRefLDV, this patch adds the logic for emitting DBG_VALUE_LIST instructions from InstrRefLDV. The logical changes here are fairly simple, with the main change being that instead of directly prepending offsets to the DIExpr, we use appendOpsToArg to modify the expression for individual debug operands in the expression. The function emitLoc is also changed to take a list of debug ops, with an empty list meaning an undef value. Differential Revision: https://reviews.llvm.org/D128209	2022-08-23 13:22:56 +01:00
Denis Antrushin	d1fd791e72	[TwoAddressInstruction] Handle pointer compare sunk past statepoint. CodeGenPrepare pass can sink pointer comparison across statepoint to the point of use (see comment in IR/SafepointIRVerifier.cpp) Due to specifics of statepoints, it is still legal to have tied def and use rewritten to the same register in TwoAddress pass. However, properly updating LiveIntervals and LiveVariables becomes complicated. For simplicity, let's fall back to generic handling of tied registers when we detect such case. TODO: This fixes functional (assertion) failure. Ideally we should try to recompute new live range/liveness in place. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D132255	2022-08-23 12:34:11 +03:00
Stephen Tozer	53fd5af689	[DebugInfo] Let InstrRefBasedLDV handle joins for lists of debug ops In preparation for adding support for DBG_VALUE_LIST instructions in InstrRefLDV, this patch updates the logic for joining variables at block joins to support joining variables that use multiple debug operands. This is one of the more meaty "logical" changes, although the line count isn't too high - this changes pickVPHILoc to find a valid joined location for every operand, with part of the function being split off into pickValuePHILoc which finds a location for a single operand. Differential Revision: https://reviews.llvm.org/D128180	2022-08-22 20:22:22 +01:00
David Penry	ced705c440	[ModuloSchedule] Add interface call to accept/reject SMS schedules This interface allows a target to reject a proposed SMS schedule. For Hexagon/PowerPC, all schedules are accepted, leaving behavior unchanged. For ARM, schedules which exceed register pressure limits are rejected. Also, two RegisterPressureTracker methods now need to be public so that register pressure can be computed by more callers. Reapplication of D128941/(reversion:D132037) with small fix. Differential Revision: https://reviews.llvm.org/D132170	2022-08-22 12:10:13 -07:00
Philip Reames	274f86e7a6	[TTI] Remove OperandValueKind/Properties from getArithmeticInstrCost interface [nfc] This completes the client side transition to the OperandValueInfo version of this routine. Backend TTI implementations still use the prior versions for now.	2022-08-22 11:06:32 -07:00
Kazu Hirata	36ec4deca5	[LiveDebugValues] Fix a warning This patch fixes: llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.h:330:5: error: anonymous types declared in an anonymous union are an extension [-Werror,-Wnested-anon-types]	2022-08-22 10:57:16 -07:00
Stephen Tozer	b5ba5d2aab	[DebugInfo][NFC] Represent DbgValues with multiple ops in IRefLDV In preparation for allowing InstrRefBasedLDV to handle DBG_VALUE_LIST, this patch updates the internal representation that it uses to represent debug values to store a list of values. This is one of the more significant changes in terms of line count, but is fairly simple and should not affect the output of this pass. Differential Revision: https://reviews.llvm.org/D128177	2022-08-22 18:04:38 +01:00
Matthias Braun	b2542c40b9	RegisterClassInfo: Fix CSR cache invalidation `RegisterClassInfo` caches information like allocation orders and reuses it for multiple machine functions where possible. However the `MCPhysReg *CalleeSavedRegs` field used to test whether the set of callee saved registers changed did not work: After D28566 `MachineRegisterInfo::getCalleeSavedRegs()` can return dynamically computed CSR sets that are only valid while the `MachineRegisterInfo` object of the current function exists. This changes the code to make a copy of the CSR list instead of keeping a possibly invalid pointer around. Differential Revision: https://reviews.llvm.org/D132080	2022-08-22 09:28:26 -07:00
Stephen Tozer	11ce014a12	[DebugInfo][NFC] Update LDV to use generic DBG_VALUE* MI interface Currently, InstrRefLDV only handles DBG_VALUE instructions, not DBG_VALUE_LIST, and as a result of this it handles these instructions using functions that only work for that type of debug value, i.e. using getOperand(0) to get the debug operand. This patch changes this to use the generic debug value functions, such as getDebugOperand and isDebugOffsetImm, as well as adding an IsVariadic field to the DbgValueProperties class and a few other minor changes to acknowledge DBG_VALUE_LISTs. Note that this patch does not add support for DBG_VALUE_LIST here, but is a precursor to other patches that do add that support. Differential Revision: https://reviews.llvm.org/D128174	2022-08-22 16:28:12 +01:00
Stephen Tozer	53125e7d91	[DebugInfo] Handle joins PHI+Def values in InstrRef LiveDebugValues In the InstrRefBasedImpl for LiveDebugValues, we attempt to propagate debug values through basic blocks in part by checking to see whether all a variable's incoming debug values to a BB "agree", i.e. whether their properties match and they refer to the same underlying value. Prior to this patch, the check for agreement between incoming values relied on exact equality, which meant that a VPHI and a Def DbgValue that referred to the same underlying value would be seen as disagreeing. This patch changes this behaviour to treat them as referring to the same value, allowing the shared value to propagate into the BB. Differential Revision: https://reviews.llvm.org/D125953	2022-08-22 14:51:27 +01:00
Kazu Hirata	ec5eab7e87	Use range-based for loops (NFC)	2022-08-20 21:18:32 -07:00
Kazu Hirata	258531b7ac	Remove redundant initialization of Optional (NFC)	2022-08-20 21:18:28 -07:00
Luo, Yuanke	5159be3c9b	(Reland) [fastalloc] Support allocating specific register class in fastalloc This reverts commit `853bb192c4`.	2022-08-20 13:25:34 +08:00
Lorenzo Albano	98117fe208	[VP] Add splitting for VP_STRIDED_STORE and VP_STRIDED_LOAD Following the comment's thread of D117235, I added checks for the widening + splitting case, which also causes a split with one of the resulting vectors to be empty. Due to the same issues described in that same thread, the `fixed-vectors-strided-store.ll` test is missing the widening + splitting case, while the same case in the `strided-vpload.ll` test requires to manually split the loaded vector. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121784	2022-08-19 18:15:56 -07:00
Eli Friedman	8f826fe723	Fix reverse-iteration buildbot. A couple of instances of iterating over maps snuck in while the bot was down; fix them to use maps with deterministic iteration.	2022-08-19 14:21:05 -07:00
Nick Desaulniers	e412bac912	[MachineVerifier] add checks for INLINEASM_BR Test for a case we observed after the initial implementation of D129997 landed, in which case we observed a crash while building the ppc64le Linux kernel. In that case, we had one block with two exits, both to the same successor. Removing one of the exits corrupted the successor/predecessor lists. So when we have an INLINEASM_BR, check a few things for each indirect target: 1. that it exists. 2. that it is listed in our successors. 3. that its predecessor list contains the parent MBB of INLINEASM_BR. This would have caught the regression discovered after D129997 landed, after the pass that was problematic (early-tailduplication) rather than getting a stack trace in a later pass (regalloc) that doesn't understand the anomaly and crashes. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130290	2022-08-19 12:52:26 -07:00
Bill Wendling	ac6a0cdc2e	[X86][AArch64][NFC] Simplify querying used argument registers Registers used for arguments are listed as "live-ins" into the starting basic block. This means we don't have to go through a potentially expensive search through all possible argument registers when we only care about used argument registers. Differential Revision: https://reviews.llvm.org/D132181	2022-08-19 11:39:05 -07:00
wanglian	fc2b4dfef2	[DAGCombiner] Add use check for VSCALE in visitSUB. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D132115	2022-08-19 09:46:18 +08:00
Eric Wang	ad8eb85545	[NFC][MLGO] ML Regalloc Priority Advisor This patch introduces the priority analysis and the priority advisor, the default implementation, and the scaffolding for introducing the other implementations of the advisor. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D131220	2022-08-18 15:33:48 -07:00
Paul Walker	96c8d615d6	[SVE] Extend findMoreOptimalIndexType so BUILD_VECTORs do not force 64bit indices. Extends findMoreOptimalIndexType to allow ISD::BUILD_VECTOR based indices to be truncated when such truncation is lossless. This can enable the use of 32bit gather/scatter indices thus making it less likely to have to split a gather/scatter in two. Depends on D125194 Differential Revision: https://reviews.llvm.org/D130533	2022-08-18 18:00:53 +01:00
Simon Pilgrim	fdec50182d	[CostModel] Replace getUserCost with getInstructionCost * Replace getUserCost with getInstructionCost, covering all cost kinds. * Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks. Original Patch by @samparker (Sam Parker) Differential Revision: https://reviews.llvm.org/D79483	2022-08-18 11:55:23 +01:00
wanglian	989ebc1783	[DAGCombiner][NFC] Tidy up unnecessary brackets in visitADD. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D132107	2022-08-18 15:48:22 +08:00
wanglian	230e277dfe	[DAGCombiner][NFC] Merge two if statement into one. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D131941	2022-08-18 10:12:35 +08:00
Daniil Fukalov	7ed3d81333	[NFCI] Move cost estimation from TargetLowering to TargetTransformInfo. TragetLowering had two last InstructionCost related `getTypeLegalizationCost()` and `getScalingFactorCost()` members, but all other costs are processed in TTI. E.g. it is not comfortable to use other TTI members in these two functions overrided in a target. Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout parameter - it was always passed from TTI. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D117723	2022-08-18 00:38:55 +03:00
Sanjay Patel	7f72a0f5bb	[SDAG] avoid generating libcall to function with same name This is a potentially better alternative to D131452 that also should avoid the infinite loop bug from: issue #56403 This is again a minimal fix to reduce merging pain for the release. But if this makes sense, then we might want to guard all of the RTLIB generation (and other libcalls?) with a similar name check. Differential Revision: https://reviews.llvm.org/D131521	2022-08-17 16:19:34 -04:00
Matthias Braun	19ce5e515f	RAGreedyStats: Ignore identity COPYs; count COPYs from/to physregs Improve copy statistics: - Count copies from or to physical registers: They are used to model function parameters and calling conventions and the register allocator optimizes for them. - Check physical registers assigned to virtual registers and stop counting "identity" `COPY`s where source and destination is the same physical registers; they will be removed in the `virtregmap` pass anyway. Differential Revision: https://reviews.llvm.org/D131932	2022-08-17 12:53:29 -07:00
Archit Saxena	e170d955fe	Split EH code by default The current machine function splitter is reliant on profile data to do profile summary analysis to split blocks into cold section. This may sometimes limit the usage of machine function splitter especially in cases where we could do some form of static analysis to split out cold blocks if profile data is absent or profile data which may be faulty (Consider Sample PGO). Of all code that could statically be marked cold Exception handling blocks are one of them (In fact BFI framework also tends to mark them as cold), and the most in size contribution. In my experiments I found out Exception handling pads and all code reachable from there account for up to 6-8% of the .text section on modern production binaries. This patch introduces a flag to split out all Exception handling blocks and blocks only reachable from Exceptional Handling pad to cold section. This flag has shown to give a performance win of up to 0.1% in terms of average cycles and instructions executed on internal facebook search service. Reviewed By: snehasish Differential Revision: https://reviews.llvm.org/D131824	2022-08-17 12:40:31 -07:00
Nick Desaulniers	6b0e2fa6f0	[SelectionDAG] make INLINEASM_BR use MachineBasicBlocks instead of BlockAddresses As part of re-architecting callbr to no longer use blockaddresses (https://reviews.llvm.org/D129288), we don't really need them in MIR. They make comparing MachineBasicBlocks of indirect targets during MachineVerifier a PITA. Suggested by @efriedma from the discussion: https://reviews.llvm.org/D130290#3669531 Reviewed By: efriedma, void Differential Revision: https://reviews.llvm.org/D130316	2022-08-17 09:34:31 -07:00
David Penry	1c9f0408bc	Revert "[ModuloSchedule] Add interface call to accept/reject SMS schedules" This reverts commit `8c4aea438c`. Needed because buildbot failures (warnings) gave a clue that there was a functional bug in the ARM rejection logic. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D132037	2022-08-17 09:32:43 -07:00
David Penry	8c4aea438c	[ModuloSchedule] Add interface call to accept/reject SMS schedules This interface allows a target to reject a proposed SMS schedule. For Hexagon/PowerPC, all schedules are accepted, leaving behavior unchanged. For ARM, schedules which exceed register pressure limits are rejected. Also, two RegisterPressureTracker methods now need to be public so that register pressure can be computed by more callers. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D128941	2022-08-17 08:13:26 -07:00
Andre Vieira	49223e0a2d	[TypePromotion] Don't promote PHI + ZExt if wider than RegisterBitWidth Differential Revision: https://reviews.llvm.org/D131966	2022-08-17 09:54:15 +01:00
Eli Friedman	cfd2c5ce58	Untangle the mess which is MachineBasicBlock::hasAddressTaken(). There are two different senses in which a block can be "address-taken". There can be a BlockAddress involved, which means we need to map the IR-level value to some specific block of machine code. Or there can be constructs inside a function which involve using the address of a basic block to implement certain kinds of control flow. Mixing these together causes a problem: if target-specific passes are marking random blocks "address-taken", if we have a BlockAddress, we can't actually tell which MachineBasicBlock corresponds to the BlockAddress. So split this into two separate bits: one for BlockAddress, and one for the machine-specific bits. Discovered while trying to sort out related stuff on D102817. Differential Revision: https://reviews.llvm.org/D124697	2022-08-16 16:15:44 -07:00
Nicolas Miller	ccfabfbb1f	Fix subrange liveness checking at rematerialization This patch fixes an issue where an instruction reading a whole register would be moved during register allocation into a spot where one of the subregisters was dead. The code to check whether an instruction can be rematerialized at a given point or not was already checking for subranges to ensure that subregisters are live, but only when the instruction being moved was using a subregister, this patch changes that so the subranges are checked even when the moved instruction uses the full register. This patch also adds a case to the original test for the subrange checking that trigger the issue described above. The original subrange checking code was introduced in this revision: https://reviews.llvm.org/D115278 And I've encountered this issue on AMDGPUs while working with DPC++: https://github.com/intel/llvm/issues/6209 Essentially the greedy register allocator attempts to move the following instruction: ``` %3961:vreg_64 = V_LSHLREV_B64_e64 3, %3078:vreg_64, implicit $exec ``` From `@3440` into the body of a loop `@16312`, but `%3078` has the following live ranges: ``` %3078 [2224r,2240r:0)[2240r,3488B:1)[16192B,38336B:1) 0@2224r 1@2240r L0000000000000003 [2224r,3440r:0) 0@2224r L000000000000000C [2240r,3488B:0)[16192B,38336B:0) 0@2240r ``` So `@16312e` `%3078.sub1` is alive but `%3078.sub0` is dead, so this instruction being moved there leads to invalid memory accesses as `3078.sub0` ends up being trashed and the result of this instruction is used as part of an address calculation for a load. On the original ticket this issue showed up on gfx906 and gfx90a but not on gfx908, this turned out to be because on gfx908 instead of moving the shift instruction into the loop, its value is spilled into an ACC register, gfx906 doesn't have ACC registers and for gfx90a ACC registers are used like regular vector registers and so aren't used for spilling. With this patch the original application from the DPC++ ticket works properly on gfx906, and the result of the shift instruction is correctly spilled instead of moving the instruction in the loop. Original Author: npmiller Reviewed by: rampitec Submitted by: rampitec Differential Revision: https://reviews.llvm.org/D131884	2022-08-16 10:50:09 -07:00
Arthur Eubanks	9181ce623f	[Windows] Put init_seg(compiler/lib) in llvm.global_ctors Currently we treat initializers with init_seg(compiler/lib) as similar to any other init_seg, they simply have a global variable in the proper section (".CRT$XCC" for compiler/".CRT$XCL" for lib) and are added to llvm.used. However, this doesn't match with how LLVM sees normal (or init_seg(user)) initializers via llvm.global_ctors. This causes issues like incorrect init_seg(compiler) vs init_seg(user) ordering due to GlobalOpt evaluating constructors, and the ability to remove init_seg(compiler/lib) initializers at all. Currently we use 'A' for priorities less than 200. Use 200 for init_seg(compiler) (".CRT$XCC") and 400 for init_seg(lib) (".CRT$XCL"), which do not append the priority to the section name. Priorities between 200 and 400 use ".CRT$XCC${Priority}". This allows for some wiggle room for people/future extensions that want to add initializers between compiler and lib. Fixes #56922 Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D131910	2022-08-16 08:16:18 -07:00
Steve Merritt	ec60fca752	[CodeView] Use non-qualified names for static local variables Static variables declared within a routine or lexical block should be emitted with a non-qualified name. This allows the variables to be visible to the Visual Studio watch window. Differential Revision: https://reviews.llvm.org/D131400	2022-08-16 10:33:43 -04:00
Andre Vieira	c6b5a13b7a	[TypePromotion] Only search for PHI + ZExt promotion of Integers Differential Revision: https://reviews.llvm.org/D131948	2022-08-16 10:15:32 +01:00
wanglian	fbc4c26e9a	[SelectionDAG][NFC] Fix return type when used isConstantIntBuildVectorOrConstantInt and isConstantFPBuildVectorOrConstantFP Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D131870	2022-08-16 10:07:24 +08:00
Rahman Lavaee	df2213f345	[EHStreamer] Omit @LPStart when function has no landing pads When no landing pads exist for a function, `@LPStart` is undefined and must be omitted. EH table is generally not emitted for functions without landing pads, except when the personality function is uknown (`!isNoOpWithoutInvoke(classifyEHPersonality(Per))`). In that case, we must omit `@LPStart` even when machine function splitting is enabled. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D131626	2022-08-15 17:09:46 -07:00
Aiden Grossman	24cdf97d63	[mlgo] Add ability to create feature-gated development features in regalloc advisor Currently there is no way to add in development features to the ML regalloc evict advisor which is useful to have when working on feature engineering/improving the current model. This patch adds in the ability to add in development features to the ML regalloc evict advisor which are gated by a runtime flag and not added in at all if not compiled in LLVM development mode. This sets the stage for future work where we are planning on upstreaming some of the newer features that we are currently experimenting with. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D131209	2022-08-15 16:01:37 -07:00
David Green	dfc95bab07	[DAG] Ensure more Legal BUILD_VECTOR elements types in shuffle->And combine This is a followup to D131350, which caused another problem for i64 types being split into i32 on i32 targets. This patch tries to make sure that either Illegal types are OK, or that the element types of a buildvector are legal and bigger than or equal to the size of the original elements. Differential Revision: https://reviews.llvm.org/D131883	2022-08-15 14:41:45 +01:00
Luo, Yuanke	853bb192c4	Revert "(Reland) [fastalloc] Support allocating specific register class in fastalloc" This reverts commit `30f9e6ebd3`.	2022-08-15 20:33:15 +08:00
Ayke van Laethem	de48717fcf	[AVR] Support unaligned store This patch really just extends D39946 towards stores as well as loads. While the patch is in SelectionDAGBuilder, it only applies to AVR (the only target that supports unaligned atomic operations). Differential Revision: https://reviews.llvm.org/D128483	2022-08-15 14:29:37 +02:00
Simon Pilgrim	3a73133217	[DAG] canCreateUndefOrPoison - add freeze(sign_extend_inreg(x,vt)) -> sign_extend_inreg(freeze(x),vt) support Guaranteed not to create undef/poison	2022-08-15 12:18:59 +01:00
Peter Waller	6e85db7293	[DAGCombine] Combine signext_inreg of extract-extend The outer signext_inreg is redundant in the following: Fold (signext_inreg (extract_subvector (zext\|anyext\|sext iN_value to _) _) from iN) -> (extract_subvector (signext iN_value to iM)) Tests are precommitted and clone those by analogy from the AND case in the same file. Add a negative test to check extension width is handled correctly. This patch supersedes D130700. Differential Revision: https://reviews.llvm.org/D131503	2022-08-15 10:58:07 +00:00
Simon Pilgrim	7e294e676e	[DAG] canCreateUndefOrPoison - add freeze(assertsext/zext(x,bt)) -> assertsext/zext(freeze(x),vt) support These are guaranteed not to create undef/poison (although they may pass through) - the associated ISD::VALUETYPE node is also guaranteed never to generate poison	2022-08-15 11:13:43 +01:00
Kazu Hirata	f5a68feab3	Use llvm::none_of (NFC)	2022-08-14 16:25:39 -07:00
Simon Pilgrim	e2d13fd096	[DAG] canCreateUndefOrPoison - add freeze(shl(x,y)) -> shl(freeze(x),y) support These are guaranteed not to create undef/poison if the shift amount is known to be in range	2022-08-14 14:38:10 +01:00
Simon Pilgrim	a621d38bcb	[DAG] canCreateUndefOrPoison - add freeze(and/or/xor(x,y)) -> and/or/xor(freeze(x),y) support These are guaranteed not to create undef/poison	2022-08-14 13:14:53 +01:00
Simon Pilgrim	60534b8879	[DAG] canCreateUndefOrPoison - add freeze(add/sub/mul(x,y)) -> add/sub/mul(freeze(x),y,z) support These are guaranteed not to create undef/poison as long as there are no poison generating flags	2022-08-13 20:58:00 +01:00
Luo, Yuanke	30f9e6ebd3	(Reland) [fastalloc] Support allocating specific register class in fastalloc Reland commit `719658d078` The base RA support infrastructure that only allow a specific register class be allocated in RA pss. Since greedy RA, basic RA derived from base RA, they all allow allocating specific register class. Fast RA doesn't support allocating register for specific register class. This patch is to enable ShouldAllocateClass in fast RA, so that it can support allocating register for specific register class. Differential Revision: https://reviews.llvm.org/D131825	2022-08-13 13:57:34 +08:00
Joe Loser	b12aa497cd	[DAGCombine] Replace std::monostate equivalent in DAGCombiner.cpp Remove the `UnitT` type and operators in favor of using `std::monostate` directly. Differential Revision: https://reviews.llvm.org/D131778	2022-08-12 21:42:09 -06:00
Simon Pilgrim	4de35f4bbf	[DAG] Add TODO to remove creation of INSERT_SUBVECTOR nodes from SimplifyMultipleUseDemandedBits SimplifyMultipleUseDemandedBits shouldn't be creating general nodes like this - although we allow bitcasts, even general constant folding is avoided. Removing it causes a number of regressions that need addressing first, but I've added a TODO for now.	2022-08-12 10:45:30 +01:00
Filipp Zhinkin	1626ee6a95	[DAGCombine] Hoist shifts out of a logic operations tree. Hoist and combine shift operations from logic operations tree: logic (logic (SH x0, s), y), (logic (SH x1, s), z) --> logic (SH (logic x0, x1), s), (logic y, z) The transformation improves code generated for some cases related to the issue https://github.com/llvm/llvm-project/issues/49541. Correctness: https://alive2.llvm.org/ce/z/pVqVgY https://alive2.llvm.org/ce/z/YVvT-q https://alive2.llvm.org/ce/z/W5zTBq https://alive2.llvm.org/ce/z/YfJsvJ https://alive2.llvm.org/ce/z/3YSyDM https://alive2.llvm.org/ce/z/Bs2kzk https://alive2.llvm.org/ce/z/EoQpzU https://alive2.llvm.org/ce/z/Jnc_5H https://alive2.llvm.org/ce/z/_LP6k_ https://alive2.llvm.org/ce/z/KvZNC9 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D131189	2022-08-12 12:42:16 +03:00
wanglian	061f7ec9fa	[LegalizeTypes][NFC] Use getConstantOperandVal instead of cast constant getvalue Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131642	2022-08-12 14:35:10 +08:00
wanglian	1303057888	[LegalizeTypes][NFC] Use dyn_cast instead of isa and cast Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131544	2022-08-12 14:18:49 +08:00
Chen Zheng	8d19cfb72e	[PowerPC] omit location attribute for TLS variable on AIX TLS debug on AIX is not ready for now. The location generated in no-integrated-as mode is wrong and in integrated-as mode causes AIX linker error. Reviewed By: Esme Differential Revision: https://reviews.llvm.org/D130245	2022-08-12 00:54:48 -04:00
wanglian	3b71f1d5ab	[LegalizeTypes][NFC] Use getConstantOperandAPInt instead of cast constant getAPInt Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D131653	2022-08-12 10:21:54 +08:00
Peter Waller	898699831b	[DAGCombine] Check zext legality in zext-extract-extend combine Discussed in D131503. Fix to D130782.	2022-08-11 14:30:42 +00:00
Andre Vieira	1640679187	[TypePromotion] Search from ZExt + PHI Expand TypePromotion pass to try to promote PHI-nodes in loops that are the operand of a ZExt, using the ZExt's result type to determine the Promote Width. Differential Revision: https://reviews.llvm.org/D111237	2022-08-11 09:50:10 +01:00
Andre Vieira	05fc5037cd	[TypePromotion] Hoist out Promote Width calculation Hoist out promote width calculation to simplify runOnFunction. Differential Revision: https://reviews.llvm.org/D131489	2022-08-11 09:50:10 +01:00
Andre Vieira	e524d61f35	[TypePromotion] Don't delete Insns when iterating Differential Revision: https://reviews.llvm.org/D131488	2022-08-11 09:50:10 +01:00
Andre Vieira	57de4e059d	[TypePromotion] Don't insert Truncate for a no-op ZExt Differential Revision: https://reviews.llvm.org/D131487	2022-08-11 09:50:10 +01:00
aqjune	02e56e2533	[CodeGen] Generate efficient assembly for freeze(poison) version of `mm_cast` intel intrinsics This patch makes the variants of `mm_cast` intel intrinsics that use `shufflevector(freeze(poison), ..)` emit efficient assembly. (These intrinsics are planned to use `shufflevector(freeze(poison), ..)` after shufflevector's semantics update; relevant thread: D103874) To do so, this patch 1. Updates `LowerAVXCONCAT_VECTORS` in X86ISelLowering.cpp to recognize `FREEZE(UNDEF)` operand of `CONCAT_VECTOR` in addition to `UNDEF` 2. Updates X86InstrVecCompiler.td to recognize `insert_subvector` of `FREEZE(UNDEF)` vector as its first operand. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130339	2022-08-11 13:36:21 +09:00
Simon Pilgrim	8623da5f74	[DAG] visitFREEZE - generalize freeze(op()) -> op(freeze()) to any number of operands canCreateUndefOrPoison currently only handles unary ops, but we intend to change that soon - this more closely matches the pushFreezeToPreventPoisonFromPropagating behaviour where the freeze is pushed up to a single operand value, as long as all others are guaranteed not to be poison/undef. However, pushFreezeToPreventPoisonFromPropagating would freeze all uses of the value - whilst this variant requires the frozen value to be only used in the op - we can look at generalize multiple uses later if the need arises.	2022-08-10 13:12:46 +01:00
Simon Pilgrim	bbc27d0148	[DAG] canCreateUndefOrPoison - add freeze(truncate(x)) -> truncate(freeze(x)) support	2022-08-10 11:27:22 +01:00
David Truby	b1b9c39629	[AArch64][SVE] Use SVE for VLS fcopysign for wide vectors Currently fcopysign for VLS vectors lowers through NEON even when the vector width is wider than a NEON vector, causing bad codegen as the vectors are split. This patch causes SVE to be used for these vectors instead, giving much better codegen on wide VLS vectors. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D128642	2022-08-10 10:17:19 +00:00
Simon Pilgrim	df3ea7365e	[DAG] Use DAG.getFreeze() to create freeze node. NFC.	2022-08-10 10:26:26 +01:00
Adrian Prantl	68f97d2f78	LiveDebugValues: Fix another crash related to unreachable blocks This is a follow-up patch to D130999. In the test, the MIR contains an unreachable MBB but the code attempts to look it up in MLocs. This patch fixes this issue by checking for the default-constructed value. rdar://97226240 Differential Revision: https://reviews.llvm.org/D131453	2022-08-09 10:34:57 -07:00
Simon Pilgrim	ed162d455a	[DAG] Avoid hasOneUse() calls if the cheaper !AssumeSingleUse test has already failed. NFC. Very minor optimization, but every little helps..	2022-08-09 16:42:19 +01:00
Simon Pilgrim	d79e7dc939	[DAG] SimplifyDemandedVectorElts - and/mul(x,y) - if a demanded element of y is known zero then we don't need to demand it in x This fixes most of the remaining regressions from the fixes in rG293899c64b75	2022-08-09 16:24:08 +01:00
Simon Pilgrim	2724143551	[DAG] canCreateUndefOrPoison - add freeze(ctpop(x)) -> ctpop(freeze(x)) and freeze(parity(x)) -> parity(freeze(x)) support Both are guaranteed not to create undef/poison	2022-08-09 10:10:29 +01:00
Luo, Yuanke	aaf6c7b05c	[globalisel] Select register bank for DBG_VALUE The register operand of DBG_VALUE is not selected to a proper register bank in both AArch64 and X86. This would cause getRegClass crash after global ISel. After discussion, we think the MIR should assume all vritual register should be set proper register class after global ISel, so this patch is to fix the gap of DBG_VALUE for AArch64 and X86. Differential Revision: https://reviews.llvm.org/D129037	2022-08-09 13:11:51 +08:00
Yuta Mukai	5357dd2f43	[MachinePipeliner] Fix Phi generation failure for large stages The previous code overwrites VRMap for prologue stages during Phi generation if a register spans many stages. As a result, the wrong register is used as the one coming from the prologue in Phis at later stages. (A process exists to correct this, but it does not work in all cases.) In addition, VRMap for prologue must be preserved until addBranches(). This patch fixes them by separating the map for Phis into a different variable (VRMapPhi). Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D127840	2022-08-09 13:14:26 +09:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Simon Pilgrim	6f2bee667a	[DAG] canCreateUndefOrPoison - add freeze(bswap(x)) -> bswap(freeze(x)) and freeze(bitreverse(x)) -> bitreverse(freeze(x)) support Both are guaranteed not to create undef/poison	2022-08-08 17:27:17 +01:00
Simon Pilgrim	e4b2c52420	[DAG] canCreateUndefOrPoison - add freeze(sext(x)) -> sext(freeze(x)) and freeze(zext(x)) -> zext(freeze(x)) support Both are guaranteed not to create undef/poison	2022-08-08 16:43:40 +01:00
Krzysztof Parzyszek	0f5385b70e	Recommit [RDF] Remove explicit template arguments from Print The build breakages should be addressed by d4abdd2e3d: [CMake] Check CMAKE_CXX_STANDARD and error if it's to old Thanks to Tobias and Roy for addressing these issues.	2022-08-08 07:28:45 -07:00
Simon Pilgrim	9641a201a5	[DAG] Add initial SelectionDAG::canCreateUndefOrPoison support This patch adds basic support for a DAG variant of the canCreateUndefOrPoison call and updates DAGCombiner::visitFREEZE to use it, further Opcodes (including target specific Opcodes) can be handled when we have test coverage. So far, I've left visitFREEZE to just use this for unary nodes (which currently means the existing BITCAST/FREEZE cases) - later patches will add other unary opcodes (with test coverage) and we can also refactor visitFREEZE to support a general number of operands like we do in InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating. I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place. Similarly we will be able to handle poison generating SDNodeFlags as and when it becomes an issue. Part of the work for D106675 / PR50468 Differential Revision: https://reviews.llvm.org/D130646	2022-08-08 15:16:06 +01:00
Simon Pilgrim	b334709467	Remove superfluous ; outside of a function	2022-08-08 12:14:03 +01:00
Shubham Narlawar	ab4fc87a9d	[DAG] Emit table lookup from TargetLowering::expandCTTZ() This patch emits table lookup in expandCTTZ. Context - https://reviews.llvm.org/D113291 transforms set of IR instructions to cttz intrinsic but there are some targets which does not support CTTZ or CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ(). Differential Revision: https://reviews.llvm.org/D128911	2022-08-08 12:08:05 +01:00
Simon Pilgrim	e5e93b6130	[DAG] FoldConstantArithmetic - add initial support for undef elements in bitcasted binop constant folding FoldConstantArithmetic can fold constant vectors hidden behind bitcasts (e.g. vXi64 -> v2Xi32 on 32-bit platforms), but currently bails if either vector contains undef elements. These undefs can often occur due to SimplifyDemandedBits/VectorElts calls recognising that the upper bits are often unnecessary (e.g. funnel-shift/rotate implicit-modulo and AND masks). This patch adds a basic 'FoldValueWithUndef' handler that will attempt to constant fold if one or both of the ops are undef - so far this just handles the AND and MUL cases where we always fold to zero. The RISCV codegen increase is interesting - it looks like the BUILD_VECTOR lowering was loading a constant pool entry but now (with all elements defined constant) it can materialize the constant instead? Differential Revision: https://reviews.llvm.org/D130839	2022-08-08 11:53:56 +01:00
David Green	061e0189a3	[DAG] Ensure Legal BUILD_VECTOR elements types in shuffle->And combine D129150 added a combine from shuffles to And that creates a BUILD_VECTOR of constant elements. We need to ensure that the elements are of a legal type, to prevent asserts during lowering. Fixes #56970. Differential Revision: https://reviews.llvm.org/D131350	2022-08-08 09:47:55 +01:00
Aaron Ballman	32fd0b7fd5	Revert "[RDF] Remove explicit template arguments from Print" This reverts commit `ede96de751`. This breaks the build on Windows with Visual Studio: https://lab.llvm.org/buildbot/#/builders/123/builds/12134	2022-08-07 08:24:01 -04:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Kazu Hirata	3b114087c3	[llvm] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2022-08-07 00:16:11 -07:00
Fangrui Song	fa66789d06	[llvm] LLVM_NODISCARD => [[nodiscard]]. NFC With C++17 there is no Clang pedantic warning.	2022-08-07 00:26:33 +00:00
Krzysztof Parzyszek	2bc390bdd6	[RDF] Use default TargetOperandInfo if not given in constructor All current in-tree users use the default implementation.	2022-08-06 14:32:52 -05:00
Krzysztof Parzyszek	ede96de751	[RDF] Remove explicit template arguments from Print CTAD takes care of it.	2022-08-06 13:29:15 -05:00
Filipp Zhinkin	c55899f763	[DAGCombiner] Hoist funnel shifts from logic operation Hoist funnel shift from logic op: logic_op (FSH x0, x1, s), (FSH y0, y1, s) --> FSH (logic_op x0, y0), (logic_op x1, y1), s The transformation improves code generated for some cases related to issue https://github.com/llvm/llvm-project/issues/49541. Reduced amount of funnel shifts can also improve throughput on x86 CPUs by utilizing more available ports: https://quick-bench.com/q/gC7AKkJJsDZzRrs_JWDzm9t_iDM Transformation correctness checks: https://alive2.llvm.org/ce/z/TKPULH https://alive2.llvm.org/ce/z/UvTd_9 https://alive2.llvm.org/ce/z/j8qW3_ https://alive2.llvm.org/ce/z/7Wq7gE https://alive2.llvm.org/ce/z/Xr5w8R https://alive2.llvm.org/ce/z/D5xe_E https://alive2.llvm.org/ce/z/2yBZiy Differential Revision: https://reviews.llvm.org/D130994	2022-08-05 17:02:22 -04:00
Dawid Jurczak	1bd31a6898	[NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T Extracted from https://reviews.llvm.org/D129781 and address comment: https://reviews.llvm.org/D129781#3655571 Differential Revision: https://reviews.llvm.org/D130268	2022-08-05 13:35:41 +02:00
Fangrui Song	7d6017fd31	[TTI] Change new getVectorInstrCost overload to use const reference after D131114 A const reference is preferred over a non-null const pointer. `Type *` is kept as is to match the other overload. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D131197	2022-08-04 15:16:51 -07:00
Mingming Liu	bc8f2f3649	[AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation. 1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method. 2) This patch also changes a few callsites (VectorCombine.cpp, SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method. 3) This is a split of D128302. Differential Revision: https://reviews.llvm.org/D131114	2022-08-04 12:58:25 -07:00
Lorenzo Albano	74940d2668	[VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D121114	2022-08-04 16:12:01 +02:00
wanglian	b6b0690355	[LegalizeTypes][VP] Add split operand support for VP float and integer casting Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D130685	2022-08-04 15:41:50 +08:00
Mircea Trofin	0cb9746a7d	[nfc][mlgo] Separate logger and training-mode model evaluator This just shuffles implementations and declarations around. Now the logger and the TF C API-based model evaluator are separate. Differential Revision: https://reviews.llvm.org/D131116	2022-08-03 16:20:28 -07:00
Adrian Prantl	905f2d1ecb	Fix LDV InstrRefBasedImpl to not crash when encountering unreachable MBBs. The testcase was delta-reduced from an LTO build with sanitizer coverage and the MIR tail duplication pass caused a machine basic block to become unreachable in MIR. This caused the MBB to be invisible to the reverse post-order traversal used to initialize the MBB <-> RPONumber lookup tables. rdar://97226240 Differential Revision: https://reviews.llvm.org/D130999	2022-08-03 13:05:05 -07:00
Felipe de Azevedo Piovezan	a5a8a05c78	[SelectionDAG] Handle IntToPtr constants in dbg.value The function `handleDebugValue` has custom logic to handle certain kinds constants, namely integers, floats and null pointers. However, it does not handle constant pointers created from IntToPtr ConstantExpressions. This patch addresses the issue by replacing the Constant with its integer operand. A similar bug was addressed for GlobalISel in D130642. Reviewed By: aprantl, #debug-info Differential Revision: https://reviews.llvm.org/D130908	2022-08-03 14:10:05 -04:00
David Truby	9a976f3661	[llvm] Always use TargetConstant for FP_ROUND ISD Nodes This patch ensures consistency in the construction of FP_ROUND nodes such that they always use ISD::TargetConstant instead of ISD::Constant. This additionally fixes a bug in the AArch64 SVE backend where patterns were matching against TargetConstant nodes and sometimes failing when passed a Constant node. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130370	2022-08-03 14:02:11 +01:00
Fraser Cormack	646e2f4803	[VP] Rename VP int<->float conversion ISD opcodes These should be named like the non-VP versions for consistency. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130967	2022-08-03 10:04:38 +01:00
Paul Kirth	d434e40f39	[llvm][NFC] Refactor code to use ProfDataUtils In this patch we replace common code patterns with the use of utility functions for dealing with profiling metadata. There should be no change in functionality, as the existing checks should be preserved in all cases. Reviewed By: bogner, davidxl Differential Revision: https://reviews.llvm.org/D128860	2022-08-03 00:09:45 +00:00
Mircea Trofin	4146c1756d	[nfc] Remove unused parameter in TailDuplicator::duplicateSimpleBB Differential Revision: https://reviews.llvm.org/D131008	2022-08-02 13:39:34 -07:00
Kai Nacke	b38375378d	[GIsel] Add missing libcall for G_MUL to LegalizerHelper The LegalizerHelper misses the code to lower G_MUL to a library call, which this change adds. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D130987	2022-08-02 13:35:25 -04:00
Simon Pilgrim	b651fdff79	[DAG] matchRotateSub - ensure the (pre-extended) shift amount is wide enough for the amount mask (PR56859) matchRotateSub is given shift amounts that will already have stripped any/zero-extend nodes from - so make sure those values are wide enough to take a mask.	2022-08-02 11:38:52 +01:00
Tim Northover	b586dc21a7	Outliner: add "target-cpu" feature from source function to outlined The CPU is used to determine which inline asm instructions are allowed, so needs to be copied across in case the outlined function contains any.	2022-08-02 09:33:29 +01:00
Sotiris Apostolakis	995b61cdac	[SelectOpti] Auto-disable other cmov optis when the new select-opti pass is enabled Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D129817	2022-08-02 00:19:59 +00:00
Fangrui Song	2b70bebc6d	[MachineFunctionPass] Support -print-changed={,c}diff{,-quiet} Follow-up to D130434. Move doSystemDiff to PrintPasses.cpp and call it in MachineFunctionPass.cpp. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D130833	2022-08-01 12:56:15 -07:00
Marius Brehler	ddb6c28638	Avoid comparison of integers of different signs Otherwiese a warning is emitted when compiling with `-Wsign-compare`.	2022-08-01 11:20:41 +00:00
Simon Pilgrim	b43d7aacf8	[DAG] visitINSERT_VECTOR_ELT - extend folding to BUILD_VECTOR if all missing elements from an insertion chain are known zero	2022-08-01 11:32:33 +01:00
David Sherwood	41119a0f52	[DAGCombiner] Extend visitAND to include EXTRACT_SUBVECTOR Eliminate an AND by redefining an anyext\|sext\|zext. (and (extract_subvector (anyext\|sext\|zext v) _) iN_mask) => (extract_subvector (zeroext_iN v)) Differential Revision: https://reviews.llvm.org/D130782	2022-08-01 10:32:32 +01:00
Vladislav Dzhidzhoev	facb3ac385	[GlobalISel][DebugInfo] salvageDebugInfo analogue for gMIR Salvage debug info of instruction that is about to be deleted as dead in Combiner pass. Currently supported instructions are COPY and G_TRUNC. It allows to salvage debug info of some dead arguments of functions, by putting DWARF expression corresponding to the instruction being deleted into related DBG_VALUE instruction. Here is an example of missing variables location https://godbolt.org/z/K48osb9dK. We see that arguments x, y of function foo are not available in debugger, and corresponding DBG_VALUE instructions have undefined register operand instead of variables locaton after Aarch64PreLegalizerCombiner pass. The reason is that registers where variables are located are removed as dead (with instruction G_TRUNC). We can use salvageDebugInfo analogue for gMIR to preserve debug locations of dead variables. Statistics of llvm object files built with vs without this commit on -O2 optimization level (CMAKE_BUILD_TYPE=RelWithDebInfo, -fglobal-isel) on Aarch64 (macOS): Number of variables with 100% of parent scope covered by DW_AT_location has been increased by 7,9%. Number of variables with 0% coverage of parent scope has been decreased by 1,2%. Number of variables processed by location statistics has been increased by 2,9%. Average PC ranges coverage has been increased by 1,8 percentage points. Coverage can be improved by supporting more instructions, or by calling salvageDebugInfo for instructions that are deleted during Combiner rules exection. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D129909	2022-08-01 11:14:53 +02:00
Chuanqi Xu	9701053517	Introduce @llvm.threadlocal.address intrinsic to access TLS variable This belongs to a series of patches which try to solve the thread identification problem in coroutines. See https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015 for a full background. The problem consists of two concrete problems: TLS variable and readnone functions. This patch tries to convert the TLS problem to readnone problem by converting the access of TLS variable to an intrinsic which is marked as readnone. The readnone problem would be addressed in following patches. Reviewed By: nikic, jyknight, nhaehnle, ychen Differential Revision: https://reviews.llvm.org/D125291	2022-08-01 10:51:30 +08:00
Luís Marques	260a641068	[RISCV] Pre-RA expand pseudos pass Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up patches to fold the addi of PseudoLLA instructions into the immediate operand of load/store instructions. Differential Revision: https://reviews.llvm.org/D123264	2022-07-31 23:19:00 +02:00
Kazu Hirata	12b29900a1	Use any_of (NFC)	2022-07-30 10:35:56 -07:00
Dmitry Vassiliev	adc387460d	[CodeGen] Fixed undeclared MISchedCutoff in case of NDEBUG and LLVM_ENABLE_ABI_BREAKING_CHECKS This patch fixes the error llvm/lib/CodeGen/MachineScheduler.cpp(755): error C2065: 'MISchedCutoff': undeclared identifier in case of NDEBUG and LLVM_ENABLE_ABI_BREAKING_CHECKS. Note MISchedCutoff is declared under #ifndef NDEBUG. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130425	2022-07-30 18:24:50 +02:00
Simon Pilgrim	9ad082eb5a	[DAG] Pull out repeated getOperand() calls for shuffle ops. NFC.	2022-07-30 14:02:54 +01:00
Amaury Séchet	226086230c	[DAG] Use recursivelyDeleteUnusedNodes in CommitTargetLoweringOpt. It simplifies the logic and removes the need for manual bookkeeping. Differential Revision: https://reviews.llvm.org/D130445	2022-07-29 13:49:03 +00:00
Simon Pilgrim	af1b7ebcdf	[TargetLowering] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC. Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.	2022-07-29 14:20:35 +01:00
Matt Arsenault	a4834ad068	RegisterCoalescer: Shrink main range after shrinking subranges If the subregister uses were dead, this would leave the main range segment pointing to a deleted instruction. Not sure if this should try to avoid shrinking if we know we don't have dead components.	2022-07-29 08:57:28 -04:00
Simon Pilgrim	641dba9e28	[DAG] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC. Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.	2022-07-29 11:34:39 +01:00
Simon Pilgrim	9082c13106	[Support] Add KnownBits::concat method Add a method for the various cases where we need to concatenate 2 KnownBits together (BUILD_PAIR and SHIFT_PARTS in particular) - uses the existing APInt::concat 'HiBits.concat(LoBits)' convention Differential Revision: https://reviews.llvm.org/D130557	2022-07-29 11:06:39 +01:00
Felipe de Azevedo Piovezan	58526b2d2b	[GlobalISel] Handle nullptr constants in dbg.value Currently, the LLVM IR -> MIR translator fails to translate dbg.values whose first argument is a null pointer. However, in other portions of the code, such pointers are always lowered to the constant zero, for example see IRTranslator::Translate(Constant, Register). This patch addresses the limitation by following the same approach of lowering null pointers to zero. A prior test was checking that null pointers were always lowered to $noreg; this test is changed to check for zero, and the previous behavior is now checked by introducing a dbg.value whose first argument is the address of a global variable. Differential Revision: https://reviews.llvm.org/D130721	2022-07-28 14:58:14 -07:00
Felipe de Azevedo Piovezan	0ef6809c48	[GlobalISel][nfc] Remove unnecessary cast The getOperand method already returns a Constant when it is called on a ConstantExpression, as such the cast is not needed. To prevent a type mismatch between the different return statements of the lambda, the lambda return type is explicitly provided. Differential Revision: https://reviews.llvm.org/D130719	2022-07-28 14:55:07 -07:00
Simon Pilgrim	8c99cef1e7	[DAG] Remove SelectionDAG::GetDemandedBits and use SimplifyMultipleUseDemandedBits directly. GetDemandedBits is mainly a wrapper around SimplifyMultipleUseDemandedBits now, and is only used by DAGCombiner::visitSTORE so I've moved all remaining functionality there. visitSTORE was making use of this to 'simplify' constants for a trunc-store. Just removing this code left to a mixture of regressions and gains - it came down to whether a target preferred a sign or zero extended constant for materialization/truncation. I've just moved the code over for now, but a next step would be to move this to targetShrinkDemandedConstant, but some targets that override the method expect a basic binop, and might react badly to a store node.....	2022-07-28 17:03:44 +01:00
Simon Pilgrim	be488ba7de	[DAG] DAGCombiner::visitTRUNCATE - remove GetDemandedBits call This should now all be handled by SimplifyDemandedBits.	2022-07-28 15:23:04 +01:00
Simon Pilgrim	ea7f14dad0	[DAG] SelectionDAG::GetDemandedBits - don't simplify opaque constants I'm actually trying to get rid of GetDemandedBits - but while dismantling it I noticed that we were altering opaque constants. Fixing that causes a FP_TO_INT_SAT regression that should be addressed separately - I'll raise a bug.	2022-07-28 14:46:59 +01:00
Simon Pilgrim	69d5a038b9	[DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits. There a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP. Differential Revision: https://reviews.llvm.org/D77804	2022-07-28 14:10:44 +01:00
Amaury Séchet	474a8ee03d	[DAG] Use recursivelyDeleteUnusedNodes in PromoteLoad It simplifies the code overall and removes the need for manual bookkeeping. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130447	2022-07-28 12:54:52 +00:00
Amaury Séchet	7920805b27	[DAG] Use recursivelyDeleteUnusedNodes in ReplaceLoadWithPromotedLoad It simplifies the code overall and removes the need for manual bookkeeping. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130444	2022-07-28 12:32:37 +00:00
David Spickett	a0ccba5e19	[llvm] Fix some test failures with EXPENSIVE_CHECKS and libstdc++ DebugLocEntry assumes that it either contains 1 item that has no fragment or many items that all have fragments (see the assert in addValues). When EXPENSIVE_CHECKS is enabled, _GLIBCXX_DEBUG is defined. On a few machines I've checked, this causes std::sort to call the comparator even if there is only 1 item to sort. Perhaps to check that it is implemented properly ordering wise, I didn't find out exactly why. operator< for a DbgValueLoc will crash if this happens because the optional Fragment is empty. Compiler/linker/optimisation level seems to make this happen or not. So I've seen this happen on x86 Ubuntu but the buildbot for release EXPENSIVE_CHECKS did not have this issue. Add an explicit check whether we have 1 item. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D130156	2022-07-28 08:53:38 +00:00
Matt Arsenault	bfdca1535c	RegAllocGreedy: Fix nondeterminism in tryLastChanceRecoloring tryLastChanceRecoloring iterates over the set of LiveInterval pointers and used that to seed the recoloring stack, which was nondeterministic. Fixes a future test failing about 20% of the time. This just takes the order the interfering vreg was encountered. Not sure if we should try to order this more intelligently.	2022-07-27 19:02:06 -04:00
Paul Kirth	6e9bab71b6	Revert "[llvm][NFC] Refactor code to use ProfDataUtils" This reverts commit `300c9a7881`. We will reland once these issues are ironed out.	2022-07-27 21:38:11 +00:00
Paul Kirth	300c9a7881	[llvm][NFC] Refactor code to use ProfDataUtils In this patch we replace common code patterns with the use of utility functions for dealing with profiling metadata. There should be no change in functionality, as the existing checks should be preserved in all cases. Reviewed By: bogner, davidxl Differential Revision: https://reviews.llvm.org/D128860	2022-07-27 21:13:54 +00:00
Adrian Prantl	719ab04acf	[GlobalISel] Handle IntToPtr constants in dbg.value Currently, the IR to MIR translator can only handle two kinds of constant inputs to dbg.values intrinsics: constant integers and constant floats. In particular, it cannot handle pointers created from IntToPtr ConstantExpression objects. This patch addresses the limitation above by replacing the IntToPtr with its input integer prior to converting the dbg.value input. Patch by Felipe Piovezan! Differential Revision: https://reviews.llvm.org/D130642	2022-07-27 13:42:07 -07:00
Amara Emerson	65246d3eb4	Use hasNItemsOrLess() in MRI::hasAtMostUserInstrs().	2022-07-27 11:42:14 -07:00
Amara Emerson	19cdd1908b	[AArch64][GlobalISel] Add heuristics for localizing G_CONSTANT. This adds similar heuristics to G_GLOBAL_VALUE, querying the cost of materializing a specific constant in code size. Doing so prevents us from sinking constants which require multiple instructions to generate into use blocks. Code size savings on CTMark -Os: Program size.__text before after diff ClamAV/clamscan 381940.00 382052.00 0.0% lencod/lencod 428408.00 428428.00 0.0% SPASS/SPASS 411868.00 411876.00 0.0% kimwitu++/kc 449944.00 449944.00 0.0% Bullet/bullet 463588.00 463556.00 -0.0% sqlite3/sqlite3 284696.00 284668.00 -0.0% consumer-typeset/consumer-typeset 414492.00 414424.00 -0.0% 7zip/7zip-benchmark 595244.00 594972.00 -0.0% mafft/pairlocalalign 247512.00 247368.00 -0.1% tramp3d-v4/tramp3d-v4 372884.00 372044.00 -0.2% Geomean difference -0.0% Differential Revision: https://reviews.llvm.org/D130554	2022-07-27 10:51:16 -07:00
Simon Pilgrim	c0b3f7a50f	[DAG] SimplifyDemandedBits - ensure we clear known One bits that AssertZext asserts are really known Zero Matches ComputeKnownBits behaviour Thanks to @uabelho for the fuzz regression report on D129765	2022-07-27 13:57:47 +01:00
Simon Pilgrim	529bd4f352	[DAG] SimplifyDemandedBits - don't early-out for multiple use values SimplifyDemandedBits currently early-outs for multi-use values beyond the root node (just returning the knownbits), which is missing a number of optimizations as there are plenty of cases where we can still simplify when initially demanding all elements/bits. @lenary has confirmed that the test cases in aea-erratum-fix.ll need refactoring and the current increase codegen is not a major concern. Differential Revision: https://reviews.llvm.org/D129765	2022-07-27 10:54:06 +01:00
Dmitry Vassiliev	e3e63f30a5	[CodeGen] Fixed ambiguous symbol ExtAddrMode in case of NDEBUG and LLVM_ENABLE_DUMP This patch fixes the following error with MSVC 16.9.2 in case of NDEBUG and LLVM_ENABLE_DUMP: llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2872: 'ExtAddrMode': ambiguous symbol llvm/include/llvm/CodeGen/TargetInstrInfo.h(86): note: could be 'llvm::ExtAddrMode' llvm/lib/CodeGen/CodeGenPrepare.cpp(2447): note: or '`anonymous-namespace'::ExtAddrMode' llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2039: 'print': is not a member of 'llvm::ExtAddrMode' Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D130426	2022-07-27 00:21:57 +02:00
Fangrui Song	f106525de2	[MachineFunctionPass] Support -print-changed and -print-changed=quiet -print-changed for new pass manager is handy beside -print-after-all. Port it to MachineFunctionPass. Note: lib/Passes/StandardInstrumentations.cpp implements a number of misc features. If we want to use them for codegen, we may need to lift some functionality to LLVMIR. Reviewed By: aeubanks, jamieschmeiser Differential Revision: https://reviews.llvm.org/D130434	2022-07-26 10:16:49 -07:00
Simon Pilgrim	1ea7b9c6ee	[DAG] matchRotateSub - set demanded bits to the shift amount type size, not the shift result size. This should fix a report on D130251 of an assert due to a bitwidth mismatch in APInt::isSubSetOf	2022-07-26 17:58:51 +01:00
Stefan Gränitz	1e30820483	[WinEH] Apply funclet operand bundles to nounwind intrinsics that lower to function calls in the course of IR transforms WinEHPrepare marks any function call from EH funclets as unreachable, if it's not a nounwind intrinsic or has no proper funclet bundle operand. This affects ARC intrinsics on Windows, because they are lowered to regular function calls in the PreISelIntrinsicLowering pass. It caused silent binary truncations and crashes during unwinding with the GNUstep ObjC runtime: https://github.com/gnustep/libobjc2/issues/222 This patch adds a new function `llvm::IntrinsicInst::mayLowerToFunctionCall()` that aims to collect all affected intrinsic IDs. * Clang CodeGen uses it to determine whether or not it must emit a funclet bundle operand. * PreISelIntrinsicLowering asserts that the function returns true for all ObjC runtime calls it lowers. * LLVM uses it to determine whether or not a funclet bundle operand must be propagated to inlined call sites. Reviewed By: theraven Differential Revision: https://reviews.llvm.org/D128190	2022-07-26 17:52:43 +02:00
Paul Walker	e5c892dd85	[SVE][SelectionDAG] Use INDEX to generate matching instances of BUILD_VECTOR. This patch starts small, only detecting sequences of the form <a, a+n, a+2n, a+3n, ...> where a and n are ConstantSDNodes. Differential Revision: https://reviews.llvm.org/D125194	2022-07-26 15:28:37 +00:00
wangpc	1a7078d106	[DAGCombine] Mask doesn't have to be (EltSize - 1) exactly when combining rotation I think what we need is the least Log2(EltSize) significant bits are known to be ones. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130251	2022-07-26 21:14:45 +08:00
Sven van Haastregt	c8d91b07bb	Reassoc FMF should not optimize FMA(a, 0, b) to (b) Optimizing (a * 0 + b) to (b) requires assuming that a is finite and not NaN. DAGCombiner will do this optimization when the reassoc fast math flag is set, which is not correct. Change DAGCombiner to only consider UnsafeMath for this optimization. Differential Revision: https://reviews.llvm.org/D130232 Co-authored-by: Andrea Faulds <andrea.faulds@arm.com>	2022-07-26 09:39:12 +01:00
Kazu Hirata	3f3930a451	Remove redundaunt virtual specifiers (NFC) Identified with tidy-modernize-use-override.	2022-07-25 23:00:59 -07:00
jacquesguan	cb370cf413	[DAGCombiner] Teach scalarizeExtractedBinop to support scalable splat. This patch supports the scalable splat part for scalarizeExtractedBinop. Differential Revision: https://reviews.llvm.org/D129725	2022-07-26 09:31:45 +08:00
Amara Emerson	5ae0472694	[GlobalISel] Fix miscompile of G_UREM + G_UDIV due to not checking for equality of the first operands of each. Fixes issue #55287 Differential Revision: https://reviews.llvm.org/D130525	2022-07-25 16:03:05 -07:00
Alexander Shaposhnikov	1e636f2676	[IRBuilder] Add assert for AtomicRMW ordering Add assert for AtomicRMW: Ordering != AtomicOrdering::Unordered (https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Verifier.cpp#L3944) and adjust expandAtomicStore accordingly. Test plan: 1/ ninja check-llvm check-clang check-lld 2/ Bootstrapped LLVM/Clang pass tests Differential revision: https://reviews.llvm.org/D130457	2022-07-25 22:51:25 +00:00
Matt Arsenault	62531518f9	RegAllocGreedy: Add a command line flag for reverseLocalAssignment Introduce a flag like for some of the other target heuristic controls to help with experimentation.	2022-07-25 15:47:15 -04:00
Vladislav Dzhidzhoev	fc93ba061a	[GlobalISel][DebugInfo] Remove debug info with zero line from constants inserted at entry block Emission of constants having DebugLoc with line 0 causes significant increase of debug_line section size for some source files. To illustrate, we can compare section sizes of several files from llvm test-suite, built with SelectionDAG vs GlobalISel, on Aarch64 (macOS), using -O0 optimization level: \| Source path \| SDAG text sz \| GISel text sz \| SDAG debug_line sz \| GISel debug_line sz \| -------------------------------------------------------------- \| ------------ \| ------------- \| ------------------ \| -------------------- \| `SingleSource/Regression/C/gcc-c-torture/execute/strlen-2.c` \| 15320 \| 660 \| 14872 \| 6340 \| `SingleSource/Regression/C/gcc-c-torture/execute/20040629-1.c` \| 33640 \| 26300 \| 2812 \| 6693 \| `SingleSource/Benchmarks/Misc/flops-4.c` \| 1428 \| 1196 \| 594 \| 1008 \| `MultiSource/Benchmarks/MiBench/consumer-typeset/z31.c` \| 2716 \| 964 \| 809 \| 903 \| `MultiSource/Benchmarks/Prolangs-C/gnugo/showinst.c` \| 2534 \| 2502 \| 189 \| 573 For instance, here is a fragment of `flops-4.c.o` debug line section dump ``` Address Line Column File ISA Discriminator Flags ------------------ ------ ------ ------ --- ------------- ------------- 0x0000000000000000 174 0 1 0 0 is_stmt 0x0000000000000010 0 0 1 0 0 0x0000000000000018 185 4 1 0 0 is_stmt prologue_end 0x000000000000001c 0 0 1 0 0 0x0000000000000024 186 4 1 0 0 is_stmt 0x000000000000002c 189 10 1 0 0 is_stmt 0x0000000000000030 0 0 1 0 0 0x0000000000000038 207 11 1 0 0 is_stmt 0x0000000000000044 208 11 1 0 0 is_stmt 0x0000000000000048 0 0 1 0 0 0x0000000000000058 210 10 1 0 0 is_stmt 0x000000000000005c 0 0 1 0 0 0x0000000000000060 211 10 1 0 0 is_stmt 0x0000000000000064 0 0 1 0 0 0x000000000000006c 212 10 1 0 0 is_stmt 0x0000000000000070 0 0 1 0 0 0x000000000000007c 213 10 1 0 0 is_stmt 0x0000000000000080 0 0 1 0 0 0x0000000000000088 214 10 1 0 0 is_stmt 0x000000000000008c 0 0 1 0 0 0x0000000000000094 215 10 1 0 0 is_stmt ``` Lot of zero lines are produced by constants (global values) having DebugLoc with line 0. It seems that they're not significant for debugging experience. With the commit applied, total size of debug_line sections of llvm shared libraries has reduced by 2.5%. Change of debug line section size of files listed above: \| Source path \| GISel debug_line sz \| Patch debug_line sz \| -------------------------------------------------------------- \| ------------------- \| -------------------- \| `SingleSource/Regression/C/gcc-c-torture/execute/strlen-2.c` \| 6340 \| 1465 \| `SingleSource/Regression/C/gcc-c-torture/execute/20040629-1.c` \| 6693 \| 3782 \| `SingleSource/Benchmarks/Misc/flops-4.c` \| 1008 \| 609 \| `MultiSource/Benchmarks/MiBench/consumer-typeset/z31.c` \| 903 \| 841 \| `MultiSource/Benchmarks/Prolangs-C/gnugo/showinst.c` \| 573 \| 190 Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D127488	2022-07-25 17:19:01 +00:00
Nikita Popov	fb7caa3c7b	[AsmPrinter] Reject ptrtoint to larger size in lowerConstant() When using a ptrtoint to a size larger than the pointer width in a global initializer, we currently create a ptr & low_bit_mask style MCExpr, which will later result in a relocation error during object file emission. This patch rejects the constant expression already during lowerConstant(), which results in a much clearer error message that references the constant expression at fault. This fixes https://github.com/llvm/llvm-project/issues/56400, for certain definitions of "fix". Differential Revision: https://reviews.llvm.org/D130366	2022-07-25 10:18:27 +02:00
Kazu Hirata	b5188591a0	[llvm] Remove redundaunt virtual specifiers (NFC) Identified with modernize-use-override.	2022-07-24 21:50:35 -07:00
Kazu Hirata	acf648b5e9	Use llvm::less_first and llvm::less_second (NFC)	2022-07-24 16:21:29 -07:00
Kazu Hirata	ea29810c9d	[CodeGen] Remove a redundant void (NFC) Identified with modernize-redundant-void-arg.	2022-07-24 12:27:14 -07:00
Matt Arsenault	40abb28f61	RegAllocGreedy: Fix subranges when rematerializing dead subreg defs This would create a new interval missing the subrange and hit this verifier error: * Bad machine code: Live interval for subreg operand has no subranges * - function: test_remat_subreg_def - basic block: %bb.0 (0xa568758) [0B;128B) - instruction: 32B dead undef %4.sub0:vreg_64 = V_MOV_B32_e32 2, implicit $exec	2022-07-24 11:51:59 -04:00
Simon Pilgrim	562ee7cc5f	[DAG] visitSMUL_LOHI/visitUMUL_LOHI - ensure we canonicalize constants to the RHS	2022-07-24 16:09:56 +01:00
Simon Pilgrim	428c0f2adc	[DAG] getNode - assert that SMUL_LOHI/UMUL_LOHI nodes have the correct ops + types	2022-07-24 15:30:57 +01:00
Simon Pilgrim	0708771cce	[DAG] MaskedVectorIsZero - don't bother with (-1).isSubsetOf mask check. NFC. Just use KnownBits::isZero() to ensure all the bits are known zero.	2022-07-24 13:12:21 +01:00
Simon Pilgrim	e82d49bfed	[DAG] SimplifyMultipleUseDemandedBits - early-out for any scalable vector types Noticed while working to remove SelectionDAG::GetDemandedBits - we were relying on the callers to have already bailed for scalable vectors	2022-07-24 12:59:43 +01:00
Simon Pilgrim	a3e38b4a20	[DAG] SimplifyDemandedVectorElts - if every and/mul element-pair has a zero/undef then just constant fold to zero	2022-07-24 12:00:31 +01:00
Kazu Hirata	7bfa06f6c0	[CodeGen] Use range-based for loops (NFC)	2022-07-23 16:10:46 -07:00
Simon Pilgrim	ac8be21365	[DAG] isSplatValue - don't attempt to merge any BITCAST sub elements if they contain UNDEFs We still haven't found a solution that correctly handles 'don't care' sub elements properly - given how close it is to the next release branch, I'm making this fail safe change and we can revisit this later if we can't find alternatives. NOTE: This isn't a reversion of D128570 - it's the removal of undef handling across bitcasts entirely Fixes #56520	2022-07-23 18:38:48 +01:00
Dmitri Gribenko	aba43035bd	Use llvm::sort instead of std::sort where possible llvm::sort is beneficial even when we use the iterator-based overload, since it can optionally shuffle the elements (to detect non-determinism). However llvm::sort is not usable everywhere, for example, in compiler-rt. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D130406	2022-07-23 15:19:05 +02:00
Simon Pilgrim	5f89d2bae9	[DAG] Move OR(AND(X,C1),AND(OR(X,Y),C2)) -> OR(AND(X,OR(C1,C2)),AND(Y,C2)) fold to SimplifyDemandedBits This will fix the SystemZ v3i31 memcpy regression in D77804 (with the help of D129765 as well....). It should also allow us to /bend/ the oneuse limitation for cases where we can use demanded bits to safely peek though multiple uses of the AND ops.	2022-07-23 13:17:24 +01:00
Simon Pilgrim	6aff1b7b3c	[DAG] SimplifyDemandedBits - pull out repeated getValueType() calls. NFC.	2022-07-23 12:01:54 +01:00
Simon Pilgrim	2421a5af72	[DAG] ExpandIntRes_ADDSUB - create UADDO/USUBO instead of ADDCARRY/SUBCARRY if overflow is known to be zero As noticed on D127115, when splitting ADD/SUB nodes we often end up with cases where overflow from the lower bits is impossible - in such cases we're better off breaking the carry chain dependency as soon as possible. This path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen diff without a topological worklist.	2022-07-23 11:13:44 +01:00
Simon Pilgrim	8937252465	[DAG] computeKnownBits - add basic shift-by-parts handling Concat KnownBits from ISD::SHL_PARTS / ISD::SRA_PARTS / ISD::SRL_PARTS lo/hi operands and perform the KnownBits calculation by the shift amount on the extended type, before splitting the KnownBits based on the requested lo/hi result.	2022-07-23 09:46:30 +01:00
ARCHIT SAXENA	3bb1ce2319	Add a nop instruction if a section starts with landing pad for function splitter This change adds a nop instruction if section starts with landing pad. This change is like [D73739](https://reviews.llvm.org/D73739) which avoids zero offset landing pad in basic block sections. Detailed description: The current machine functions splitter can create ˜sections which start with a landing pad themselves. This places landing pad at offset zero from LPStart. ``` .section .text.split.foo10,"ax",@progbits foo10.cold: # %lpad .cfi_startproc .cfi_personality 3, __gxx_personality_v0 .cfi_lsda 3, .Lexception5 .cfi_def_cfa %rsp, 16 .Ltmp11: <--- This is a Landing pad and also LP Start as it is start of this section movq %rax, %rdi <--- first instruction is at offest 0 from LPStart callq _Unwind_Resume@PLT ``` This will cause landing pad entries to become zero (.Ltmp11-foo10.cold) ``` .Lcst_begin4: .uleb128 .Ltmp9-.Lfunc_begin2 # >> Call Site 1 << .uleb128 .Ltmp10-.Ltmp9 # Call between .Ltmp9 and .Ltmp10 .uleb128 .Ltmp11-foo10.cold <---This is zero # jumps to .Ltmp11 .byte 3 # On action: 2 .uleb128 .Ltmp10-.Lfunc_begin2 # >> Call Site 2 << .uleb128 .Lfunc_end9-.Ltmp10 # Call between .Ltmp10 and .Lfunc_end9 .byte 0 # has no landing pad .byte 0 # On action: cleanup .p2align 2 ``` The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. This change adds a nop instruction at start of such sections so that such a case could be avoided. Output: ``` .section .text.split.foo10,"ax",@progbits foo10.cold: # %lpad .cfi_startproc .cfi_personality 3, __gxx_personality_v0 .cfi_lsda 3, .Lexception5 .cfi_def_cfa %rsp, 16 nop <--- new instruction that is added .Ltmp11: movq %rax, %rdi callq _Unwind_Resume@PLT ``` Reviewed By: modimo, snehasish, rahmanl Differential Revision: https://reviews.llvm.org/D130133	2022-07-22 15:20:10 -07:00
Craig Topper	be208b40c1	[DAGCombiner] Simplify code around call to reduceLoadWidth in visitAND. NFC We were looking for loads or any_extend+load. reduceLoadWidth hasn't known how to look through such an any_extend to find the load since D40667 almost 5 years ago. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130333	2022-07-22 08:36:56 -07:00
Nikita Popov	c2be703c6c	[AsmPrinter] Move lowerConstant() error code out of switch (NFC) Move this out of the switch, so that different branches can indicate an error by breaking out of the switch. This becomes important if there are more than the two current error cases.	2022-07-22 16:08:28 +02:00
Cullen Rhodes	bf268a05cd	[AArch64] Emit vector FP cmp when LE is used with fast-math Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130093	2022-07-22 07:53:55 +00:00
jacquesguan	e60eb7053d	recommit "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat." With fix for AArch64 and Hexgon test cases.	2022-07-21 17:34:34 +08:00
David Green	23d6186be0	[SelectionDAG] Fix fptoi.sat scalable vector lowering Vector fptosi_sat and fptoui_sat were being expanded by unrolling the vector operation. This doesn't work for scalable vector, so this patch adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable. Scalable tests are added for AArch64 and RISCV. Some of the AArch64 fptoi_sat operations should be legal, but that will be handled in another patch. Differential Revision: https://reviews.llvm.org/D130028	2022-07-21 08:00:22 +01:00
esmeyi	339392ecf2	[AIX] follow-up of D124654. Emitting the remaining aliases instead of reporting an error to avoid SPEC2017 PEAK failures. And mark this as a TODO.	2022-07-21 01:10:09 -04:00
Simon Pilgrim	029e83b401	[DAG] getNode - don't bother creating ADDO(X,0) or SUBO(X,0) nodes. Similar to what we already do in getNode for basic ADD/SUB nodes, return the X operand directly, but here we know that there will be no/zero overflow as well. As noted on D127115 - this path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen without a topological worklist.	2022-07-20 12:04:33 +01:00
Simon Pilgrim	766cd95481	[DAG] getNode - assert that ADDO/SUBO nodes have the correct ops + types	2022-07-20 11:23:58 +01:00
Simon Pilgrim	9fc347aa4e	[DAG] PromoteIntRes_BUILD_VECTOR - extend constant boolean vectors according to target BooleanContents PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target. This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notable XOR -> NOT conversions) to occur. Differential Revision: https://reviews.llvm.org/D129641	2022-07-20 10:49:31 +01:00
Lorenzo Albano	07d69d9fc9	[VP] Legalize the stride operand for EXPERIMENTAL_VP_STRIDED SDNodes Add promotion and expansion of integer operands for experimental_vp_strided SelectionDAG nodes; the expansion is actually just a truncation of the stride operand. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D123112	2022-07-20 10:22:43 +02:00
Kazu Hirata	76e18cc4f6	[llvm] Use llvm::any_of and llvm::none_of (NFC)	2022-07-20 00:36:19 -07:00
Kazu Hirata	0387da6f4f	Use value instead of getValue (NFC)	2022-07-19 21:18:26 -07:00
Kazu Hirata	41ae78ea3a	Use has_value instead of hasValue (NFC)	2022-07-19 20:15:44 -07:00
Kazu Hirata	bbbb4393ee	[CodeGen] Use value_or instead of getValueOr (NFC)	2022-07-19 19:50:43 -07:00
David Truby	4c82f56d8f	[llvm][SVE] Remove redundant and when comparing against extending load When determining if an `and` should be merged into an extending load the constant argument to the `and` is currently not checked if the argument requires truncation. This prevents the combine happening when the vector width is half the normal available vector width for SVE VLA vectors. Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D129281	2022-07-19 17:08:32 +01:00
Simon Pilgrim	71c502cbca	[DAG] Call SimplifyDemandedBits from ISD::MUL nodes Noticed while triaging D129765.	2022-07-19 14:11:04 +01:00
Benjamin Kramer	8aff88fd3a	[LegalizeDAG] Propagate alignment in ExpandExtractFromVectorThroughStack Unlike the name suggests this can reuse any store as a base for a memory-based vector extract. If that store is underaligned the loads created to extract will have an invalid alignment. Since most CPUs are forgiving wrt alignment this is almost never an issue, on x86 this is only reproducible by extracting a 128 bit vector out of a wider vector. I tried making a test case in the context of https://reviews.llvm.org/D127982 but it's really really fragile, as the output pretty much looks like a missed optimization.	2022-07-19 13:13:55 +02:00
Simon Pilgrim	0f6b0461b0	[DAG] SimplifyDemandedBits - relax "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" to match only demanded bits The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits. This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115 Alive2: https://alive2.llvm.org/ce/z/fl7T7K Differential Revision: https://reviews.llvm.org/D129933	2022-07-19 10:59:07 +01:00
Max Kazantsev	69b284aaf6	Revert "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat." This reverts commit `58dfaaaace`. Massive AARCH test failures in buildbot.	2022-07-19 13:41:52 +07:00
jacquesguan	58dfaaaace	[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat. This revision supports to scalarize a binary operation of two scalable splat vectors. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122791	2022-07-19 11:20:51 +08:00
Matt Arsenault	8d0383eb69	CodeGen: Remove AliasAnalysis from regalloc This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.	2022-07-18 17:23:41 -04:00
Jay Foad	dbed4326dd	[LiveIntervals] Find better anchoring end points when repairing ranges r175673 changed repairIntervalsInRange to find anchoring end points for ranges automatically, but the calculation of Begin included the first instruction found that already had an index. This patch changes it to exclude that instruction: 1. For symmetry, so that the half open range [Begin,End) only includes instructions that do not already have indexes. 2. As a possible performance improvement, since repairOldRegInRange will scan fewer instructions. 3. Because repairOldRegInRange hits assertion failures in some cases when it sees a def that already has a live interval. (3) fixes about ten tests in the CodeGen lit test suite when -early-live-intervals is forced on. Differential Revision: https://reviews.llvm.org/D110182	2022-07-18 19:34:43 +01:00
Itay Bookstein	2570f226d1	[SDAG] Remove single-result restriction on commutative CSE The DAG Combiner unnecessarily restricts commutative CSE to nodes with a single result value. This commit removes that restriction. Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D129666	2022-07-18 19:19:13 +03:00
Lorenzo Albano	c00a44fa68	[VP] IR expansion pass for VP gather and scatter Add vp_gather and vp_scatter expansion to unpredicated intrinsics. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D120664	2022-07-18 17:00:38 +02:00
Nikita Popov	56b4b6e81b	[SDAG] Fix release build This variable was only declared in debug builds, but is needed in release builds as well.	2022-07-18 14:10:31 +02:00

... 5 6 7 8 9 ...

33283 Commits