llvm-project

Commit Graph

Author	SHA1	Message	Date
Stefan Pintilie	e021de0aab	[PowerPC] Exploit paddi instruction on Power 10 for constant materialization Starting with Power 10 the instruction paddi is available to use. The instruction allows for immediates that are 34 bits. This patch adds exploitation of the paddi instruction to allow us to materialize constants. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D93300	2021-03-11 08:37:49 -06:00
Qiu Chaofan	72c4cbd60e	[PowerPC] Fix multi-use case for swap reduction `4c973ae` implemented reduction of vector swap for lane-insensitive operations. This commit fixes it for checking number of uses of the vector operation.	2021-03-11 21:58:33 +08:00
Nikita Popov	46354bac76	[OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC) Explicitly pass loaded type when creating loads, in preparation for the deprecation of these APIs. There are still a couple of uses left.	2021-03-11 14:40:57 +01:00
Bradley Smith	860ae9d50c	[AArch64][SVE] Add fixed/scalable lowering of FMAXIMUM/FMINIMUM ISD nodes Differential Revision: https://reviews.llvm.org/D98348	2021-03-11 13:37:47 +00:00
Bradley Smith	ea834c8365	Revert "[AArch64][SVE] Allow accesses to SVE stack objects to use frame pointer" This patch introduced codegen faults. An attempt to fix this was done in https://reviews.llvm.org/D97193, but ultimately it was decided to approach this differently. This reverts commit `42635856ed`. Differential Revision: https://reviews.llvm.org/D98350	2021-03-11 13:32:35 +00:00
Nikita Popov	2489cbaa80	[PowerPC] Fix infinite loop in peephole CR optimization (PR49509) If we encounter a degenerate select node where both operands are the same, then we can continue negating the condition while swapping operands, resulting in an infinite loop. Avoid this by bailing out if both operands are the same. Fixes https://bugs.llvm.org/show_bug.cgi?id=49509. Differential Revision: https://reviews.llvm.org/D98340	2021-03-11 14:25:22 +01:00
Simon Pilgrim	bc5e9ec2dc	Revert rGcd938ab162b0ac560dd0e9fee290980c7e0e47e5 "[X86] canonicalizeShuffleWithBinOps - add X86ISD::PSHUFB handling." Investigating an issue reported by @bkramer, possibly when the PSHUFB mask generates zero elements.	2021-03-11 13:14:00 +00:00
Simon Pilgrim	77394c12a4	[X86] Don't attempt to fold sub(C1, xor(X, C2)) with opaque constants Fixes PR49451	2021-03-11 12:06:40 +00:00
Ruiling Song	66340846b3	[AMDGPU] Always create Stack Object for reserved VGPR As we may overwrite inactive lanes of a caller-save-vgpr, we should always save/restore the reserved vgpr for sgpr spill. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D98319	2021-03-11 10:06:07 +08:00
David Green	1a808286ef	[AArch64] Extend vecreduce -> udot handling to mla reductions We previously have lowering for: vecreduce.add(zext(X)) to vecreduce.add(UDOT(zero, X, one)) This extends that to also handle: vecreduce.add(mul(zext(X), zext(Y)) to vecreduce.add(UDOT(zero, X, Y)) It extends the existing code to optionally handle a mul with equal extends. Differential Revision: https://reviews.llvm.org/D97280	2021-03-10 22:25:12 +00:00
David Green	a02f506876	[AArch64] Extend vecreduce -> udot handling to v8i8 https://reviews.llvm.org/D88577 added v16i8 vecreduce to udot/sdot lowering. This extends that to v8i8 too, generalizing the pattern to handle the extra types. Differential Revision: https://reviews.llvm.org/D97279	2021-03-10 21:03:15 +00:00
Stanislav Mekhanoshin	9931b1f7a4	[AMDGPU] Disable SCC bit on fp atomics Differential Revision: https://reviews.llvm.org/D98221	2021-03-10 12:36:09 -08:00
Stanislav Mekhanoshin	574a9dabc6	[AMDGPU] Always expand system scope fp atomics on gfx90a FP atomics in system scope cannot be used and shall always be expanded in a CAS loop. Differential Revision: https://reviews.llvm.org/D98085	2021-03-10 12:35:23 -08:00
Amy Kwan	8b540c542c	[PowerPC] Implement patterns for PC-Rel zextload/extload byte loads This patch adds patterns to select the PC-Relative extloadi1 and zextloadi1 byte loads. Differential Revision: https://reviews.llvm.org/D98042	2021-03-10 12:18:13 -06:00
Craig Topper	9106d04554	[RISCV][SelectionDAG] Introduce an ISD::SPLAT_VECTOR_PARTS node that can represent a splat of 2 i32 values into a nxvXi64 vector for riscv32. On riscv32, i64 isn't a legal scalar type but we would like to support scalable vectors of i64. This patch introduces a new node that can represent a splat made of multiple scalar values. I've used this new node to solve the current crashes we experience when getConstant is used after type legalization. For RISCV, we are now default expanding SPLAT_VECTOR to SPLAT_VECTOR_PARTS when needed and then handling the SPLAT_VECTOR_PARTS later during LegalizeOps. I've remove the special case I previously put in for ABS for D97991 as the default expansion is now able to succesfully use getConstant. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98004	2021-03-10 09:46:18 -08:00
Craig Topper	0c73a506e8	[RISCV] Starting fixing issues that prevent us from testing vXi64 intrinsics on RV32. Currently we crash in type legalization any time an intrinsic uses a scalar i64 on RV32. This patch adds support for type legalizing this to prevent crashing. I don't promise that it uses the best possible codegen just that it is functional. This first version handles 3 cases. vmv.v.x intrinsic, vmv.s.x intrinsic and intrinsics that take a scalar input, splat it and then do some operation. For vmv.v.x we'll either rely on hardware sign extension for constants or we'll convert it to multiple splats and bit manipulation. For vmv.s.x we use a really unoptimal sequence inspired by what we do for an INSERT_VECTOR_ELT. For the third case we'll either try to use the .vi form for constants or convert to a complicated splat and bitmanip and use the .vv form of the operation. I've renamed the ExtendOperand field to SplatOperand now use it specifically for the third case. The first two cases are handled by custom lowering specifically for those intrinsics. I haven't updated all tests yet, but I tried to cover a subset that includes single-width, widening, and narrowing. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97895	2021-03-10 09:45:38 -08:00
Craig Topper	1e39118638	[RISCV] Manually split vector operands to VECREDUCE when handling vXi64 vectors on RV32. The type legalizer will visit the result before the operands. To avoid creating an illegal target specific node or falling back to scalarization, we need to manually split vector operands. This still doesn't handle the case of non-power of 2 operands which need to be widened. I'm not sure the type legalizer is ready for it. I think we would need to insert an INSERT_SUBVECTOR with the power of 2 type we want, with an undef first operand, and the non-power of 2 orignal operand as the vector to insert. Then fill in the neutral elements into the elements the padded elements. Alternatively we INSERT_SUBVECTOR into a neutral vector. From there we carry on splitting if needed to get to a legal type then do the target specific code. The problem with this is the type legalizer doesn't know how to widen an insert_subvector yet. We would need to add that including the handling for a non-undef first vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98292	2021-03-10 09:27:38 -08:00
Stephen Tozer	1db137b185	[DebugInfo] Handle DBG_VALUES with multiple variable location operands in MIR This patch adds handling for DBG_VALUE_LIST in the MIR-passes (after finalize-isel), excluding the debug liveness passes and DWARF emission. This most significantly affects MachineSink, which now needs to consider all used registers of a debug value when sinking, but for most passes this change is simply replacing getDebugOperand(0) with an iteration over all debug operands. Differential Revision: https://reviews.llvm.org/D92578	2021-03-10 17:15:24 +00:00
Jay Foad	70f013fd3b	[AMDGPU] Fix isReallyTriviallyReMaterializable for V_MOV_* D57708 changed SIInstrInfo::isReallyTriviallyReMaterializable to reject V_MOVs with extra implicit operands, but it accidentally rejected all V_MOVs because of their implicit use of exec. Fix it but avoid adding a moderately expensive call to MI.getDesc().getNumImplicitUses(). In real graphics shaders this changes quite a few vgpr copies into move- immediates, which is good for avoiding stalls on GFX10. Differential Revision: https://reviews.llvm.org/D98347	2021-03-10 16:18:12 +00:00
Yusra Syeda	023b5c1ed8	[SystemZ][NFC] Renaming of ELF specific variables. Rename ELF specific variables, making it easier to add the XPLink variables in future patches. Reviewed By: abhina.sreeskantharajan, Kai Differential Revision: https://reviews.llvm.org/D98199	2021-03-10 10:15:01 -05:00
Jingu Kang	25951c5ab8	[AArch64] Add missing intrinsics for scalar FP rounding Differential Revision: https://reviews.llvm.org/D98269	2021-03-10 13:22:29 +00:00
Qiu Chaofan	4c973ae51b	[PowerPC] Reduce symmetrical swaps for lane-insensitive vector ops This patch simplifies pattern (xxswap (vec-op (xxswap a) (xxswap b))) into (vec-op a b) if vec-op is lane-insensitive. The motivating case is ScalarToVector-VecOp-ExtractElement sequence on LE, but the peephole itself is not related to endianness, so BE may also benefit from this. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D97658	2021-03-10 15:21:32 +08:00
David Blaikie	b627802e81	Remove unused variable (rolling it into an assert)	2021-03-09 16:06:44 -08:00
Albion Fung	9b6ac9e999	[P10] [Power PC] Exploiting new load rightmost vector element instructions. This pull request implements patterns to exploit the load rightmost vector element instructions for loading element 0 on little endian PowerPC subtargets into v8i16 and v16i8 vector registers for i16 and i8 data types. Differential Revision: https://reviews.llvm.org/D94816#inline-921403	2021-03-09 16:08:17 -05:00
Amara Emerson	45a9dca015	[AArch64][GlobalISel] Form G_DUPLANE32 for <2 x s32> shufflevectors in lowering. For <2 x s32>, we can use G_DUPLANE32, but with a <4 x s32> source. To make it work, we can just widen the original source with a concat_vectors. Doing this allows <2 x float> indexed fmul instruction selection patterns to fire, which gives a nice 0.3% code size saving on Bullet with -Os. Differential Revision: https://reviews.llvm.org/D98059	2021-03-09 11:36:26 -08:00
Amara Emerson	e60ab72137	[AArch64][GlobalISel] Add combine for extract_vector_elt(build_vector, cst) Differential Revision: https://reviews.llvm.org/D97835	2021-03-09 11:08:02 -08:00
Jay Foad	288ea820cf	[AMDGPU] Refactor AMDGPUTargetStreamer::EmitCodeEnd Refactor and add comments to explain where the magic numbers come from in terms of the instruction cache line size. NFC. Differential Revision: https://reviews.llvm.org/D98266	2021-03-09 19:02:18 +00:00
Craig Topper	351844edf1	[RISCV] Add support for VECTOR_REVERSE for scalable vector types. I've left mask registers to a future patch as we'll need to convert them to full vectors, shuffle, and then truncate. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97609	2021-03-09 10:03:45 -08:00
Amara Emerson	3ce9e223cb	[AArch64][GlobalISel] Lower scalar G_{SMIN, SMAX, UMIN, UMAX}.	2021-03-09 10:03:16 -08:00
Christudasan Devadasan	24c0ad7143	[AMDGPU] Fix the dead frame indices during custom spill lowering. AMDGPU target tries to handle the SGPR and VGPR spills in a custom pass before the actual frame lowering pass. Once they are handled and the respective frames are eliminated in the custom pass, certain uses of them still remain. For instance, the DBG_VALUE instructions inserted by the allocator alongside the spill instruction will use the corresponding frame index. They become dead later during PEI and causes a crash while trying to replace the frame indices. We should possibly avoid this custom pass. For now, replacing such dead references with null register value. Reviewed By: arsenm, scott.linder Differential Revision: https://reviews.llvm.org/D98038	2021-03-09 23:22:49 +05:30
Craig Topper	77ac3166e5	[RISCV] Add support for fixed vector reductions. I've included tests that require type legalization to split the vector. The i64 version of these scalarizes on RV32 due to type legalization visiting the result before the vector type. So we have to abort our custom expansion to avoid creating target specific nodes with an illegal type. Then type legalization ends up scalarizing. We might be able to fix this by doing custom splitting for large vectors in our handler to get down to a legal type. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98102	2021-03-09 09:39:59 -08:00
Craig Topper	1c7ad4dd88	[RISCV] Don't modify the SEW immediate on the V extension pseudo instructions after inserting VSETVLI. Previously we set the value to -1, but the SEW information could be useful for scheduling. Reviewed By: frasercrmck, rogfer01 Differential Revision: https://reviews.llvm.org/D98062	2021-03-09 09:02:19 -08:00
Craig Topper	72ecf2f43f	[RISCV] Optimize fixed vector ABS. Fix crash on scalable vector ABS for SEW=64 with RV32. The default fixed vector expansion uses sra+xor+add since it can't see that smax is legal due to our custom handling. So we select smax(X, sub(0, X)) manually. Scalable vectors are able to use the smax expansion automatically for most cases. It crashes in one case because getConstant can't build a SPLAT_VECTOR for nxvXi64 when i64 scalars aren't legal. So we manually emit a SPLAT_VECTOR_I64 for that case. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97991	2021-03-09 08:51:03 -08:00
Craig Topper	478317fbb7	[RISCV] Make the hasStdExtM() check in RISCVInstrInfo::getVLENFactoredAmount emit a diagnostic rather than an assert. As far as I know we're not enforcing the StdExtM must be enabled to use the V extension. If we use an assert here and hit this code in a release build we'll silently emit an invalid instruction. By using a diagnostic we report the error to the user in release builds. I think there may still be a later fatal error from the code emitter though. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97970	2021-03-09 08:50:02 -08:00
gbtozers	df69c69427	[DebugInfo] Handle multiple variable location operands in IR This patch updates the various IR passes to correctly handle dbg.values with a DIArgList location. This patch does not actually allow DIArgLists to be produced by salvageDebugInfo, and it does not affect any pass after codegen-prepare. Other than that, it should cover every IR pass. Most of the changes simply extend code that operated on a single debug value to operate on the list of debug values in the style of any_of, all_of, for_each, etc. Instances of setOperand(0, ...) have been replaced with with replaceVariableLocationOp, which takes the value that is being replaced as an additional argument. In places where this value isn't readily available, we have to track the old value through to the point where it gets replaced. Differential Revision: https://reviews.llvm.org/D88232	2021-03-09 16:44:38 +00:00
Oliver Stannard	8d632ca436	[ARM] Add comment explaining stack frame layout Add a comment explaining how we lay out stack frames for ARM targets, based on the existing one for AArch64. Also expand the comment to explain reserved call frames for both architectures. Differential revision: https://reviews.llvm.org/D98258	2021-03-09 15:20:32 +00:00
Simon Pilgrim	d0884541cc	[X86] canonicalizeShuffleWithBinOps - add binary shuffle handling	2021-03-09 13:57:03 +00:00
Liu, Chen3	b70e02a7e7	[X86][NFC] Move instruction selection of the x86_tdpb[s,u]d_internal and x86_tilezero_internal to X86InstrAMX.td Differential Revision: https://reviews.llvm.org/D97997	2021-03-09 21:27:39 +08:00
Liu, Chen3	3618b21298	[X86][NFC] Adding one flag to imply whether the instruction should check the predicate when compress EVEX instructions to VEX encoding. Some EVEX instructions should check the predicates when compress to VEX encoding. For example, avx512vnni instructions. This is because avx512vnni doesn't mean that avxvnni is supported on the target. This patch moving the manually added check to .inc that generated by tablegen. Differential Revision: https://reviews.llvm.org/D98011	2021-03-09 19:58:01 +08:00
Simon Pilgrim	4f7dd715b5	M68kInstrInfo::AnalyzeBranchImpl - fix MSVC build. NFCI. MSVC couldn't resolve the decltype of a capture of a capture - but is happy with an auto.	2021-03-09 11:27:26 +00:00
Cullen Rhodes	2750f3ed31	[IR] Introduce llvm.experimental.vector.splice intrinsic This patch introduces a new intrinsic @llvm.experimental.vector.splice that constructs a vector of the same type as the two input vectors, based on a immediate where the sign of the immediate distinguishes two variants. A positive immediate specifies an index into the first vector and a negative immediate specifies the number of trailing elements to extract from the first vector. For example: @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> ; index @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count These intrinsics support both fixed and scalable vectors, where the former is lowered to a shufflevector to maintain existing behaviour, although while marked as experimental the recommended way to express this operation for fixed-width vectors is to use shufflevector. For scalable vectors where it is not possible to express a shufflevector mask for this operation, a new ISD node has been implemented. This is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker and Cullen Rhodes. [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D94708	2021-03-09 10:44:22 +00:00
ShihPo Hung	5cdb2e9860	[RISCV][MC] Fix nf encoding for vector ld/st whole register The three bit nf is one less than the number of NFIELDS, so we manually decrement 1 for VS1/2/4/8R & VL1/2/4/8R. Reviewed By: craig.topper Differential revision: https://reviews.llvm.org/D98185	2021-03-08 19:30:24 -08:00
Ruiling Song	67a05f4e09	[AMDGPU] Remove unused function opcodeEmitsNoInsts() This was missed in the patch D97545, and cause buildbot failure. Reviewed by: critson Differential Revision: https://reviews.llvm.org/D98229	2021-03-09 10:48:30 +08:00
Ruiling Song	f0ccdde3c9	[AMDGPU] Remove SI_MASK_BRANCH This is already deprecated, so remove code working on this. Also update the tests by using S_CBRANCH_EXECZ instead of SI_MASK_BRANCH. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97545	2021-03-09 09:13:23 +08:00
Akira Hatanaka	dca5737945	Move ObjCARCUtil.h back to llvm/Analysis Instead of adding the header to llvm/IR, just duplicate the marker string in the auto upgrader.	2021-03-08 16:35:24 -08:00
Lei Huang	535a4192a9	[AIX][TLS] Generate 64-bit general-dynamic access code sequence Add support for the TLS general dynamic access model to assembly files on AIX 64-bit. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D98078	2021-03-08 16:41:25 -06:00
Masoud Ataei	820f508b08	[PowerPC] Removing _massv place holder Since P8 is the oldest machine supported by MASSV pass, _massv place holder is removed and the oldest version of MASSV functions is assumed. If the P9 vector specific is detected in the compilation process, the P8 prefix will be updated to P9. Differential Revision: https://reviews.llvm.org/D98064	2021-03-08 21:43:24 +00:00
Jessica Paquette	5c26be214d	[AArch64][GlobalISel] Lower G_BUILD_VECTOR -> G_DUP If we have ``` %vec = G_BUILD_VECTOR %reg, %reg, ..., %reg ``` Then lower it to ``` %vec = G_DUP %reg ``` Also update the selector to handle constant splats on G_DUP. This will not combine when the splat is all zeros or ones. Tablegen-imported patterns rely on these being G_BUILD_VECTOR. Minor code size improvements on CTMark at -Os. Also adds some utility functions to make it a bit easier to recognize splats, and an AArch64-specific splat helper. Differential Revision: https://reviews.llvm.org/D97731	2021-03-08 13:01:10 -08:00
Min-Yih Hsu	5ac19e0acf	[M68k](5/8) Target-specific lowering - TargetMachine implementation for M68k - ISel, ISched for M68k - Other lowering (e.g. FrameLowering) - AsmPrinter Authors: myhsu, m4yers, glaubitz Differential Revision: https://reviews.llvm.org/D88391	2021-03-08 12:30:57 -08:00
Min-Yih Hsu	8dddc15297	[M68k](4/8) MC layer and object file support - Add the M68k-specific MC layer implementation - Add ELF support for M68k - Add M68k-specifc CC and reloc TODO: Currently AsmParser and disassembler are not implemented yet. Please use this bug to track the status: https://bugs.llvm.org/show_bug.cgi?id=48976 Authors: myhsu, m4yers, glaubitz Differential Revision: https://reviews.llvm.org/D88390	2021-03-08 12:30:57 -08:00
Min-Yih Hsu	bec7b16692	[M68k](3/8) Skeleton and target description files - Infrastructure for the target (i.e. build files, target triple etc.) - All of the target description TableGen file Authors: myhsu, m4yers, glaubitz Differential Revision: https://reviews.llvm.org/D88389	2021-03-08 12:30:57 -08:00
Yuta Saito	aa0c571a5f	[WebAssembly] Add new relocation for location relative data This `R_WASM_MEMORY_ADDR_SELFREL_I32` relocation represents an offset between its relocating address and the symbol address. It's very similar to `R_X86_64_PC32` but restricted to be used for only data segments. ``` S + A - P ``` A: Represents the addend used to compute the value of the relocatable field. P: Represents the place of the storage unit being relocated. S: Represents the value of the symbol whose index resides in the relocation entry. Proposal: https://github.com/WebAssembly/tool-conventions/issues/162 Differential Revision: https://reviews.llvm.org/D96659	2021-03-08 11:34:10 -08:00
Craig Topper	7a64cc4a76	[RISCV] Make use of DAG.getNeutralElement in lowerVECREDUCE to avoid repeating the same list of constants. NFC Reviewed By: frasercrmck, khchen Differential Revision: https://reviews.llvm.org/D98091	2021-03-08 09:11:10 -08:00
Craig Topper	a2651266c5	[RISCV] Add explicit i64 types to RV64 isel patterns to stop tablegen from generating unneeded i32 patterns for RV32 HwMode.	2021-03-08 09:06:56 -08:00
Nemanja Ivanovic	b0f0115308	[AIX][TLS] Generate 32-bit general-dynamic access code sequence Adds support for the TLS general dynamic access model to assembly files on AIX 32-bit. To generate the correct code sequence when accessing a TLS variable `v`, we first create two TOC entry nodes, one for the variable offset, one for the region handle. These nodes are followed by a `PPCISD::TLSGD_AIX` node (new node introduced by this patch). The `PPCISD::TLSGD_AIX` node (`TLSGDAIX` pseudo instruction) is expanded to 2 copies (to put the variable offset and region handle in the right registers) and a call to `__tls_get_addr`. This patch also changes the way TC entries are generated in asm files. If the generated TC entry is for the region handle of a TLS variable, we add the `@m` relocation and the `.` prefix to the entry name. For example: ``` L..C0: .tc .v[TC],v[TL]@m -> region handle L..C1: .tc v[TC],v[TL] -> variable offset ``` Reviewed By: nemanjai, sfertile Differential Revision: https://reviews.llvm.org/D97948	2021-03-08 09:30:19 -06:00
Anirudh Prasad	7a46d34a19	[SystemZ][z/OS] Add support to validate a HLASM Label. - This patch adds in support to determine whether a particular label is valid for the hlasm variant - The label syntax being checked is that of an ordinary HLASM symbol (Reference, Chapter 2 (Coding and Structure) - Terms, Literals and Expressions - Terms - Symbols - Ordinary Symbol) - To achieve this, the virtual function isLabel defined in MCTargetAsmParser.h is made use of - The isLabel function is overridden in SystemZAsmParser for the hlasm variant, and the syntax is checked appropriately - Things remain unchanged for the att variant - Further patches will add in support to emit the label. These future patches will make use of this isLabel function Reviewed By: uweigand, Kai Differential Revision: https://reviews.llvm.org/D97748	2021-03-08 09:55:39 -05:00
gbtozers	e5d958c456	[DebugInfo] Support DIArgList in DbgVariableIntrinsic This patch updates DbgVariableIntrinsics to support use of a DIArgList for the location operand, resulting in a significant change to its interface. This patch does not update all IR passes to support multiple location operands in a dbg.value; the only change is to update the DbgVariableIntrinsic interface and its uses. All code outside of the intrinsic classes assumes that an intrinsic will always have exactly one location operand; they will still support DIArgLists, but only if they contain exactly one Value. Among other changes, the setOperand and setArgOperand functions in DbgVariableIntrinsic have been made private. This is to prevent code from setting the operands of these intrinsics directly, which could easily result in incorrect/invalid operands being set. This does not prevent these functions from being called on a debug intrinsic at all, as they can still be called on any CallInst pointer; it is assumed that any code directly setting the operands on a generic call instruction is doing so safely. The intention for making these functions private is to prevent DIArgLists from being overwritten by code that's naively trying to replace one of the Values it points to, and also to fail fast if a DbgVariableIntrinsic is updated to use a DIArgList without a valid corresponding DIExpression.	2021-03-08 14:36:13 +00:00
Ahsan Saghir	acce401068	[PowerPC] Change target data layout for 16-byte stack alignment This changes the target data layout to make stack align to 16 bytes on Power10. Before this change, stack was being aligned to 32 bytes. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D96265	2021-03-08 08:13:08 -06:00
Simon Pilgrim	f71cee136d	[X86] Break if-else chain. NFCI. Both if blocks affect control flow - we don't need the else. Fixes clang-tidy warning.	2021-03-08 11:44:31 +00:00
Fraser Cormack	18173c57bd	[RISCV] Add new entry points to getContainerForFixedLengthVector While working on adding fixed-length vectors to the calling convention, it was necessary to be able to query for a fixed-length vector container type without access to an instance of SelectionDAG. This patch modifies the "main" getContainerForFixedLengthVector function to use an instance of TargetLowering rather than SelectionDAG, and preserves the SelectionDAG overload as a wrapper. An additional non-static version of the function was also added to simplify the common case in RISCVTargetLowering. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97925	2021-03-08 09:26:19 +00:00
Freddy Ye	5f9489b754	[X86] Refine "Support -march=alderlake" Refine "Support -march=alderlake" Compare with tremont, it includes 25 more new features. They are adx, aes, avx, avx2, avxvnni, bmi, bmi2, cldemote, f16c, fma, hreset, invpcid, kl, lzcnt, movdir64b, movdiri, pclmulqdq, pconfig, pku, serialize, shstk, vaes, vpclmulqdq, waitpkg, widekl. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D97832	2021-03-08 13:17:18 +08:00
Craig Topper	c91b3c9e63	[RISCV] Fold (select_cc (setlt X, Y), 0, ne, trueV, falseV) -> (select_cc X, Y, lt, trueV, falseV) A setcc can be created during LegalizeDAG after select_cc has been created. This combine will enable us to fold these late setccs. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98132	2021-03-07 09:44:56 -08:00
Craig Topper	fdbd5d3206	[RISCV] Fold (select_cc (xor X, Y), 0, eq/ne, trueV, falseV) -> (select_cc X, Y, eq/ne, trueV, falseV) This pattern occurs when lowering for overflow operations introduce an xor after select_cc has already been formed. I had to rework another combine that looked for select_cc of an xor with 1. That xor will now get combined away so we just need to look for the RHS of the select_cc being 1. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98130	2021-03-07 09:29:55 -08:00
Simon Pilgrim	cd938ab162	[X86] canonicalizeShuffleWithBinOps - add X86ISD::PSHUFB handling.	2021-03-07 12:56:35 +00:00
Simon Pilgrim	772a501bf4	[X86] canonicalizeShuffleWithBinOps - shuffle oneuse constants. We can freely shuffle all ones/zeros constants but we can also freely shuffle other constants as long as they only have one use.	2021-03-07 11:17:03 +00:00
Sean Fertile	f0904a6208	[PowePC][AIX] Handle variadic vector call operands. Patch adds support for passing vector call operands to variadic functions. Arguments which are fixed shadow GPRs and stack space even when they are passed in vector registers, while arguments passed through ellipses are passed in properly aligned GPRs if available and on the stack once all GPR arguments registers are consumed. Differential Revision: https://reviews.llvm.org/D97956	2021-03-06 13:49:55 -05:00
Alexey Lapshin	cf7cdaff64	[X86][VARARG] Avoid spilling xmm registers for va_start. That review is extracted from D69372. It fixes https://bugs.llvm.org/show_bug.cgi?id=42219 bug. For the noimplicitfloat mode, the compiler mustn't generate floating-point code if it was not asked directly to do so. This rule does not work with variable function arguments currently. Though compiler correctly guards block of code, which copies xmm vararg parameters with a check for %al, it does not protect spills for xmm registers. Thus, such spills are generated in non-protected areas and could break code, which does not expect floating-point data. The problem happens in -O0 optimization mode. With this optimization level there is used FastRegisterAllocator, which spills virtual registers at basic block boundaries. Register Allocator does not protect spills with additional control-flow modifications. Thus to resolve that problem, it is suggested to not copy incoming physical registers into virtual registers. Instead, store incoming physical xmm registers into the memory from scratch. Differential Revision: https://reviews.llvm.org/D80163	2021-03-06 15:25:47 +03:00
Jay Foad	99682bc039	Revert "Revert "[AMDGPU] Restore the s_memtime instruction in gfx1030"" This reverts commit `e58d68fcd0`. This reinstates commit `fc28f600e5` with a fix to initialize HasShaderCyclesRegister. See https://reviews.llvm.org/D97928.	2021-03-06 09:00:01 +00:00
Fangrui Song	2d922de3af	[MC][RISCV] Support .reloc , BFD_RELOC_{NONE,32,64}, BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way indicating a dependency between two sections.	2021-03-05 21:45:11 -08:00
Fangrui Song	59ff9315fd	[MC][ARM] Support .reloc , BFD_RELOC_{NONE,8,16,32}, BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way indicating a dependency between two sections.	2021-03-05 21:39:16 -08:00
Fangrui Song	3110187f1f	[MC][PowerPC] Support .reloc , BFD_RELOC_{NONE,16,32,64}, BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way indicating a dependency between two sections.	2021-03-05 21:31:45 -08:00
Fangrui Song	aceea45d87	[MC][AArch64] Support .reloc , BFD_RELOC_{NONE,16,32,64}, BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way indicating a dependency between two sections.	2021-03-05 21:31:08 -08:00
Fangrui Song	4f7562d52f	[MC][X86] Support .reloc , BFD_RELOC_{NONE,8,16,32,64}, The names are unfortunate, but BFD_RELOC_NONE provides a generic way indicating a dependency between two sections, which is useful for ld --gc-sections. See https://sourceware.org/bugzilla/show_bug.cgi?id=27530	2021-03-05 21:31:05 -08:00
Mitch Phillips	e58d68fcd0	Revert "[AMDGPU] Restore the s_memtime instruction in gfx1030" Broke the ASan/MSan buildbots. See more comments in the original patch, https://reviews.llvm.org/D97928. Build failure at http://lab.llvm.org:8011/#/builders/5/builds/5327 This reverts commit `fc28f600e5`.	2021-03-05 18:24:59 -08:00
Jay Foad	fc28f600e5	[AMDGPU] Restore the s_memtime instruction in gfx1030 gfx1030 added a new way to implement readcyclecounter using the SHADER_CYCLES hardware register, but the s_memtime instruction still exists, so the MC layer should still accept it and the llvm.amdgcn.s.memtime intrinsic should still work. Differential Revision: https://reviews.llvm.org/D97928	2021-03-05 20:19:11 +00:00
Zarko Todorovski	2b50ce1524	[PowerPC][AIX] Enable the default AltiVec ABI on AIX This patch adds support for the default AltiVec ABI for AIX. Vector registers 20 through 31 are marked as reserved and cannot be used in the default ABI. This patch adds handling for this case and also remove the default AltiVec ABI errors. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D96351	2021-03-05 12:46:27 -05:00
RamNalamothu	3998a8e797	[AMDGPU] Do not attempt sgpr spills to vgpr, when it is disabled This covers a path missed in https://reviews.llvm.org/D95768. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D98013	2021-03-05 22:47:21 +05:30
Jinsong Ji	cc21de6789	[PowerPC] Update Copy/Paste encodings according to ISA3.1 Copy-paste P9 insns were added back in 2016, however, looks like the opcodes has changed in ISA3.1. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D97416	2021-03-05 17:05:50 +00:00
Simon Pilgrim	87d5b34c24	[X86] X86ISelDAGToDAG.cpp - include cstdint instead of stdint.h NFCI. Fixes clang-tidy warning	2021-03-05 15:58:20 +00:00
Simon Pilgrim	f11f86c114	[X86] X86DAGToDAGISel::Select - merge X86::TEST load bitsize checks. NFCI.	2021-03-05 15:58:20 +00:00
LemonBoy	8725b24c6d	[AArch64] Legalize horizontal fmax/fmin reductions on f16 vectors Expand the horizontal reduction during the instruction selection phase, but only if the target doesn't support the full fp16 instruction set. Fixes https://bugs.llvm.org/show_bug.cgi?id=49401 Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D97840	2021-03-05 16:09:37 +01:00
Ilya Leoshkevich	a7137b238a	[BPF] Add support for floats and doubles Some BPF programs compiled on s390 fail to load, because s390 arch-specific linux headers contain float and double types. At the moment there is no BTF_KIND for floats and doubles, so the release version of LLVM ends up emitting type id 0 for them, which the in-kernel verifier does not accept. Introduce support for such types to libbpf by representing them using the new BTF_KIND_FLOAT. Reviewed By: yonghong-song Differential Revision: https://reviews.llvm.org/D83289	2021-03-05 15:10:11 +01:00
Stephen Tozer	f677413071	Reapply "[DebugInfo] Add new instruction and DIExpression operator for variadic debug values" Rewrites test to use correct architecture triple; fixes incorrect reference in SourceLevelDebugging doc; simplifies `spillReg` behaviour so as to not be dependent on changes elsewhere in the patch stack. This reverts commit `d2000b45d0`.	2021-03-05 12:32:05 +00:00
Sebastian Neubauer	e0e73714fb	[AMDGPU] Keep skip branch for ds instructions Same as other memory instructions, ds instructions add latency even if exec is zero. Jumping over them if exec=0 is cheaper than executing them. With this change, the branch instruction that skips over a basic block if exec=0 is not removed when the block contains a ds instruction. Differential Revision: https://reviews.llvm.org/D97922	2021-03-05 12:34:09 +01:00
Jingu Kang	9b302513f6	[AArch64] Add missing intrinsics for vrnd	2021-03-05 11:26:12 +00:00
Simon Pilgrim	3fd2fa1220	Revert rG8198d83965ba4b9db6922b44ef3041030b2bac39: "[X86] Pass to transform amx intrinsics to scalar operation." This reverts commit 8198d83965ba4b9db6922b44ef3041030b2bac39.due to buildbot breakages	2021-03-05 11:09:14 +00:00
Simon Pilgrim	d7b8cb4d57	[X86] X86ISelLowering.cpp - try to use for-range loops. NFCI.	2021-03-05 11:09:14 +00:00
Petar Avramovic	36beaa3ba3	Reland AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect Recommit `bf5a582650`. Depends on `4c8fb7ddd6` which was reverted. RegBankSelect creates zext and trunc when it selects banks for uniform i1. Add zext_trunc_fold from generic combiner to post RegBankSelect combiner. Differential Revision: https://reviews.llvm.org/D95432	2021-03-05 11:05:37 +01:00
Luo, Yuanke	8198d83965	[X86] Pass to transform amx intrinsics to scalar operation. This pass runs in any situations but we skip it when it is not O0 and the function doesn't have optnone attribute. With -O0, the def of shape to amx intrinsics is near the amx intrinsics code. We are not able to find a point which post-dominate all the shape and dominate all amx intrinsics. To decouple the dependency of the shape, we transform amx intrinsics to scalar operation, so that compiling doesn't fail. In long term, we should improve fast register allocation to allocate amx register. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D93594	2021-03-05 16:02:02 +08:00
Luke	d28297ff68	[RISCV] Enable fixed-length vectorization of LoopVectorizer for RISC-V Vector By implementing the method "unsigned RISCVTTIImpl::getRegisterBitWidth(bool Vector)", fixed-length vectorization is enabled when possible. Without this method, the "#pragma clang loop" directive is needed to enable vectorization(or the cost model may inform LLVM that "Vectorization is possible but not beneficial"). Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97549	2021-03-05 10:54:51 +08:00
Chen Zheng	87bbf3d1f8	[XCOFF][DebugInfo] support DWARF for XCOFF for assembly output. Reviewed By: jasonliu Differential Revision: https://reviews.llvm.org/D95518	2021-03-04 21:07:52 -05:00
Yonghong Song	9c0274cdea	BPF: permit type modifiers for __builtin_btf_type_id() relocation Lorenz Bauer from Cloudflare tried to use "const struct <name>" as the type for __builtin_btf_type_id(*(const struct <name>)0, 1) relocation and hit a llvm BPF fatal error. https://lore.kernel.org/bpf/a3782f71-3f6b-1e75-17a9-1827822c2030@fb.com/ ... fatal error: error in backend: Empty type name for BTF_TYPE_ID_REMOTE reloc Currently, we require the debuginfo type itself must have a name. In this case, the debuginfo type is "const" which points to "struct <name>". The "const" type does not have a name, hence the above fatal error will be triggered. Let us permit "const" and "volatile" type modifiers. We skip modifiers in some other cases as well like structure member type tracing. This can aviod the above fatal error. Differential Revision: https://reviews.llvm.org/D97986	2021-03-04 16:27:23 -08:00
David Blaikie	a2a55def35	Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering. This is included from IR files, and IR doesn't/can't depend on Analysis (because Analysis depends on IR). Also fix the implementation - don't use non-member static in headers, as it leads to ODR violations, inaccurate "unused function" warnings, etc. And fix the header protection macro name (we don't generally include "LIB" in the names, so far as I can tell).	2021-03-04 16:14:53 -08:00
Amara Emerson	501f6a4e9e	[AArch64][GlobalISel][RegBankSelect] Improve rbs of G_BUILD_VECTOR when fed by fp values. This is actually two changes. One is to avoid copies when fp values are fed into a build_vector, without being able to tell from the opcode. The other is that build_vectors are also marked as only defining FP, since they produce vector results. Differential Revision: https://reviews.llvm.org/D97968	2021-03-04 15:09:05 -08:00
Heejin Ahn	2b957ed4ff	[WebAssembly] Fix ExceptionInfo grouping again This is a case D97677 missed. When taking out remaining BBs that are reachable from already-taken-out exceptions (because they are not subexcptions but unwind destinations), I assumed the remaining BBs are not EH pads, but they can be. For example, ``` try { try { throw 0; } catch (int) { // (a) } } catch (int) { // (b) } try { foo(); } catch (int) { // (c) } ``` In this code, (b) is the unwind destination of (a) so its exception is taken out of (a)'s exception, But even though the next try-catch is not inside the first two-level try-catches, because the first try always throws, its continuation BB is unreachable and the whole rest of the function is dominated by EH pad (a), including EH pad (c). So after we take out of (b)'s exception out of (a)'s, we also need to take out (c)'s exception out of (a)'s, because (c) is reachable from (b). This adds one more step before what we did for remaining BBs in D97677; it traverses EH pads first to take subexceptions out of their incorrect parent exception. It's the same thing as D97677, but because we can do this before we add BBs to exceptions' sets, we don't need to fix sets and only need to fix parent exception pointers. Other changes are variable name changes (I changed `WE` -> `SrcWE`, `UnwindWE` -> `DstWE` for clarity), some comment changes, and a drive-by fix in a bug in a `LLVM_DEBUG` print statement. Fixes https://github.com/emscripten-core/emscripten/issues/13588. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D97929	2021-03-04 15:05:13 -08:00
Heejin Ahn	561abd83ff	[WebAssembly] Disable uses of __clang_call_terminate Background: Wasm EH, while using Windows EH (catchpad/cleanuppad based) IR, uses Itanium-based libraries and ABIs with some modifications. `__clang_call_terminate` is a wrapper generated in Clang's Itanium C++ ABI implementation. It contains this code, in C-style pseudocode: ``` void __clang_call_terminate(void *exn) { __cxa_begin_catch(exn); std::terminate(); } ``` So this function is a wrapper to call `__cxa_begin_catch` on the exception pointer before termination. In Itanium ABI, this function is called when another exception is thrown while processing an exception. The pointer for this second, violating exception is passed as the argument of this `__clang_call_terminate`, which calls `__cxa_begin_catch` with that pointer and calls `std::terminate` to terminate the program. The spec (https://libcxxabi.llvm.org/spec.html) for `__cxa_begin_catch` says, ``` When the personality routine encounters a termination condition, it will call __cxa_begin_catch() to mark the exception as handled and then call terminate(), which shall not return to its caller. ``` In wasm EH's Clang implementation, this function is called from cleanuppads that terminates the program, which we also call terminate pads. Cleanuppads normally don't access the thrown exception and the wasm backend converts them to `catch_all` blocks. But because we need the exception pointer in this cleanuppad, we generate `wasm.get.exception` intrinsic (which will eventually be lowered to `catch` instruction) as we do in the catchpads. But because terminate pads are cleanup pads and should run even when a foreign exception is thrown, so what we have been doing is: 1. In `WebAssemblyLateEHPrepare::ensureSingleBBTermPads()`, we make sure terminate pads are in this simple shape: ``` %exn = catch call @__clang_call_terminate(%exn) unreachable ``` 2. In `WebAssemblyHandleEHTerminatePads` pass at the end of the pipeline, we attach a `catch_all` to terminate pads, so they will be in this form: ``` %exn = catch call @__clang_call_terminate(%exn) unreachable catch_all call @std::terminate() unreachable ``` In `catch_all` part, we don't have the exception pointer, so we call `std::terminate()` directly. The reason we ran HandleEHTerminatePads at the end of the pipeline, separate from LateEHPrepare, was it was convenient to assume there was only a single `catch` part per `try` during CFGSort and CFGStackify. --- Problem: While it thinks terminate pads could have been possibly split or calls to `__clang_call_terminate` could have been duplicated, `WebAssemblyLateEHPrepare::ensureSingleBBTermPads()` assumes terminate pads contain no more than calls to `__clang_call_terminate` and `unreachable` instruction. I assumed that because in LLVM very limited forms of transformations are done to catchpads and cleanuppads to maintain the scoping structure. But it turned out to be incorrect; passes can merge cleanuppads into one, including terminate pads, as long as the new code has a correct scoping structure. One pass that does this I observed was `SimplifyCFG`, but there can be more. After this transformation, a single cleanuppad can contain any number of other instructions with the call to `__clang_call_terminate` and can span many BBs. It wouldn't be practical to duplicate all these BBs within the cleanuppad to generate the equivalent `catch_all` blocks, only with calls to `__clang_call_terminate` replaced by calls to `std::terminate`. Unless we do more complicated transformation to split those calls to `__clang_call_terminate` into a separate cleanuppad, it is tricky to solve. --- Solution (?): This CL just disables the generation and use of `__clang_call_terminate` and calls `std::terminate()` directly in its place. The possible downside of this approach can be, because the Itanium ABI intended to "mark" the violating exception handled, we don't do that anymore. What `__cxa_begin_catch` actually does is increment the exception's handler count and decrement the uncaught exception count, which in my opinion do not matter much given that we are about to terminate the program anyway. Also it does not affect info like stack traces that can be possibly shown to developers. And while we use a variant of Itanium EH ABI, we can make some deviations if we choose to; we are already different in that in the current version of the EH spec we don't support two-phase unwinding. We can possibly consider a more complicated transformation later to reenable this, but I don't think that has high priority. Changes in this CL contains: - In Clang, we don't generate a call to `wasm.get.exception()` intrinsic and `__clang_call_terminate` function in terminate pads anymore; we simply generate calls to `std::terminate()`, which is the default implementation of `CGCXXABI::emitTerminateForUnexpectedException`. - Remove `WebAssembly::ensureSingleBBTermPads() function and `WebAssemblyHandleEHTerminatePads` pass, because terminate pads are already `catch_all` now (because they don't need the exception pointer) and we don't need these transformations anymore. - Change tests to use `std::terminate` directly. Also removes tests that tested `LateEHPrepare::ensureSingleBBTermPads` and `HandleEHTerminatePads` pass. - Drive-by fix: Add some function attributes to EH intrinsic declarations Fixes https://github.com/emscripten-core/emscripten/issues/13582. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97834	2021-03-04 14:26:35 -08:00
Jay Foad	ed7458398a	[AMDGPU] Don't check for VMEM hazards on GFX10 The hazard where a VMEM reads an SGPR written by a VALU counts as a data dependency hazard, so no nops are required on GFX10. Tested with Vulkan CTS on GFX10.1 and GFX10.3. Differential Revision: https://reviews.llvm.org/D97926	2021-03-04 21:44:56 +00:00
Jinsong Ji	7967221a72	[PowerPC] Disable more extended mne on AIX To avoid assembler errors. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D97418	2021-03-04 21:13:37 +00:00
Benjamin Kramer	e897feeb8a	[PPC] Silence unused variable warning in release builds. NFC.	2021-03-04 21:43:19 +01:00
Akira Hatanaka	1900503595	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `ed4718eccb`, which was reverted because it was causing a miscompile. The bug that was causing the miscompile has been fixed in `75805dce5f`. Original commit message: Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-03-04 11:22:30 -08:00
Caroline Concatto	f2b749be15	[CostModel][SVE] Add cost model for shuffle reverse with i1 and scalable vector This patch adds the cost model for experimental.vector.reverse with scalable vector types: nxv16i1, nxv8i1, nxv4i1 and nxv2i1. These types are missing from the previous cost model patch D95603. The cost model for experimental.vector.reverse with 1 bit mask is used by loop vectorization in the patch D95363 Differential Revision: https://reviews.llvm.org/D97758	2021-03-04 18:52:59 +00:00
Sean Fertile	aaeffbe007	[PowerPC][AIX] Handle variadic vector formal arguments. Patch adds support for passing vector arguments to variadic functions. Arguments which are fixed shadow GPRs and stack space even when they are passed in vector registers, while arguments passed through ellipses are passed in(properly aligned GPRs if available and on the stack once all GPR arguments registers are consumed. Differential Revision: https://reviews.llvm.org/D97485	2021-03-04 10:56:53 -05:00
Nico Weber	e68de60bc4	Revert "AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect" This reverts commit `bf5a582650`. Also depends on now-reverted `4c8fb7ddd6`	2021-03-04 10:16:11 -05:00
Petar Avramovic	bf5a582650	AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect RegBankSelect creates zext and trunc when it selects banks for uniform i1. Add zext_trunc_fold from generic combiner to post RegBankSelect combiner. Differential Revision: https://reviews.llvm.org/D95432	2021-03-04 15:05:24 +01:00
Ayke van Laethem	a1155ae64d	[AVR] Fix lifeness issues in the AVR backend This patch is a large number of small changes that should hopefully not affect the generated machine code but are still important to get right so that the machine verifier won't complain about them. The llvm/test/CodeGen/AVR/pseudo/*.mir changes are also necessary because without the liveins the used registers are considered undefined by the machine verifier and it will complain about them. Differential Revision: https://reviews.llvm.org/D97172	2021-03-04 14:04:39 +01:00
Simon Pilgrim	7cbc5df438	[X86] X86TargetLowering::isSafeMemOpType - break if-else chain. NFCI. All if-else blocks return - fixes clang-tidy warning.	2021-03-04 12:15:08 +00:00
Stephen Tozer	d2000b45d0	Revert "[DebugInfo] Add new instruction and DIExpression operator for variadic debug values" This reverts commit `d07f106f4a`.	2021-03-04 11:59:21 +00:00
gbtozers	d07f106f4a	[DebugInfo] Add new instruction and DIExpression operator for variadic debug values This patch adds a new instruction that can represent variadic debug values, DBG_VALUE_VAR. This patch alone covers the addition of the instruction and a set of basic code changes in MachineInstr and a few adjacent areas, but does not correctly handle variadic debug values outside of these areas, nor does it generate them at any point. The new instruction is similar to the existing DBG_VALUE instruction, with the following differences: the operands are in a different order, any number of values may be used in the instruction following the Variable and Expression operands (these are referred to in code as “debug operands”) and are indexed from 0 so that getDebugOperand(X) == getOperand(X+2), and the Expression in a DBG_VALUE_VAR must use the DW_OP_LLVM_arg operator to pass arguments into the expression. The new DW_OP_LLVM_arg operator is only valid in expressions appearing in a DBG_VALUE_VAR; it takes a single argument and pushes the debug operand at the index given by the argument onto the Expression stack. For example the sub-expression `DW_OP_LLVM_arg, 0` has the meaning “Push the debug operand at index 0 onto the expression stack.” Differential Revision: https://reviews.llvm.org/D82363	2021-03-04 11:45:35 +00:00
Oliver Stannard	aac056c528	[objdump][ARM] Use correct offset when printing ARM/Thumb branch targets llvm-objdump only uses one MCInstrAnalysis object, so if ARM and Thumb code is mixed in one object, or if an object is disassembled without explicitly setting the triple to match the ISA used, then branch and call targets will be printed incorrectly. This could be fixed by creating two MCInstrAnalysis objects in llvm-objdump, like we currently do for SubtargetInfo. However, I don't think there's any reason we need two separate sub-classes of MCInstrAnalysis, so instead these can be merged into one, and the ISA determined by checking the opcode of the instruction. Differential revision: https://reviews.llvm.org/D97766	2021-03-04 11:15:57 +00:00
Andrew Savonichev	d791695cb5	[MCA] Add support for in-order CPUs This patch adds a pipeline to support in-order CPUs such as ARM Cortex-A55. In-order pipeline implements a simplified version of Dispatch, Scheduler and Execute stages as a single stage. Entry and Retire stages are common for both in-order and out-of-order pipelines. Differential Revision: https://reviews.llvm.org/D94928	2021-03-04 14:08:19 +03:00
Simon Pilgrim	1584e55a26	[X86] canonicalizeShuffleWithBinOps - handle general unaryshuffle(binop(x,c)) patterns not just xor(x,-1) Generalize the shuffle(not(x)) -> not(shuffle(x)) fold to handle any binop with 0/-1. Hopefully we can further generalize to help push target unary/binary shuffles through binops similar to what we do in DAGCombiner::visitVECTOR_SHUFFLE	2021-03-04 10:44:38 +00:00
Fraser Cormack	8e7ceffd0b	[RISCV] Fix crash when inserting large fixed-length subvectors This patch addresses a compiler crash resulting from passing a fixed-length type to one that expects scalable vector types. An assertion was added to prevent this regressing in the future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97868	2021-03-04 09:27:16 +00:00
Fraser Cormack	d8e1d2ebf4	[RISCV] Preserve fixed-length VL on insert_vector_elt in more cases This patch fixes up one case where the fixed-length-vector VL was dropped (falling back to VLMAX) when inserting vector elements, as the code would lower via ISD::INSERT_VECTOR_ELT (at index 0) which loses the fixed-length vector information. To this end, a custom node, VMV_S_XF_VL, was introduced to carry the VL operand through to the final instruction. This node wraps the RVV vmv.s.x and vmv.s.f instructions, which were being selected by insert_vector_elt anyway. There should be no observable difference in scalable-vector codegen. There is still one outstanding drop from fixed-length VL to VLMAX, when an i64 element is inserted into a vector on RV32; the splat (which is custom legalized) has no notion of the original fixed-length vector type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97842	2021-03-04 09:21:10 +00:00
David Green	a968e7b82e	[ARM] KnownBits for CSINC/CSNEG/CSINV This adds some simple known bits handling for the three CSINC/NEG/INV instructions. From the operands known bits we can compute the common bits of the first operand and incremented/negated/inverted second operand. The first, especially CSINC ZR, ZR, comes up fair amount in the tests. The others are more rare so a unit test for them is added. Differential Revision: https://reviews.llvm.org/D97788	2021-03-04 08:40:20 +00:00
Florian Hahn	75805dce5f	[AArch64] Add implicit uses for operands when expanding BLR_RVMARKER. Make sure we preserve info about passed arguments as implicit uses, to make sure later passes still have access to this information. This fixes a mis-compile where the machine-combiner would pick an incorrect free register.	2021-03-03 21:56:05 +00:00
Jonas Paulsson	7334b3dc3e	[SystemZ] Reimplement the i8/i16 compare-and-swap logic. Even though the implementation in emitAtomicCmpSwapW() was correct, it made Valgrind report an error. Instead of using a RISBG on CmpVal, an LL[CH]R can be made on the OldVal, and the problem is avoided. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D97604	2021-03-03 14:04:32 -06:00
Florian Hahn	8c3a70a78f	[AArch64] Move CALL_RVMARKER definition after CALL. This is a NFC with respect to the generated code. But it fixes a crash when using -debug, because of the position in the enum CALL_RVMARKER nodes were treated as memops. That caused a crash when printing CALL_RVMARKER nodes.	2021-03-03 19:42:16 +00:00
Stanislav Mekhanoshin	b70c483e04	[AMDGPU] Exclude always_inline from max bb threshold Honor always_inline attribute when processing -amdgpu-inline-max-bb. It was lost during the ports of the heuristic. There is no reason to honor inline hint, but not always inline. Differential Revision: https://reviews.llvm.org/D97790	2021-03-03 10:21:56 -08:00
Simon Pilgrim	aa4afebbf9	[X86] Fold scalar_to_vector(x) -> extract_subvector(broadcast(x),0) iff broadcast(x) exists Add handling for reusing an existing broadcast(x) to a wider vector.	2021-03-03 15:50:37 +00:00
Hans Wennborg	0a5dd06718	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR" This caused miscompiles of Chromium tests for iOS due clobbering of live registers. See discussion on the code review for details. > Background: > > This fixes a longstanding problem where llvm breaks ARC's autorelease > optimization (see the link below) by separating calls from the marker > instructions or retainRV/claimRV calls. The backend changes are in > https://reviews.llvm.org/D92569. > > https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue > > What this patch does to fix the problem: > > - The front-end adds operand bundle "clang.arc.attachedcall" to calls, > which indicates the call is implicitly followed by a marker > instruction and an implicit retainRV/claimRV call that consumes the > call result. In addition, it emits a call to > @llvm.objc.clang.arc.noop.use, which consumes the call result, to > prevent the middle-end passes from changing the return type of the > called function. This is currently done only when the target is arm64 > and the optimization level is higher than -O0. > > - ARC optimizer temporarily emits retainRV/claimRV calls after the calls > with the operand bundle in the IR and removes the inserted calls after > processing the function. > > - ARC contract pass emits retainRV/claimRV calls after the call with the > operand bundle. It doesn't remove the operand bundle on the call since > the backend needs it to emit the marker instruction. The retainRV and > claimRV calls are emitted late in the pipeline to prevent optimization > passes from transforming the IR in a way that makes it harder for the > ARC middle-end passes to figure out the def-use relationship between > the call and the retainRV/claimRV calls (which is the cause of > PR31925). > > - The function inliner removes an autoreleaseRV call in the callee if > nothing in the callee prevents it from being paired up with the > retainRV/claimRV call in the caller. It then inserts a release call if > claimRV is attached to the call since autoreleaseRV+claimRV is > equivalent to a release. If it cannot find an autoreleaseRV call, it > tries to transfer the operand bundle to a function call in the callee. > This is important since the ARC optimizer can remove the autoreleaseRV > returning the callee result, which makes it impossible to pair it up > with the retainRV/claimRV call in the caller. If that fails, it simply > emits a retain call in the IR if retainRV is attached to the call and > does nothing if claimRV is attached to it. > > - SCCP refrains from replacing the return value of a call with a > constant value if the call has the operand bundle. This ensures the > call always has at least one user (the call to > @llvm.objc.clang.arc.noop.use). > > - This patch also fixes a bug in replaceUsesOfNonProtoConstant where > multiple operand bundles of the same kind were being added to a call. > > Future work: > > - Use the operand bundle on x86-64. > > - Fix the auto upgrader to convert call+retainRV/claimRV pairs into > calls with the operand bundles. > > rdar://71443534 > > Differential Revision: https://reviews.llvm.org/D92808 This reverts commit `ed4718eccb`.	2021-03-03 15:51:40 +01:00
Ayke van Laethem	15f495c0bc	[AVR] Fix def state of operands Some instructions (especially mov+pop instructions) were setting the wrong operands. For example, the pop instruction had the register set as a source operand while it is a destination operand (the value is loaded into the register). I have found these issues using the machine verifier and using manual code inspection. Differential Revision: https://reviews.llvm.org/D97159	2021-03-03 15:36:05 +01:00
Ayke van Laethem	bbfef8ac95	[AVR] Fix expansion of NEGW The previous expansion used SBCI, which is incorrect because the NEGW pseudo instruction accepts a DREGS operand (2xGPR8) and SBCI only allows LD8 registers. One solution could be to correct the NEGW pseudo instruction, but another solution is to use a different instruction (sbc) that does accept a GPR8 register and therefore allows more freedom to the register allocator. The output now matches avr-gcc for the following code: int foo(int n) { return -n; } I've found this issue using the machine instruction verifier: it was complaining about the wrong register class in NEGWRd.mir. Differential Revision: https://reviews.llvm.org/D97131	2021-03-03 15:36:05 +01:00
Ayke van Laethem	4f6d7985d4	[AVR] Add register aliases XL, YH, etc These aliases are sometimes used in assembly code and make the code more readable. They are supported by avr-gcc too. Differential Revision: https://reviews.llvm.org/D96492	2021-03-03 15:36:05 +01:00
Matt Arsenault	78dcff4841	GlobalISel: Add default implementation of assignValueToReg Refactor insertion of the asserting ops. This enables using them for AMDGPU. This code should essentially be the same for every target. Mips, X86 and ARM all have different code there now, but this seems to be an accident. The assignment functions are called with different types than they would be in the DAG, so this is all likely an assortment of hacks to get around that.	2021-03-03 09:29:53 -05:00
Piotr Sobczak	4672bac177	[AMDGPU] Introduce Strict WQM mode * Add amdgcn_strict_wqm intrinsic. * Add a corresponding STRICT_WQM machine instruction. * The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active. * The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96258	2021-03-03 14:19:16 +01:00
Piotr Sobczak	c3ce7bae80	[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm * Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm The change is done for consistency as the "strict" prefix will become an important, distinguishing factor between amdgcn_wqm and amdgcn_strictwqm in the future. The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions. The amdgcn_wwm will be removed, but doing so in two steps gives users time to switch to the new name at their own pace. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96257	2021-03-03 09:33:57 +01:00
Carl Ritson	2ddac69f98	[AMDGPU] Rename llvm.amdgcn.msaa.load to llvm.amdgcn.msaa.load.x While the underlying instruction is called image_msaa_load, the resource must be x component only. Rename the intrinsic for clarity. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97829	2021-03-03 17:30:39 +09:00
David Green	ab280cbaa3	[ARM] Ensure undef is propagated to CBZ/CBNZ flags In some rare circumstances we can be using an undef register for a compare. When folded into a CBZ/CBNZ the undef flags are lost, leading to machine verifier problems. This propagates the existing flags to the new instruction.	2021-03-03 08:02:58 +00:00
Andy Wingo	4307069df4	[WebAssembly] Swap operand order of call_indirect in text format The WebAssembly text and binary formats have different operand orders for the "type" and "table" fields of call_indirect (and return_call_indirect). In LLVM we use the binary order for the MCInstr, but when we produce or consume the text format we should use the text order. For compilation units targetting WebAssembly 1.0 (without the reference types feature), we omit the table operand entirely. Differential Revision: https://reviews.llvm.org/D97761	2021-03-03 08:51:21 +01:00
Qiu Chaofan	72d4a41ba6	[PowerPC] Allow spilling GPR to VSR on AIX This patch enables spilling GPR to VSRs instead of stack under AIX ABI. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D97367	2021-03-03 13:32:39 +08:00
Victor Huang	1756b2adc9	[AIX][TLS] Generate TLS variables in assembly files This patch allows generating TLS variables in assembly files on AIX. Initialized and external uninitialized variables are generated with the .csect pseudo-op and local uninitialized variables are generated with the .comm/.lcomm pseudo-ops. The patch also adds a check to explicitly say that TLS is not yet supported on AIX. Reviewed by: daltenty, jasonliu, lei, nemanjai, sfertile Originally patched by: bsaleil Commandeered by: NeHuang Differential Revision: https://reviews.llvm.org/D96184	2021-03-02 18:22:48 -06:00
Matt Arsenault	fd82cbcf7d	GlobalISel: Merge and cleanup more AMDGPU call lowering code This merges more AMDGPU ABI lowering code into the generic call lowering. Start cleaning up by factoring away more of the pack/unpack logic into the buildCopy{To\|From}Parts functions. These could use more improvement, and the SelectionDAG versions are significantly more complex, and we'll eventually have to emulate all of those cases too. This is mostly NFC, but does result in some minor instruction reordering. It also removes some of the limitations with mismatched sizes the old code had. However, similarly to the merge on the input, this is forcing gfx6/gfx7 to use the gfx8+ ABI (which is what we actually want, but SelectionDAG is stuck using the weird emergent ABI). This also changes the load/store size for stack passed EVTs for AArch64, which makes it consistent with the DAG behavior.	2021-03-02 17:31:13 -05:00
Heejin Ahn	4a58116b7e	[WebAssembly] Fix more ExceptionInfo grouping bugs This fixes two bugs in `WebAssemblyExceptionInfo` grouping, created by D97247. These two bugs are not easy to split into two different CLs, because tests that fail for one also tend to fail for the other. - In D97247, when fixing `ExceptionInfo` grouping by taking out the unwind destination' exception from the unwind src's exception, we just iterated the BBs in the function order, but this was incorrect; this changes it to dominator tree preorder. Please refer to the comments in the code for the reason and an example. - After this subexception-taking-out fix, there still can be remaining BBs we have to take out. When Exception B is taken out of Exception A (because EHPad B is the unwind destination of EHPad A), there can still be BBs within Exception A that are reachable from Exception B, which also should be taken out. Please refer to the comments in the code for more detailed explanation on why this can happen. To make this possible, this splits `WebAssemblyException::addBlock` into two parts: adding to a set and adding to a vector. We need to iterate on BBs within a `WebAssemblyException` to fix this, so we add BBs to sets first. But we add BBs to vectors later after we fix all incorrectness because deleting BBs from vectors is expensive. I considered removing the vector from `WebAssemblyException`, but it was not easy because this class has to maintain a similar interface with `MachineLoop` to be wrapped into a single interface `SortRegion`, which is used in CFGSort. Other misc. drive-by fixes: - Make `WebAssemblyExceptionInfo` do not even run when wasm EH is not used or the function doesn't have any EH pads, not to waste time - Add `LLVM_DEBUG` lines for easy debugging - Fix `preds` comments in cfg-stackify-eh.ll - Fix `__cxa_throw`'s signature in cfg-stackify-eh.ll Fixes https://github.com/emscripten-core/emscripten/issues/13554. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97677	2021-03-02 13:44:09 -08:00
Yonghong Song	51cdb780db	BPF: Fix a bug in peephole TRUNC elimination optimization Andrei Matei reported a llvm11 core dump for his bpf program https://bugs.llvm.org/show_bug.cgi?id=48578 The core dump happens in LiveVariables analysis phase. #4 0x00007fce54356bb0 __restore_rt #5 0x00007fce4d51785e llvm::LiveVariables::HandleVirtRegUse(unsigned int, llvm::MachineBasicBlock, llvm::MachineInstr&) #6 0x00007fce4d519abe llvm::LiveVariables::runOnInstr(llvm::MachineInstr&, llvm::SmallVectorImpl<unsigned int>&) #7 0x00007fce4d519ec6 llvm::LiveVariables::runOnBlock(llvm::MachineBasicBlock, unsigned int) #8 0x00007fce4d51a4bf llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&) The bug can be reproduced with llvm12 and latest trunk as well. Futher analysis shows that there is a bug in BPF peephole TRUNC elimination optimization, which tries to remove unnecessary TRUNC operations (a <<= 32; a >>= 32). Specifically, the compiler did wrong transformation for the following patterns: %1 = LDW ... %2 = SLL_ri %1, 32 %3 = SRL_ri %2, 32 ... %3 ... %4 = SRA_ri %2, 32 ... %4 ... The current transformation did not check how many uses of %2 and did transformation like %1 = LDW ... ... %1 ... %4 = SRL_ri %2, 32 ... %4 ... and pseudo register %2 is used by not defined and caused LiveVariables analysis core dump. To fix the issue, when traversing back from SRL_ri to SLL_ri, check to ensure SLL_ri has only one use. Otherwise, don't do transformation. Differential Revision: https://reviews.llvm.org/D97792	2021-03-02 13:03:42 -08:00
Amara Emerson	8a316045ed	[AArch64][GlobalISel] Enable use of the optsize predicate in the selector. To do this while supporting the existing functionality in SelectionDAG of using PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis dependencies to the instruction selector pass. Then, use the predicate to generate constant pool loads for f32 materialization, if we're targeting optsize/minsize. Differential Revision: https://reviews.llvm.org/D97732	2021-03-02 12:55:51 -08:00
David Green	438c98515c	[ARM] Use 0, not ZR during ISel for CSINC/INV/NEG Instead of converting the 0 into a ZR reg during lowering, do that with tablegen by matching the zero immediate. This when combined with other optimizations is more likely to use ZR and helps keep the DAG more easily optimizable. It should not otherwise effect code generation.	2021-03-02 19:01:14 +00:00
Jonas Paulsson	52bbbf4d44	[SystemZ] Assign the full space for promoted and split outgoing args. When a large "irregular" (e.g. i96) integer call argument is converted to indirect, 64-bit parts are stored to the stack. The full stack space (e.g. i128) was not allocated prior to this patch, but rather just the exact space of the original type. This caused neighboring values on the stack to be overwritten. Thanks to Josh Stone for reporting this. Review: Ulrich Weigand Fixes https://bugs.llvm.org/show_bug.cgi?id=49322 Differential Revision: https://reviews.llvm.org/D97514	2021-03-02 12:56:47 -06:00
Joe Nash	5531f24cc2	[AMDGPU] Make OMod explicit for V_CVT_{U,I}* Make OMod explicit instead of implied by HasModifiers in the operand list. Requires explicitly setting HasOMod=1 for irregular OMod usage in instruction V_CVT_{U,I}* Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97587 Change-Id: I230e1476f529e816eec60e242531f23a99e3839f	2021-03-02 13:32:06 -05:00
Fraser Cormack	c1695ddf7d	[RISCV] Support fixed-length INSERT_VECTOR_ELT This patch enables support for lowering INSERT_VECTOR_ELT on fixed-length vector types. The strategy follows that for scalable vector types. This patch also includes a quick fix to prevent the compiler infinitely looping between lowering BUILD_VECTOR as VECTOR_SHUFFLE and back again. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97698	2021-03-02 16:48:38 +00:00
Simon Pilgrim	25b788716b	[AMDGPU] Fix "initialization is never read" clang-tidy warnings. NFCI.	2021-03-02 12:06:24 +00:00
Fraser Cormack	de2b70010a	[RISCV] Lower CONCAT_VECTORS to INSERT_SUBVECTOR nodes The default expansion of CONCAT_VECTORS goes through the stack. This patch avoids that penalty by custom-lowering CONCAT_VECTORS to a series of INSERT_SUBVECTOR nodes. Futher optimizations are possible, but this is a good start. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97692	2021-03-02 11:13:59 +00:00
Benjamin Kramer	10c256ccaf	Revert "[X86] Fold shuffle(not(x),undef) -> not(shuffle(x,undef))" This reverts commit `925093d88a`. Causes an infinite loop when compiling some shuffles: $ cat bugpoint-reduced-simplified.ll target triple = "x86_64-unknown-linux-gnu" define void @foo() { entry: %0 = load i8, i8* undef, align 1 %broadcast.splatinsert = insertelement <16 x i8> poison, i8 %0, i32 0 %1 = icmp ne <16 x i8> %broadcast.splatinsert, zeroinitializer %2 = shufflevector <16 x i1> %1, <16 x i1> undef, <16 x i32> zeroinitializer %wide.load = load <16 x i8>, <16 x i8>* undef, align 1 %3 = icmp ne <16 x i8> %wide.load, zeroinitializer %4 = and <16 x i1> %3, %2 %5 = zext <16 x i1> %4 to <16 x i8> store <16 x i8> %5, <16 x i8>* undef, align 1 ret void } $ llc < bugpoint-reduced-simplified.ll <timeout>	2021-03-02 11:24:07 +01:00
Dmitry Preobrazhensky	28f164bca7	[AMDGPU][MC][GFX9+] Corrected encoding of op_sel_hi for unused operands in VOP3P Corrected encoding of VOP3P op_sel_hi for unused operands. See bug 49363. Differential Revision: https://reviews.llvm.org/D97689	2021-03-02 13:02:25 +03:00
David Green	d6ba8ecb60	[ARM] Add handling of t2LDRSB/t2LDRSH in Constant Island Pass These constant pool loads should be treated similarly to t2LDRB/t2LDRH, acting on the same offset ranges. Add handling and a simple test.	2021-03-02 08:46:07 +00:00
Stanislav Mekhanoshin	7c724a896f	[AMDGPU] Do not check max-bb for a single block callee -amdgpu-inline-max-bb option could lead to a suboptimal codegen preventing inlining of really simple functions including pure wrapper calls. Relax the cutoff by allowing to call a function with a single block on the grounds that it will not increase total number of blocks after inlining. Differential Revision: https://reviews.llvm.org/D97744	2021-03-01 19:48:50 -08:00
Jian Cai	c35105055e	[ARM] support symbolic expressions as branch target in b.w Currently ARM backend validates the range of branch targets before the layout of fragments is finalized. This causes build failure if symbolic expressions are used, with the exception of a single symbolic value. For example, "b.w ." works but "b.w . + 2" currently fails to assemble. This fixes the issue by delaying this check (in ARMAsmParser::validateInstruction) of b.w instructions until the symbol expressions are resolved (in ARMAsmBackend::adjustFixupValue). Link: https://github.com/ClangBuiltLinux/linux/issues/1286 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D97568	2021-03-01 17:41:35 -08:00
Jessica Paquette	3e8223b165	[AArch64][GlobalISel] NFC: Remove dead G_BUILD_VECTOR legalization rule Remove a rule which allows larger scalar types than the destination vector element type. This appears to be irrelevant now that we have G_BUILD_VECTOR_TRUNC. Plus, making a G_BUILD_VECTOR which satisfies this introduces a verifier failure anyway. Differential Revision: https://reviews.llvm.org/D97727	2021-03-01 14:04:40 -08:00
David Green	e880f8b88a	[ARM] Rename pass to MVETPAndVPTOptimisationsPass This pass has for a while performed Tail predication as well as VPT block optimizations. Rename the pass to make that clear.	2021-03-01 21:57:19 +00:00
Amara Emerson	b783aa8979	[AArch64] Fix emitting an AdrpAddLdr LOH when there's a potential clobber of the def of the adrp before the ldr. Apparently this pass used to have liveness analysis but it was removed for scompile time reasons. This workaround prevents the LOH from being emitted unless the ADD and LDR are adjacent. Fixes https://github.com/JuliaLang/julia/issues/39820 Differential Revision: https://reviews.llvm.org/D97571	2021-03-01 13:52:57 -08:00
Anirudh Prasad	5cb417527c	[SystemZ] Introduce distinction between the jg/jl family of mnemonics for GNU as vs HLASM - This patch adds in the distinction between jg[] and jl[] pc-relative mnemonics based on the variant/dialect. - Under the hlasm variant, we use the jl[] family of mnemonics and under the att (GNU as) variant, we use the jg[] family of mnemonics. - jgnop which was added in https://reviews.llvm.org/D92185, is now restricted to att variant. jlnop is introduced and restricted to hlasm variant. - The br[]l additional mnemonics are mapped to either jl[]/jg[*] based on the variant. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D97581	2021-03-01 16:36:07 -05:00

1 2 3 4 5 ...

61829 Commits