llvm-project

Commit Graph

Author	SHA1	Message	Date
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Simon Pilgrim	9ea54ac9ce	[X86] X86ISelDAGToDAG.cpp - use auto for all values derived from cast/dyn_cast (style). NFC.	2022-08-08 14:35:06 +01:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Craig Topper	91e8079cd5	[X86] Teach PostprocessISelDAG to fold ANDrm+TESTrr when chain result is used. The isOnlyUserOf prevented the fold if the chain result had any users. What we really care about is that the data result from the AND is only used by the TEST, and the flags results from the ANDs aren't used at all. It's ok if the chain has users, we just need to replace those users with the chain from the TESTrm. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D131117	2022-08-03 21:00:22 -07:00
Kazu Hirata	129b531c9c	[llvm] Use value_or instead of getValueOr (NFC)	2022-06-18 23:07:11 -07:00
Kazu Hirata	7c987bb4d9	[X86] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-06-18 12:05:34 -07:00
Guillaume Chatelet	0788186182	[Alignment][NFC] Remove usage of MemSDNode::getAlignment I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times. Differential Revision: https://reviews.llvm.org/D126910	2022-06-07 13:52:20 +00:00
Jonas Paulsson	46f83caebc	[InlineAsm] Add support for address operands ("p"). This patch adds support for inline assembly address operands using the "p" constraint on X86 and SystemZ. This was in fact broken on X86 (see example at https://reviews.llvm.org/D110267, Nov 23). These operands should probably be treated the same as memory operands by CodeGenPrepare, which have been commented with "TODO" there. Review: Xiang Zhang and Ulrich Weigand Differential Revision: https://reviews.llvm.org/D122220	2022-04-13 12:50:21 +02:00
Simon Pilgrim	76cd11f303	[DAG] Add llvm::isMinSignedConstant helper. NFC Pulled out of D122754	2022-04-01 17:47:34 +01:00
Simon Pilgrim	c64f37f818	[X86] matchAddressRecursively - add XOR(X, MIN_SIGNED_VALUE) handling Allows us to fold XOR(X, MIN_SIGNED_VALUE) == ADD(X, MIN_SIGNED_VALUE) into LEA patterns As mentioned on PR52267. Differential Revision: https://reviews.llvm.org/D122815	2022-04-01 17:26:29 +01:00
Craig Topper	4b28980772	[X86] Simplify the interface to getCondNoFromDesc. Instead of taking a SkipDefs parameter, rename to getCondSrcNoFromDesc and have it return the source operand number. Make getCondFromMI responsible for adding the number of Defs for MI instructions. While there remove some unneeded casts to unsigned and check for negative numbers instead of explicitly -1. Less than 0 is easier for a compiler to codegen. Differential Revision: https://reviews.llvm.org/D122113	2022-03-20 22:41:39 -07:00
Shengchen Kan	cb26730aaa	[X86][NFC] Unify implementations of getting condition code	2022-03-21 11:31:16 +08:00
Shengchen Kan	076a9dc99a	[X86][NFC] Rename hasCMOV() to canUseCMOV(), hasLAHFSAHF() to canUseLAHFSAHF() To make them less like other feature functions. This is a follow-up patch for D121978.	2022-03-20 12:00:25 +08:00
Shengchen Kan	920c2e5763	[X86][NFC] Rename target feature hasCMov->hasCMOV This is a follow-up patch for D121975.	2022-03-18 14:05:52 +08:00
Sanjay Patel	67e9151096	[x86] try harder to use shift instead of test if it can save some immediate bytes We favor 'and' and 'test' in earlier phases of optimization, and that's usually the better option, but we can save a few instruction bytes by converting a mask constant to a shift here. Differential Revision: https://reviews.llvm.org/D121147	2022-03-17 09:10:57 -04:00
Sanjay Patel	83413bb617	[x86] reduce indentation; NFC We may be able to refine the conditions for these transforms ( D120648 ).	2022-03-16 13:39:02 -04:00
Matthias Braun	84ef62126a	X86ISelDAGToDAG: Transform TEST + MOV64ri to SHR + TEST Optimize a pattern where a sequence of 8/16 or 32 bits is tested for zero: LLVM normalizes this towards an `AND` with mask which is usually good, but does not work well on X86 when the mask does not fit into a 64bit register. This DagToDAG peephole transforms sequences like: ``` movabsq $562941363486720, %rax # imm = 0x1FFFE00000000 testq %rax, %rdi ``` to ``` shrq $33, %rdi testw %di, %di ``` The result has a shorter encoding and saves a register if the tested value isn't used otherwise. Differential Revision: https://reviews.llvm.org/D121320	2022-03-15 14:18:04 -07:00
Sanjay Patel	9fce696110	[x86] reduce code duplication for select of X86ISD::CMP; NFC	2022-03-07 15:14:20 -05:00
Simon Pilgrim	3f22a4962d	[X86] selectLEAAddr - add X86ISD::SMUL/UMULO handling After D118128 relaxed the heuristic to require only one EFLAGS generating operand, it now makes sense to avoid X86ISD::SMUL/UMULO duplication as well. Differential Revision: https://reviews.llvm.org/D119578	2022-02-17 13:51:02 +00:00
Simon Pilgrim	0b00cd19e6	[X86] selectLEAAddr - relax heuristic to only require one operand to be a MathWithFlags op (PR46809) As suggested by @craig.topper, relaxing LEA matching to only require the ADD to be fed from a single op with EFLAGS helps avoid duplication when the EFLAGS are consumed in a later, dependent instruction. There was some concern about whether the heuristic is too simple, not taking into account lost loads that can't fold by using a LEA, but some basic tests (included in select-lea.ll) don't suggest that's really a problem. Differential Revision: https://reviews.llvm.org/D118128	2022-02-08 15:09:22 +00:00
Sanjay Patel	be059a1263	[x86] avoid compile-time warning for parens; NFC	2022-02-07 16:59:50 -05:00
Sanjay Patel	40a50f8701	[x86] avoid false dependency stall on 'sbb' with same source reg This is effectively inverting the transform added with D116804 because the downside of the false dependency of something like "sbb %eax, %eax" is much greater than the upside of eliminating a zeroing instruction on (all?) Intel CPUs. Differential Revision: https://reviews.llvm.org/D118843	2022-02-07 10:12:12 -05:00
Sanjay Patel	f523e83b20	[x86] make helper function to create sbb with zero operands; NFC As noted in D116804, we want to effectively invert that patch for CPUs (intel) that don't break the false dependency on sbb %eax, %eax So we will likely want to create that here in the X86DAGToDAGISel::Select() case for X86::SETCC_CARRY.	2022-02-02 16:56:10 -05:00
Kazu Hirata	f3a344d212	[Target] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-06 22:01:44 -08:00
Kazu Hirata	e5947760c2	Revert "[llvm] Remove redundant member initialization (NFC)" This reverts commit `fd4808887e`. This patch causes gcc to issue a lot of warnings like: warning: base class ‘class llvm::MCParsedAsmOperand’ should be explicitly initialized in the copy constructor [-Wextra]	2022-01-03 11:28:47 -08:00
Kazu Hirata	fd4808887e	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-01 16:18:18 -08:00
Kazu Hirata	efa896e5f7	[Target] Use SDNode::uses (NFC)	2021-11-12 21:23:04 -08:00
Simon Pilgrim	fd485d8cda	[X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations The VINSERTF128 instruction is often much quicker, and never slower, than the more general VPERM2F128 instruction, so we should try to use that in more circumstances. This requires a fallback to a commuted VPERM2F128 for the case where we need to fold the 256-bit vector source instead of the 128-bit subvector source. There is one interesting side effect - DAGCombine's narrowExtractedVectorLoad combine gets called in a number of locations, this often creates an extracted subvector load without regard to other uses of the original wider load. I'm expecting AVX cpus to be capable of merging such aliased loads, but I do wonder whether narrowExtractedVectorLoad's call to X86TargetLowering::shouldReduceLoadWidth needs to be altered to check for more partial uses? Noticed while investigating the quality of interleaved load/store codegen. Differential Revision: https://reviews.llvm.org/D111960	2021-11-01 10:45:50 +00:00
Sanjay Patel	285b8abce4	[x86] limit vector increment fold to allow load folding The tests are based on the example from: https://llvm.org/PR52032 I suspect that it looks worse than it actually is. :) That is, llvm-mca says there's no uop/timing difference with the load folding and pcmpeq vs. broadcast on Haswell (and probably other targets). The load-folding definitely makes the code smaller, so it's good for that at least. So this requires carving a narrow hole in the transform to get just this case without changing others that look good as-is (in other words, the transform still seems good for most examples). Differential Revision: https://reviews.llvm.org/D112464	2021-10-29 15:48:35 -04:00
Phoebe Wang	eb55c1f153	[X86][NFC] Add the missed `break;` for `79f9dfef0d`	2021-10-27 13:58:31 +08:00
Phoebe Wang	79f9dfef0d	[X86] Move splat addends from the gather/scatter index operand to the base address This can avoid a vector add and a constant pool load. Or an explicit broadcast in case of non-constant. Also reverse the transform any time we encounter a constant index addend that can't be moved to base. In that case pull the constant from base into the index. This reduces code size needed for the displacement since we needed the index add anyway. Limit this to scale of 1 to avoid divisibility and wrap issues. Authored by Craig. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111595	2021-10-26 12:35:57 +08:00
Jay Foad	a9bceb2b05	[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC. Stop using APInt constructors and methods that were soft-deprecated in D109483. This fixes all the uses I found in llvm, except for the APInt unit tests which should still test the deprecated methods. Differential Revision: https://reviews.llvm.org/D110807	2021-10-04 08:57:44 +01:00
Roman Lebedev	0852f8706b	[X86] X86DAGToDAGISel::matchBitExtract(): support 'num high bits to clear' pattern Currently, we only deal with the case where we can match the number of low bits to be kept, i.e.: ``` x & ((1 << y) - 1) ``` will extract low `y` bits of `x`. But what will ``` x & (-1 >> y) ``` do? Logically, it will extract `bitwidth(x) - y` low bits, i.e.: ``` x & ~(-1 << (bitwidth(x)-y)) ``` ... except we can't do such a transformation in IR in general, because if we wanted to extract all the bits `(-1 >> 0)` is fine, but `-1 << bitwidth(x)` would be `poison`: https://alive2.llvm.org/ce/z/BKJZfw, Yet, here with BMI's BEXTR and BMI2's BZHI we don't have any such problems with edge-cases. So what we can do is: https://alive2.llvm.org/ce/z/gm5M2B As briefly discussed with @craig.topper, this appears to be not worse than what we'd end up with currently (a pair of shifts): * https://godbolt.org/z/nsPb8bejs (direct data dependency, sequential execution) * https://godbolt.org/z/7bj3zeh1d (no direct data dependency, parallel execution) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107923	2021-09-08 19:27:08 +03:00
Craig Topper	da3ef8b756	[X86] Handle inverted inputs when matching VPTERNLOG from 2 binary ops. This is a more general version of D109273. Though it doesn't peek through bitcasts or rearrange broadcasts. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D109295	2021-09-06 17:44:52 -07:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00
Craig Topper	cc6d302c91	[X86] Fix a bug in TEST with immediate creation This code tries to form a TEST from CMP+AND with an optional truncate in between. If we looked through the truncate, we may have extra bits in the AND mask that shouldn't participate in the checks. Normally SimplifyDemandedBits takes care of this, but the AND may have another user. So manually mask out any extra bits. Fixes PR51175. Differential Revision: https://reviews.llvm.org/D106634	2021-07-23 09:03:53 -07:00
Craig Topper	0f3bc00a7d	[X86] Simplify part of the isel for X86ISD::FCMP/STRICT_FCMP/STRICT_FCMPS. We don't need to have the compare output a value and then copy it to FPSW for use by FNSTSW. Instead we can just have the compare output Glue and glue the FNSTSW to it. InstrEmitter effectively performed this optimization when emitting the Machine IR. Doing it directly simplifies the codes and reduces the work in InstrEmitter. There's no change in the machine IR at the end of isel before and after this change.	2021-06-25 11:39:01 -07:00
Bing1 Yu	56d5c46b49	[X86] Support __tile_stream_loadd intrinsic for new AMX interface Adding support for __tile_stream_loadd intrinsic. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D103784	2021-06-11 17:28:43 +08:00
Craig Topper	3c0735c6d8	[X86] Call insertDAGNode on trunc/zext created in tryShiftAmountMod. This puts the new nodes in the proper place in the topologically sorted list of nodes. Fixes PR50431, which was introduced recently in D101944.	2021-05-24 10:23:22 -07:00
Roman Lebedev	5f78ba001c	[X86][Codegen] Shift amount mod: sh? i64 x, (32-y) --> sh? i64 x, -(y+32) I've seen this in the RawSpeed's BitPumpMSB*::push() hotpath, after fixing the buffer abstraction to a more sane one, when looking into a +5% runtime regression. I was hoping that this would fix it, but it does not look like it does. This seems to be at least not worse than the original pattern. But I'm actually mainly interested in the case where we already compute `(y+32)` (see last test), https://alive2.llvm.org/ce/z/ZCzJio Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D101944	2021-05-11 19:39:41 +03:00
Simon Pilgrim	759b97e55a	[X86] Replace repeated isa/cast<ConstantSDNode> calls with single dyn_cast<>. NFCI. Noticed while looking at D101944	2021-05-11 14:18:45 +01:00
Harald van Dijk	1b788607f5	[X32][CET] Fix handling of indirect branches As X32 uses 32-bit pointers without having 32-bit indirect branch instructions, we need to fix up indirect branches by extending the branch targets to 64 bits. This was already done for BRIND but not yet for NT_BRIND. The same logic works for both, so this applies that existing logic to NT_BRIND as well. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D101499	2021-04-29 08:33:22 +01:00
Liu, Chen3	b70e02a7e7	[X86][NFC] Move instruction selection of the x86_tdpb[s,u]d_internal and x86_tilezero_internal to X86InstrAMX.td Differential Revision: https://reviews.llvm.org/D97997	2021-03-09 21:27:39 +08:00
Simon Pilgrim	87d5b34c24	[X86] X86ISelDAGToDAG.cpp - include cstdint instead of stdint.h NFCI. Fixes clang-tidy warning	2021-03-05 15:58:20 +00:00
Simon Pilgrim	f11f86c114	[X86] X86DAGToDAGISel::Select - merge X86::TEST load bitsize checks. NFCI.	2021-03-05 15:58:20 +00:00
Liu, Chen3	4bc7c8631a	[X86] Support amx-bf16 intrinsic. Adding support for intrinsics of AMX-BF16. This patch also fixes a bug where AMX-INT8 instructions would be selected with the wrong predicate. Differential Revision: https://reviews.llvm.org/D97358	2021-02-25 09:06:48 +08:00
Liu, Chen3	f8b9035aae	[X86] Support amx-int8 intrinsic. Adding support for intrinsics of TDPBSUD/TDPBUSD/TDPBUUD. Differential Revision: https://reviews.llvm.org/D97259	2021-02-23 17:08:05 +08:00
Wang, Pengfei	a5d9e0c79b	[X86] Fix tile config register spill issue. This is an optimized approach for D94155. Previous code build the model that tile config register is the user of each AMX instruction. There is a problem for the tile config register spill. When across function, the ldtilecfg instruction may be inserted on each AMX instruction which use tile config register. This cause all tile data register clobber. To fix this issue, we remove the model of tile config register. Instead, we analyze the AMX instructions between one call to another. We will insert ldtilecfg after the first call if we find any AMX instructions. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D95136	2021-01-30 12:53:57 +08:00
Craig Topper	74784a5aa4	[X86] In shrinkAndImmediate, place the new constant into the topological sort. Revert the change to use APInt::isSignedIntN from `5ff5cf8e05`. Its clear that the games we were playing to avoid the topological sort aren't working. So just fix it once and for all. Fixes PR48888.	2021-01-26 13:18:04 -08:00
Luo, Yuanke	64132f541e	Revert "[X86][AMX] Fix tile config register spill issue." This reverts commit `20013d02f3`.	2021-01-21 18:11:43 +08:00
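
Several of the commit messages above rest on small bit-arithmetic identities: XOR versus ADD with the minimum signed value (c64f37f818), the shift-amount rewrite `(32-y)` to `-(y+32)` (5f78ba001c), the TEST-to-SHR peephole with mask 0x1FFFE00000000 (84ef62126a), and the "num high bits to clear" form of bit extraction (0852f8706b). The sketch below checks those identities on plain 64-bit integers. It is not LLVM code; every function name is hypothetical, `keepLowBits` is only a simplified stand-in for a BZHI-like "keep the low N bits" operation, and the shift check assumes the hardware masks a 64-bit shift amount to its low 6 bits.

```cpp
#include <cassert>
#include <cstdint>

// c64f37f818: XOR with the minimum signed value flips only the sign bit.
// A wrapping ADD of the same constant does the same, because the carry
// out of bit 63 is discarded.
constexpr bool xorMatchesAdd(uint64_t X) {
  constexpr uint64_t MinSigned = 1ULL << 63;
  return (X ^ MinSigned) == (X + MinSigned);
}

// 5f78ba001c: (32 - y) and -(y + 32) differ by exactly 64, so they agree
// once the shift amount is reduced modulo 64 (assumed hardware behavior).
constexpr bool shiftAmountsAgree(uint32_t Y) {
  return ((32u - Y) & 63u) == ((0u - (Y + 32u)) & 63u);
}

// 84ef62126a: the mask 0x1FFFE00000000 covers bits 33..48, so testing it
// for zero is the same as shifting right by 33 and testing the low 16 bits.
constexpr bool testMatchesShiftedTest(uint64_t X) {
  constexpr uint64_t Mask = 0x1FFFE00000000ULL;
  return ((X & Mask) == 0) == (static_cast<uint16_t>(X >> 33) == 0);
}

// Hypothetical helper: keep the low N bits of X, returning X unchanged
// when N covers the whole width (avoids an out-of-range shift).
constexpr uint64_t keepLowBits(uint64_t X, unsigned N) {
  return N >= 64 ? X : (X & ((1ULL << N) - 1));
}

// 0852f8706b: x & (-1 >> y) keeps the low (bitwidth - y) bits of x.
// Y must be in [0, 63].
constexpr bool highClearMatchesLowKeep(uint64_t X, unsigned Y) {
  return (X & (~0ULL >> Y)) == keepLowBits(X, 64 - Y);
}

static_assert(xorMatchesAdd(42), "XOR/ADD identity");
static_assert(shiftAmountsAgree(5), "shift amount identity");
static_assert(testMatchesShiftedTest(0x1FFFE00000000ULL), "TEST/SHR identity");
static_assert(highClearMatchesLowKeep(~0ULL, 0), "bit extract, y == 0");

// Runs every identity over a handful of sample values.
bool checkIdentities() {
  const uint64_t Samples[] = {0,
                              1,
                              0x200000000ULL,
                              0x1FFFE00000000ULL,
                              0x8000000000000000ULL,
                              0x123456789ABCDEF0ULL,
                              ~0ULL};
  for (uint64_t X : Samples) {
    if (!xorMatchesAdd(X) || !testMatchesShiftedTest(X))
      return false;
    for (unsigned Y = 0; Y < 64; ++Y)
      if (!highClearMatchesLowKeep(X, Y))
        return false;
  }
  for (uint32_t Y = 0; Y < 64; ++Y)
    if (!shiftAmountsAgree(Y))
      return false;
  return true;
}
```

The `y == 0` case is the edge the 0852f8706b message calls out: `-1 >> 0` is well defined, while the equivalent left-shift form would need a shift by the full bit width, which is why the helper special-cases `N >= 64`.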


1052 Commits