It's not particularly user-friendly to have to call `initLRU` everywhere. It
also wasn't great that the LRU for registers used in a sequence was
initialized by `initLRU` as well.
This patch hides this stuff behind some helper functions:
* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`
This allows the user to avoid calling `initLRU` explicitly. Also, it allows
us to separate initializing the used-in-sequence LRU from the main LRU.
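As a rough sketch (not the actual code, and with illustrative member names),
the helpers boil down to queries that lazily initialize the LRU themselves:
```
// Hedged sketch only: member names and helper bodies here are illustrative.
#include "llvm/CodeGen/LiveRegUnits.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/Register.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"

using namespace llvm;

struct CandidateSketch {
  MachineBasicBlock *MBB = nullptr;
  // Registers live from the end of the block back to the start of the
  // outlined sequence (kept separate from the used-in-sequence LRU).
  LiveRegUnits FromEndOfBlockToStartOfSeq;
  bool LRUWasSet = false;

  // Lazily computed, so callers never have to remember to call initLRU.
  void initLRU(const TargetRegisterInfo &TRI) {
    if (LRUWasSet)
      return;
    LRUWasSet = true;
    FromEndOfBlockToStartOfSeq.init(TRI);
    // ... walk MBB backwards from its end to the start of the sequence,
    //     accumulating live register units ...
  }

  // New-style query: "is Reg free across and outside the sequence?"
  bool isAvailableAcrossAndOutOfSeq(Register Reg, const TargetRegisterInfo &TRI) {
    initLRU(TRI);
    return FromEndOfBlockToStartOfSeq.available(Reg);
  }
};
```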
Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this
refactor requires that we de-const the Candidate there.
Some other quality-of-code improvements:
* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`
This is a preparatory commit for a larger compile time related change for the
AArch64 outliner.
Extend the existing split where we already do this for v32i16/v64i8
We can end up trying to use PCMPEQ/GT if the result needs to be sign-extended (typically due to the DAGCombiner::foldSextSetcc fold).
Fixes #53842
The "avoid trailing call pass" makes sure that no function ends with a call instruction for the purpose of the unwinder.
It starts of by skipping over any non real instruction, which is approximated via the Pseudo and Meta property. This sadly leads to issues when the last machine instruction is a STATEPOINT, as it is skipped despite it lowering to a call.
This patch fixes the use of a statepoint in the trailing call position by making sure call instructions are not skipped.
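A hedged sketch of the corrected skipping logic (not the pass's actual code):
```
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Returns true if the last "real" instruction in the block is a call.
// A STATEPOINT lowers to a call, so test isCall() before the pseudo/meta skip.
static bool lastRealInstrIsCall(const MachineBasicBlock &MBB) {
  for (const MachineInstr &MI : llvm::reverse(MBB)) {
    if (MI.isCall())
      return true;
    if (MI.isPseudo() || MI.isMetaInstruction())
      continue; // debug values, CFI directives, etc. are not real code here
    return false;
  }
  return false;
}
```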
Differential Revision: https://reviews.llvm.org/D119644
This is a retry of b4b97ec813 - that was reverted because it
could cause miscompiles by illegally reordering memory operations.
A new test based on #53695 is added here to verify we do not have
that same problem.
extract_vec_elt (load X), C --> scalar load (X+C)
As noted in the comment, DAGCombiner has this fold -- and the code in this
patch is adapted from DAGCombiner::scalarizeExtractedVectorLoad() -- but
x86 should benefit even if the loaded vector has other uses as long as we
apply some other x86-specific conditions. The motivating example from #50310
is shown in vec_int_to_fp.ll.
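Purely for illustration (not code from the patch), the byte-offset arithmetic
behind the fold:
```
#include <cstdint>

// extract_vec_elt (load X), C --> scalar load (X + C * sizeof(element)),
// e.g. element 2 of a <4 x i32> loaded from X becomes an i32 load from X + 8.
uint64_t extractedEltByteOffset(uint64_t EltIdx, uint64_t EltSizeInBytes) {
  return EltIdx * EltSizeInBytes;
}
```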
Fixes #50310
Fixes #53695
Differential Revision: https://reviews.llvm.org/D118376
D108887 fixed an alignment mismatch by changing the caller's alignment in the
ABI. However, we found some cases that still assume the alignment is the
vector size. This patch fixes them to avoid the runtime crash.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D114536
Extend the existing fold to use SimplifyMultipleUseDemandedBits as well as SimplifyDemandedVectorElts/SimplifyDemandedBits when attempting to simplify based on known zero vector elements.
To find uniform shift/rotation amounts, we currently use SelectionDAG::getSplatValue, which creates a node that extracts the scalar value from the source vector; this makes it more difficult for later combines to remove the extraction and stay on the SIMD unit, and it can be a problem when the scalar type is illegal (i.e. i64 vs v2i64 on 32-bit targets).
This patch begins to use SelectionDAG::getSplatSourceVector (which SelectionDAG::getSplatValue uses internally) and adds a new variant of getTargetVShiftNode that takes the source vector and the splat index, and adjusts the vector in place to create the zero-extended value suitable for the SSE PSLL/PSRL/PSRA uniform instructions.
I'm still addressing a number of regressions when used for normal vector shifts, so I've just handled the funnelshift/rotation lowering for this first patch. I can then focus on the yak shaving (SimplifyDemandedBits/Elts in particular) necessary to always use SelectionDAG::getSplatSourceVector.
Differential Revision: https://reviews.llvm.org/D119090
Pulled out of D106237, this replaces the X86ISD::AVG DAG node with the
generic ISD::AVGCEILU. It doesn't remove the detectAVGPattern method,
but the extra generic ISel matching does alter the existing test.
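For reference, ISD::AVGCEILU is the element-wise unsigned average rounded up,
i.e. (a + b + 1) / 2 computed without overflow, matching what PAVGB/PAVGW
produce. An illustrative scalar model of one element:
```
#include <cstdint>

// (a | b) - ((a ^ b) >> 1) equals (a + b + 1) / 2 but never needs a 9th bit.
uint8_t avgceilu8(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a | b) - ((a ^ b) >> 1));
}
// e.g. avgceilu8(1, 2) == 2 and avgceilu8(255, 255) == 255.
```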
Differential Revision: https://reviews.llvm.org/D119073
Replace the *_EXTEND node with the raw operands; this will make it easier to use combineToExtendBoolVectorInReg for any boolvec extension combine.
Cleanup prep for Issue #53760
The introduction and some examples are on this page:
https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/
The `/JMC` flag enables these instrumentations:
- Insert a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`
at the beginning of every function, immediately after the prologue.
The argument to `__CheckForDebuggerJustMyCode` is the address of a boolean
global variable (initialized to 1) with the naming
convention `__<hash>_<filename>`. All such global variables are placed in
the `.msvcjmc` section.
- The `<hash>` part of `__<hash>_<filename>` maps one-to-one to a directory
path. MSVC uses some unknown hashing function; here I used DJB.
- Add a dummy/empty COMDAT function `__JustMyCode_Default`.
- Add the `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link
option via the `.drectve` section. This prevents a link failure in
case `__CheckForDebuggerJustMyCode` is not provided during linking.
Implementation:
All the instrumentation is implemented in an IR codegen pass, placed immediately before the CodeGenPrepare pass. This is to avoid interfering with mid-end optimizations and to keep the instrumentation target-independent (I'm still working on an ELF port in a separate patch).
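A hand-written approximation of what the instrumented code looks like (the hash
in the global's name below is made up, not a real DJB value):
```
// Illustrative sketch only; the flag name approximates the MSVC convention.
extern "C" void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag);

// One flag per source file, initialized to 1 and placed in the .msvcjmc section.
unsigned char __A1B2C3D4_example_cpp = 1;

int example(int x) {
  __CheckForDebuggerJustMyCode(&__A1B2C3D4_example_cpp); // right after the prologue
  return x + 1; // original function body
}
```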
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D118428
This new-ish LEA-fixup code path creates two substitutions for an
instruction number -- this is incorrect because each Value should be
replaced by a single replacement Value. Fix by deleting the duplicate
substitution. Add some test coverage for this path with debug-info
attached.
Differential Revision: https://reviews.llvm.org/D119232
This ensures that the Windows unwinder will work at every instruction
boundary, and allows other targets to read and write flags without
setting up a frame pointer.
Fixes GH-46875
Differential Revision: https://reviews.llvm.org/D119391
This fix is similar to 3cf3ffce240e ("Fix the TCRETURNmi64 bug differently."):
after allocating registers for the index+base, we only have one register left.
This bug affects Linux kernel compilation for the x86 target. The error happens when compiling kmod_si476x_core;
clang complains:
error: ran out of registers during register allocation
The full command is:
clang -Wp,-MMD,drivers/mfd/.si476x-cmd.o.d -nostdinc -isystem /opt/toolchain/main/lib/clang/14.0.0/include -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -Qunused-arguments -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89 -no-integrated-as --prefix=/usr/bin/ -Werror=unknown-warning-option -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m32 -msoft-float -mregparm=3 -freg-struct-return -fno-pic -mstack-alignment=4 -march=atom -mtune=atom -mtune=generic -Wa,-mtune=generic32 -ffreestanding -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-address-of-packed-member -O2 -Wframe-larger-than=1024 -fno-stack-protector -Wno-format-invalid-specifier -Wno-gnu -mno-global-merge -Wno-unused-but-set-variable -Wno-unused-const-variable -fomit-frame-pointer -ftrivial-auto-var-init=pattern -fno-stack-clash-protection -falign-functions=32 -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-array-bounds -fno-strict-overflow -fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-pointer-to-enum-cast -Wno-tautological-constant-out-of-range-compare -DKBUILD_MODFILE='"drivers/mfd/si476x-core"' -DKBUILD_BASENAME='"si476x_cmd"' -DKBUILD_MODNAME='"si476x_core"' -D__KBUILD_MODNAME=kmod_si476x_core -c -o drivers/mfd/si476x-cmd.o drivers/mfd/si476x-cmd.c
-------------
LLVM cannot compile the following code for the x86 32-bit target: the tail call (TCRETURNmi) uses 2 registers for the index+base, while we also want more than one register for passing function arguments, which is impossible.
This fix is similar to 3cf3ffce240e ("Fix the TCRETURNmi64 bug differently.").
We now only use a tail call when it needs at most one register for passing arguments.
```
struct BIG_PARM {
  int ver;
};
static struct {
  int (*foo) (struct BIG_PARM* a, void *b);
  int (*bar) (struct BIG_PARM* a);
  int (*zoo0) (void);
  int (*zoo1) (void);
  int (*zoo2) (void);
  int (*zoo3) (void);
  int (*zoo4) (void);
} vtable[] = {
  [0] = {
    .foo = (int (*)(struct BIG_PARM* a, void *b))0xdeadbeef,
  },
};
int something(struct BIG_PARM *a, void* b) {
  return vtable[a->ver].foo(a,b);
}
```
```
$ clang -std=gnu89 -m32 -mregparm=3 -mtune=generic -fno-strict-overflow -O2 -c t0.c -o t0.c.o
error: ran out of registers during register allocation
1 error generated.
```
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D118312
Previously the `let Predicates = ...` line only applied to the rr version, and
so VMOVSH was being emitted whenever HasAVX512 (the default) applied. This is
not right.
There are a few relevant forward declarations in there that may require downstream
code to add explicit includes (an illustrative snippet follows the list):
llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer includes llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h
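For example, a file that relied on one of these transitive includes now has to
spell it out (illustrative):
```
#include "llvm/MC/MCContext.h"
#include "llvm/BinaryFormat/ELF.h" // previously reached transitively via MCContext.h
```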
Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after: 1049293745
This is significant and backs up the change, in addition to the usual benefits of
decreasing coupling between headers and compilation units.
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244
The "-fzero-call-used-regs" option tells the compiler to zero out
certain registers before the function returns. It's also available as a
function attribute: zero_call_used_regs.
The two top-level categories are:
- "used": Zero out used registers.
- "all": Zero out all registers, whether used or not.
The individual options are:
- "skip": Don't zero out any registers. This is the default.
- "used": Zero out all used registers.
- "used-arg": Zero out used registers that are used for arguments.
- "used-gpr": Zero out used registers that are GPRs.
- "used-gpr-arg": Zero out used GPRs that are used as arguments.
- "all": Zero out all registers.
- "all-arg": Zero out all registers used for arguments.
- "all-gpr": Zero out all GPRs.
- "all-gpr-arg": Zero out all GPRs used for arguments.
This is used to help mitigate Return-Oriented Programming exploits.
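Illustrative usage of the flag and the matching function attribute (GNU/Clang
spelling):
```
// Whole translation unit:
//   clang -fzero-call-used-regs=used-gpr-arg file.c
// Per function:
__attribute__((zero_call_used_regs("used-gpr-arg")))
int add(int a, int b) {
  return a + b; // argument-carrying GPRs are zeroed before the return
}
```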
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110869
Most Intel CPU scheduler files lumped the by-immediate and by-1 instructions
together, but uops.info shows they are quite different.
For the most part the by-1 instructions matched the uops.info data fairly
well, except that the latency was 3 instead of the 2 that uops.info indicates.
The by-immediate instructions need 7 or 8 uops and have higher latency.
It looks like the 8-bit by-immediate instructions may need even more
uops, but I just lumped them with the 16/32/64-bit ones.
Noticed while checking out PR53648, so I mostly cared about the by-1
instructions.
Reviewed By: RKSimon, pengfei
Differential Revision: https://reviews.llvm.org/D119217
As suggested by @craig.topper, relaxing LEA matching to only require the ADD to be fed from a single op with EFLAGS helps avoid duplication when the EFLAGS are consumed in a later, dependent instruction.
There was some concern about whether the heuristic is too simple, since it doesn't take into account loads that can no longer be folded when using a LEA, but some basic tests (included in select-lea.ll) don't suggest that's really a problem.
Differential Revision: https://reviews.llvm.org/D118128
This is no-functional-change-intended because only the
x86 target enables the TLI hook currently.
We can add fmul/fdiv opcodes to the switch similar to the
proposal D119111, but we don't need to make other changes
like enabling target-specific combines.
We can also add integer opcodes (add, or, shl, etc.) to
the switch because this function is called from all of the
generic binary opcodes.
The goal is to incrementally enable the profitable diffs
from D90113 while avoiding regressions.
Differential Revision: https://reviews.llvm.org/D119150
In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask.
This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments.
I've updated a number of cases that were either performing separate size/position calculations or had created their own local wrapper versions of these.
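A hedged sketch of the shape of the new variants (parameter names are
illustrative, not necessarily the ones used in the patch): the boolean check
also reports where the run of ones starts and how long it is.
```
#include <bit>      // std::countr_zero, std::popcount (C++20)
#include <cstdint>

// Returns true if Value is a non-empty contiguous run of ones (0..011..10..0),
// and if so also reports the run's starting bit index and length.
bool isShiftedMask64(uint64_t Value, unsigned &MaskIdx, unsigned &MaskLen) {
  if (Value == 0)
    return false;
  unsigned Idx = std::countr_zero(Value);
  uint64_t Shifted = Value >> Idx;
  if ((Shifted & (Shifted + 1)) != 0) // remaining bits must be 2^n - 1
    return false;
  MaskIdx = Idx;
  MaskLen = std::popcount(Value);
  return true;
}

// Usage: callers that previously recomputed position/size separately can now
// get both from the single call:
//   unsigned Idx, Len;
//   if (isShiftedMask64(0x0000FF00u, Idx, Len)) { /* Idx == 8, Len == 8 */ }
```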
Differential Revision: https://reviews.llvm.org/D119019
This is effectively inverting the transform added with D116804
because the downside of the false dependency of something like
"sbb %eax, %eax" is much greater than the upside of eliminating
a zeroing instruction on (all?) Intel CPUs.
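For context on why that instruction is problematic: `sbb %eax, %eax` computes
EAX - EAX - CF, so the result depends only on the carry flag, yet the CPU still
sees a read of the old EAX value. An illustrative scalar model:
```
#include <cstdint>

// Result of "sbb %eax, %eax": 0 if the carry flag is clear, all-ones if set.
// The old EAX value never affects the result, but it is still read, which is
// the false dependency discussed above.
uint32_t sbb_eax_eax(uint32_t /*old_eax, unused*/, bool carry_flag) {
  return carry_flag ? 0xFFFFFFFFu : 0u;
}
```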
Differential Revision: https://reviews.llvm.org/D118843
This reverts commit b4b97ec813.
As discussed in post-commit feedback at:
https://reviews.llvm.org/D118376
...there's a stage 2 failure on a Mac running a clang-refactor tool test.
As noted in D116804, we want to effectively invert that patch
for CPUs (Intel) that don't break the false dependency on
`sbb %eax, %eax`.
So we will likely want to create that here in the
X86DAGToDAGISel::Select() case for X86::SETCC_CARRY.