Commit Graph

22349 Commits

Author SHA1 Message Date
Simon Pilgrim ada6bcc13f [X86] X86tcret_1reg - use cast<> instead of dyn_cast<> to avoid dereference of nullptr
The pointer is always dereferenced, so assert the cast is correct instead of returning nullptr
2022-02-17 11:54:12 +00:00
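A minimal sketch of the distinction this commit relies on, assuming a toy class hierarchy: a `dyn_cast<>`-style helper yields a null pointer on a type mismatch, while a `cast<>`-style helper asserts that the type is correct. `Node`, `RegisterNode`, `dynCastSketch`, `castSketch` and `readReg` are hypothetical names, and the sketch uses C++ RTTI rather than LLVM's own `isa<>` machinery.

```cpp
#include <cassert>

// Hypothetical node hierarchy standing in for LLVM's node classes.
struct Node {
  virtual ~Node() = default;
};
struct RegisterNode : Node {
  unsigned Reg = 0;
};

// dyn_cast-like: yields nullptr when N is not a RegisterNode.
inline RegisterNode *dynCastSketch(Node *N) {
  return dynamic_cast<RegisterNode *>(N);
}

// cast-like: asserts that the type is correct instead of returning nullptr.
inline RegisterNode *castSketch(Node *N) {
  assert(dynamic_cast<RegisterNode *>(N) && "castSketch on incompatible type");
  return static_cast<RegisterNode *>(N);
}

// The result is always dereferenced here, so the asserting form is the better fit.
inline unsigned readReg(Node *N) { return castSketch(N)->Reg; }
```

With the checked form, a type mismatch fails at the assertion in debug builds rather than as a null dereference further along.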
Jessica Paquette 6d58f4ab07 [MachineOutliner] NFC: Hide LRU-related stuff behind helper functions
It's not particularly user-friendly to have to call `initLRU` everywhere. Also,
it wasn't particularly great that the LRU for registers used in a sequence was
also initialized by `initLRU`.

This patch hides this stuff behind some helper functions:

* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`

This allows the user to avoid calling `initLRU` explicitly. Also, it allows
us to separate initializing the used-in-sequence LRU from the main LRU.

Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this
refactor requires that we de-const the Candidate there.

Some other quality-of-code improvements:

* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`

This is a preparatory commit for a larger compile time related change for the
AArch64 outliner.
2022-02-16 11:39:07 -08:00
Shao-Ce SUN 2aed07e96c [NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`
Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D119846
2022-02-16 13:10:09 +08:00
Shao-Ce SUN 9cc49c1951 Revert "[NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`"
This reverts commit fe25c06cc5.
2022-02-16 11:57:49 +08:00
Shao-Ce SUN fe25c06cc5 [NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`
It seems that `MCRegisterInfo` has not been used by any target for ten years.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D119846
2022-02-16 11:47:17 +08:00
Simon Pilgrim 2808743cbd [X86] LowerVSETCC - always split 512-bit vectors before lowering to PCMPEQ/GT (PR53842)
Extend the existing split where we already do this for v32i16/v64i8

We can end up trying to use PCMPEQ/GT if the result needs to be sign-extended (typically due to the DAGCombiner::foldSextSetcc fold).

Fixes #53842
2022-02-15 14:21:12 +00:00
Markus Böck 78c27a3cee [X86][Win64] Avoid statepoints in trailing call position
The "avoid trailing call pass" makes sure that no function ends with a call instruction for the purpose of the unwinder.
It starts off by skipping over any non-real instruction, which is approximated via the Pseudo and Meta property. This sadly leads to issues when the last machine instruction is a STATEPOINT, as it is skipped despite it lowering to a call.

This patch fixes the use of a statepoint in the trailing call position by making sure call instructions are not skipped.

Differential Revision: https://reviews.llvm.org/D119644
2022-02-15 12:17:19 +01:00
Simon Pilgrim 890beda4e1 [X86] combineArithReduction - pull out (near) duplicate v4i8/v8i8 widening code. NFC.
2022-02-13 21:02:50 +00:00
Sanjay Patel c486b82cfb [x86] try harder to scalarize a vector load with extracted integer op uses
This is a retry of b4b97ec813 - that was reverted because it
could cause miscompiles by illegally reordering memory operations.
A new test based on #53695 is added here to verify we do not have
that same problem.

extract_vec_elt (load X), C --> scalar load (X+C)

As noted in the comment, DAGCombiner has this fold -- and the code in this
patch is adapted from DAGCombiner::scalarizeExtractedVectorLoad() -- but
x86 should benefit even if the loaded vector has other uses as long as we
apply some other x86-specific conditions. The motivating example from #50310
is shown in vec_int_to_fp.ll.

Fixes #50310
Fixes #53695

Differential Revision: https://reviews.llvm.org/D118376
2022-02-13 08:32:21 -05:00
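The fold `extract_vec_elt (load X), C --> scalar load (X+C)` can be illustrated outside the compiler. This is only a sketch under assumed conditions (a 4 x 32-bit vector, no intervening stores); the function names are hypothetical and it models the value equivalence, not the DAG combine itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t NumElts = 4;

// extract_vec_elt (load X), C: load the whole vector, then pick lane C.
inline std::uint32_t extractFromVectorLoad(const unsigned char *X, std::size_t C) {
  assert(C < NumElts && "lane index out of range");
  std::uint32_t Vec[NumElts];
  std::memcpy(Vec, X, sizeof(Vec));
  return Vec[C];
}

// scalar load (X+C): load only the bytes of lane C, at its byte offset from X.
inline std::uint32_t scalarLoad(const unsigned char *X, std::size_t C) {
  assert(C < NumElts && "lane index out of range");
  std::uint32_t Elt;
  std::memcpy(&Elt, X + C * sizeof(Elt), sizeof(Elt));
  return Elt;
}
```

Both forms read the same bytes for lane C, which is why the narrower load is a valid replacement as long as memory operations are not reordered around it.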
Phoebe Wang 2aa732a918 [X86][MS] Fix the wrong alignment of vector variable arguments on Win32
D108887 fixed alignment mismatch by changing the caller's alignment in
ABI. However, we found some cases that still assume the alignment is
vector size. This patch fixes them to avoid the runtime crash.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D114536
2022-02-13 10:23:18 +08:00
Simon Pilgrim 9c55b0e121 [X86] LowerFunnelShift - enable v16i16 support
2022-02-12 17:04:59 +00:00
Simon Pilgrim a4ed0c2f03 [X86] combineAndnp - if an input has a zero (after inversion for Op0) in a vector element, then we don't demand that bit/element in the other input
Similar to what we already perform in combineAnd
2022-02-12 16:49:05 +00:00
Simon Pilgrim 1f43367377 [X86] getTargetVShiftNode - Fix -Wparentheses gcc warning.
2022-02-12 16:37:24 +00:00
Simon Pilgrim 6320c3e77c [X86] combineAndnp - pull out repeated operands. NFC.
2022-02-12 16:35:24 +00:00
Simon Pilgrim dcf465731d [X86] combineAnd - add SimplifyMultipleUseDemandedBits handling to masked vector element analysis
Extend the existing fold to use SimplifyMultipleUseDemandedBits as well as SimplifyDemandedVectorElts/SimplifyDemandedBits when attempting to simplify based off known zero vector elements.
2022-02-12 15:30:53 +00:00
Simon Pilgrim 1e1b60138c [X86] Improve uniform funnelshift/rotation amount handling
To find uniform shift/rotation amounts, we currently use SelectionDAG::getSplatValue which creates a node that extracts the scalar value from the source vector, this makes it more difficult for later combines to remove the extraction and stay on the SIMD unit, and can be a problem when the scalar type is illegal (i.e. i64 vs v2i64 on 32-bit targets).

This patch begins to use SelectionDAG::getSplatSourceVector (which SelectionDAG::getSplatValue uses internally) and adds a new variant of getTargetVShiftNode that takes the source vector and the splat index, and adjusts the vector in place to create the zero-extended value suitable for the SSE PSLL/PSRL/PSRA uniform instructions.

I'm still addressing a number of regressions when used for normal vector shifts, so I've just handled the funnelshift/rotation lowering for this first patch. I can then focus on the yak shaving (SimplifyDemandedBits/Elts in particular) necessary to always use SelectionDAG::getSplatSourceVector.

Differential Revision: https://reviews.llvm.org/D119090
2022-02-12 14:46:30 +00:00
Simon Pilgrim 37cf7275cd [X86] Enable vector splitting of ISD::AVGCEILU nodes on AVX1 and non-BWI targets
2022-02-12 14:04:55 +00:00
David Green f810b40c3b [X86] Replace X86ISD::AVG with generic ISD::AVGCEILU
Pulled out of D106237, this replaces the X86ISD::AVG DAG node with the
generic ISD::AVGCEILU. It doesn't remove the detectAVGPattern method,
but the extra generic ISel matching does alter the existing test.

Differential Revision: https://reviews.llvm.org/D119073
2022-02-11 18:57:18 +00:00
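For reference, the operation that `ISD::AVGCEILU` names is an unsigned average that rounds up, computed as if in a wider type so that the sum cannot wrap. A scalar sketch for 8-bit lanes (the helper name `avgCeilU8` is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Rounding-up unsigned average: (A + B + 1) >> 1, evaluated in a wider type.
inline std::uint8_t avgCeilU8(std::uint8_t A, std::uint8_t B) {
  unsigned Sum = unsigned(A) + unsigned(B) + 1u;
  assert(Sum <= 511u && "sum of two 8-bit values plus one fits in 9 bits");
  return static_cast<std::uint8_t>(Sum >> 1);
}
```

For example, `avgCeilU8(1, 2)` gives 2 and `avgCeilU8(255, 255)` gives 255 without overflow.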
Simon Pilgrim 20af71f8ec [X86] combineVSelectToBLENDV - handle vselect(vXi1,A,B) -> blendv(sext(vXi1),A,B)
For pre-AVX512 targets, attempt to sign-extend a vXi1 condition mask to pass to a X86ISD::BLENDV node

Fixes Issue #53760
2022-02-11 18:38:17 +00:00
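A scalar model of `vselect(vXi1,A,B) -> blendv(sext(vXi1),A,B)`, assuming four 32-bit lanes and a blend that tests only the sign bit of each mask lane. `Vec4`, `signExtendMask` and `blendBySignBit` are hypothetical names used for illustration, not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<std::int32_t, 4>;

// sext(vXi1): a true lane becomes all-ones (-1), a false lane becomes 0.
inline Vec4 signExtendMask(const std::array<bool, 4> &Cond) {
  Vec4 Mask{};
  for (std::size_t I = 0; I < Mask.size(); ++I)
    Mask[I] = Cond[I] ? -1 : 0;
  return Mask;
}

// blendv-like select: take A where the mask lane's sign bit is set, else B.
inline Vec4 blendBySignBit(const Vec4 &Mask, const Vec4 &A, const Vec4 &B) {
  Vec4 R{};
  for (std::size_t I = 0; I < R.size(); ++I) {
    assert((Mask[I] == 0 || Mask[I] == -1) && "expects a sign-extended boolean mask");
    R[I] = Mask[I] < 0 ? A[I] : B[I];
  }
  return R;
}
```

Because sign extension copies the boolean into the top bit of every lane, the blend picks the same lanes the original `vselect` would.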
Simon Pilgrim 48e1434a0a [X86] Move combineToExtendBoolVectorInReg before the select combines. NFC.
Avoid the need for a forward declaration.

Cleanup prep for Issue #53760
2022-02-11 16:51:46 +00:00
Simon Pilgrim 827d0c51be [X86] combineToExtendBoolVectorInReg - use explicit arguments. NFC.
Replace the *_EXTEND node with the raw operands, this will make it easier to use combineToExtendBoolVectorInReg for any boolvec extension combine.

Cleanup prep for Issue #53760
2022-02-11 16:40:29 +00:00
Bill Wendling 74aa44a887 [X86] Zero out the 32-bit GPRs explicitly
This should ensure that only the 32-bit xors are emitted, and not the
64-bit xors.

Differential Revision: https://reviews.llvm.org/D119523
2022-02-10 23:09:00 -08:00
Yuanfang Chen f927021410 Reland "[clang-cl] Support the /JMC flag"
This relands commit b380a31de0.

Restrict the tests to Windows only since the flag symbol hash depends on
system-dependent path normalization.
2022-02-10 15:16:17 -08:00
Yuanfang Chen b380a31de0 Revert "[clang-cl] Support the /JMC flag"
This reverts commit bd3a1de683.

Breaks bots:
https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8822587673277278177/overview
2022-02-10 14:17:37 -08:00
Simon Pilgrim 8c82d42e97 [TTI][X86] Pull out repeated getSizeInBits() calls. NFC.
2022-02-10 18:58:32 +00:00
Yuanfang Chen bd3a1de683 [clang-cl] Support the /JMC flag
The introduction and some examples are on this page:
https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/

The `/JMC` flag enables these instrumentations:
- Insert a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`
  at the beginning of every function, immediately after the prologue.
  The argument for `__CheckForDebuggerJustMyCode` is the address of a boolean
  global variable (the global variable is initialized to 1) with the name
  convention `__<hash>_<filename>`. All such global variables are placed in
  the `.msvcjmc` section.
- The `<hash>` part of `__<hash>_<filename>` has a one-to-one mapping
  with a directory path. MSVC uses some unknown hashing function. Here I
  used DJB.
- Add a dummy/empty COMDAT function `__JustMyCode_Default`.
- Add `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link
  option via ".drectve" section. This is to prevent failure in
  case `__CheckForDebuggerJustMyCode` is not provided during linking.

Implementation:
All the instrumentations are implemented in an IR codegen pass. The pass is placed immediately before CodeGenPrepare pass. This is to not interfere with mid-end optimizations and make the instrumentation target-independent (I'm still working on an ELF port in a separate patch).

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D118428
2022-02-10 10:26:30 -08:00
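The DJB hash mentioned above is the classic string hash that starts at 5381 and multiplies by 33 per byte. A standalone sketch follows; `djbHashSketch` is a hypothetical name, and the exact input string and symbol formatting used by the pass are not modelled here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Classic DJB string hash: seed 5381, then H = H * 33 + C for each byte.
inline std::uint32_t djbHashSketch(const std::string &S) {
  std::uint32_t H = 5381;
  for (unsigned char C : S)
    H = (H << 5) + H + C; // (H << 5) + H == H * 33, wrapping modulo 2^32
  return H;
}

// Small self-check: an empty string hashes to the seed.
inline void djbHashSelfCheck() { assert(djbHashSketch("") == 5381u); }
```

The hash is deterministic for a given byte sequence, so the same directory path always maps to the same `<hash>` component, provided the path is normalized the same way.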
Simon Pilgrim e95fc20f04 [X86] getFMA3OpcodeToCommuteOperands - use unreachable to detect fma3 format mismatch
Matches what we do in getThreeSrcCommuteCase.

Fixes static analyzer out of bounds array access warning.
2022-02-10 17:14:39 +00:00
Jeremy Morse 662799c851 [DebugInfo][InstrRef] Avoid duplicate instruction numbers in x86-lea-fixup
This new-ish LEA-fixup code path creates two substitutions for an
instruction number -- this is incorrect because each Value should be
replaced by a single replacement Value. Fix by deleting the duplicate
substitution. Add some test coverage for this path with debug-info
attached.

Differential Revision: https://reviews.llvm.org/D119232
2022-02-10 16:36:50 +00:00
Reid Kleckner f3481f43bb [X86] Only force FP usage in the presence of pushf/popf on Win64
This ensures that the Windows unwinder will work at every instruction
boundary, and allows other targets to read and write flags without
setting up a frame pointer.

Fixes GH-46875

Differential Revision: https://reviews.llvm.org/D119391
2022-02-09 18:23:16 -08:00
Tong Zhang 2fe315162e [X86] TCRETURNmi fix for 32bit platform
This fix is similar to 3cf3ffce240e("Fix the TCRETURNmi64 bug differently.")

After allocating registers for index+base, we will only have one register left.

This bug affects Linux kernel compilation for the x86 target. The error occurs when compiling kmod_si476x_core.

clang complains:
    error: ran out of registers during register allocation

The full command is:

clang -Wp,-MMD,drivers/mfd/.si476x-cmd.o.d  -nostdinc -isystem /opt/toolchain/main/lib/clang/14.0.0/include -I./arch/x86/include -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -Qunused-arguments -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89 -no-integrated-as --prefix=/usr/bin/ -Werror=unknown-warning-option -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m32 -msoft-float -mregparm=3 -freg-struct-return -fno-pic -mstack-alignment=4 -march=atom -mtune=atom -mtune=generic -Wa,-mtune=generic32 -ffreestanding -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-address-of-packed-member -O2 -Wframe-larger-than=1024 -fno-stack-protector -Wno-format-invalid-specifier -Wno-gnu -mno-global-merge -Wno-unused-but-set-variable -Wno-unused-const-variable -fomit-frame-pointer -ftrivial-auto-var-init=pattern -fno-stack-clash-protection -falign-functions=32 -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-array-bounds -fno-strict-overflow -fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-pointer-to-enum-cast -Wno-tautological-constant-out-of-range-compare     -DKBUILD_MODFILE='"drivers/mfd/si476x-core"' -DKBUILD_BASENAME='"si476x_cmd"' -DKBUILD_MODNAME='"si476x_core"' -D__KBUILD_MODNAME=kmod_si476x_core -c -o drivers/mfd/si476x-cmd.o drivers/mfd/si476x-cmd.c

-------------

LLVM cannot compile the following code for the 32-bit x86 target. The tail call (TCRETURNmi) uses two registers for index+base, and the function arguments need more than one register, which is impossible with the registers that remain.
This fix is similar to 3cf3ffce240e("Fix the TCRETURNmi64 bug differently.").
We now only use the tail call when at most one register is used for passing args.

```
struct BIG_PARM {
	int ver;
};

static struct {
	int (*foo)  (struct BIG_PARM* a, void *b);
	int (*bar)  (struct BIG_PARM* a);
	int (*zoo0) (void);
	int (*zoo1) (void);
	int (*zoo2) (void);
	int (*zoo3) (void);
	int (*zoo4) (void);
} vtable[] = {
	[0] = {
		.foo = (int (*)(struct BIG_PARM* a, void *b))0xdeadbeef,
	},
};

int something(struct BIG_PARM *a, void* b) {
	return vtable[a->ver].foo(a,b);
}

```

```
$ clang -std=gnu89 -m32 -mregparm=3 -mtune=generic -fno-strict-overflow -O2 -c t0.c -o t0.c.o
error: ran out of registers during register allocation
1 error generated.
```

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D118312
2022-02-09 20:34:04 +08:00
Tim Northover 8366e182d5 Revert "X86: gate all vmovsh instructions on FP16 support."
This reverts commit 3fc40b6e66.

It was pushed unintentionally.
2022-02-09 12:33:23 +00:00
Tim Northover 3fc40b6e66 X86: gate all vmovsh instructions on FP16 support.
Previously the `let Predicates = ...` line only applied to the rr version, and
so VMOVSH was being emitted whenever HasAVX512 (the default) applied. This is
not right.
2022-02-09 12:29:16 +00:00
serge-sans-paille ef736a1c39 Cleanup LLVMMC headers
There are a few relevant forward declarations in there that may require downstream
users to add explicit includes:

llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer includes llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h

Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after:  1049293745

Which is significant and backs up the change in addition to the usual benefits of
decreasing coupling between headers and compilation units.

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244
2022-02-09 11:09:17 +01:00
Bill Wendling d295a53a92 [X86] Specify Undef for the registers we xor
Fixes expensive check failures from D110869.
2022-02-09 02:06:12 -08:00
Bill Wendling deaf22bc0e [X86] Implement -fzero-call-used-regs option
The "-fzero-call-used-regs" option tells the compiler to zero out
certain registers before the function returns. It's also available as a
function attribute: zero_call_used_regs.

The two upper categories are:

  - "used": Zero out used registers.
  - "all": Zero out all registers, whether used or not.

The individual options are:

  - "skip": Don't zero out any registers. This is the default.
  - "used": Zero out all used registers.
  - "used-arg": Zero out used registers that are used for arguments.
  - "used-gpr": Zero out used registers that are GPRs.
  - "used-gpr-arg": Zero out used GPRs that are used as arguments.
  - "all": Zero out all registers.
  - "all-arg": Zero out all registers used for arguments.
  - "all-gpr": Zero out all GPRs.
  - "all-gpr-arg": Zero out all GPRs used for arguments.

This is used to help mitigate Return-Oriented Programming exploits.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110869
2022-02-08 17:42:54 -08:00
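A hedged usage sketch of the function attribute form. It assumes a compiler that reports support for `zero_call_used_regs` through `__has_attribute` on an x86 target, and expands to nothing elsewhere. The macro `ZERO_USED_GPRS` and the function `checkPin` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Apply the attribute only where the compiler reports support on an x86 target.
#if defined(__has_attribute) && (defined(__x86_64__) || defined(__i386__))
#if __has_attribute(zero_call_used_regs)
#define ZERO_USED_GPRS __attribute__((zero_call_used_regs("used-gpr")))
#endif
#endif
#ifndef ZERO_USED_GPRS
#define ZERO_USED_GPRS // no-op fallback for compilers/targets without support
#endif

// Hypothetical function touching sensitive data; with the attribute active,
// the GPRs it used are zeroed before it returns.
ZERO_USED_GPRS int checkPin(const char *Entered, const char *Expected) {
  assert(Entered && Expected);
  return std::strcmp(Entered, Expected) == 0;
}
```

The same choice can be applied to a whole translation unit with `-fzero-call-used-regs=used-gpr`. The attribute changes only which registers are cleared on return, not the function's result.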
Craig Topper 56d6ccd4cb [X86] Update register RCL/RCR by 1 and immediate scheduling for Intel CPUs
Most Intel CPU scheduler files lumped the immediate and 1 instructions
together, but uops.info shows they are quite different.

For the most part the by 1 instructions were pretty accurate to the uops.info
data except the latency was 3 instead of 2 as uops.info indicates.

The by immediate instructions need 7 or 8 uops and have higher latency.

It looks like the 8-bit by immediate instructions may need even more
uops, but I just lumped them with the 16/32/64.

Noticed while checking out PR53648. So mostly I cared about the by 1
instructions.

Reviewed By: RKSimon, pengfei

Differential Revision: https://reviews.llvm.org/D119217
2022-02-08 09:20:20 -08:00
Simon Pilgrim 0b00cd19e6 [X86] selectLEAAddr - relax heuristic to only require one operand to be a MathWithFlags op (PR46809)
As suggested by @craig.topper, relaxing LEA matching to only require the ADD to be fed from a single op with EFLAGS helps avoid duplication when the EFLAGS are consumed in a later, dependent instruction.

There was some concern about whether the heuristic is too simple, not taking into account lost loads that can't fold by using a LEA, but some basic tests (included in select-lea.ll) don't suggest that's really a problem.

Differential Revision: https://reviews.llvm.org/D118128
2022-02-08 15:09:22 +00:00
Sanjay Patel a68e098024 [SDAG] move x86 select-with-identity-constant fold behind a target hook; NFC
This is no-functional-change-intended because only the
x86 target enables the TLI hook currently.

We can add fmul/fdiv opcodes to the switch similar to the
proposal D119111, but we don't need to make other changes
like enabling target-specific combines.

We can also add integer opcodes (add, or, shl, etc.) to
the switch because this function is called from all of the
generic binary opcodes.

The goal is to incrementally enable the profitable diffs
from D90113 while avoiding regressions.

Differential Revision: https://reviews.llvm.org/D119150
2022-02-08 09:55:05 -05:00
Simon Pilgrim fd2bb51f1e [ADT] Add APInt/MathExtras isShiftedMask variant returning mask offset/length
In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask.

This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments.

I've updated a number of cases that were either performing separate size/position calculations or had created their own local wrapper versions of these.

Differential Revision: https://reviews.llvm.org/D119019
2022-02-08 12:04:13 +00:00
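A standalone sketch of what such a variant computes for a 64-bit value: whether the value is one contiguous run of set bits and, if so, where the run starts and how long it is. `shiftedMask64` is a hypothetical helper written for illustration; it is not the LLVM implementation.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if Value is a single contiguous run of ones, reporting the
// run's starting bit (MaskIdx) and its length (MaskLen). The outputs are left
// untouched when the function returns false.
inline bool shiftedMask64(std::uint64_t Value, unsigned &MaskIdx, unsigned &MaskLen) {
  if (Value == 0)
    return false;
  // Filling the trailing zeros must leave a low-bit mask of the form 2^n - 1.
  std::uint64_t Filled = (Value - 1) | Value;
  if ((Filled & (Filled + 1)) != 0)
    return false;
  MaskIdx = 0;
  while (((Value >> MaskIdx) & 1) == 0)
    ++MaskIdx;
  MaskLen = 0;
  while (MaskIdx + MaskLen < 64 && ((Value >> (MaskIdx + MaskLen)) & 1) != 0)
    ++MaskLen;
  assert(MaskLen >= 1 && MaskIdx + MaskLen <= 64);
  return true;
}
```

For `0x0FF0` this reports index 4 and length 8, so callers no longer need a separate trailing-zero count and population count after the predicate.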
Sanjay Patel be059a1263 [x86] avoid compile-time warning for parens; NFC
2022-02-07 16:59:50 -05:00
Sanjay Patel 40a50f8701 [x86] avoid false dependency stall on 'sbb' with same source reg
This is effectively inverting the transform added with D116804
because the downside of the false dependency of something like
"sbb %eax, %eax" is much greater than the upside of eliminating
a zeroing instruction on (all?) Intel CPUs.

Differential Revision: https://reviews.llvm.org/D118843
2022-02-07 10:12:12 -05:00
Simon Pilgrim d7be2bff16 [X86] combineShiftRightArithmetic - break if-else chain as they all return (style). NFC.
2022-02-07 09:54:34 +00:00
Kazu Hirata 3a3cb929ab [llvm] Use = default (NFC)
2022-02-06 22:18:35 -08:00
Simon Pilgrim 74b98ab1db [X86] Fold ZERO_EXTEND_VECTOR_INREG(BUILD_VECTOR(X,Y,?,?)) -> BUILD_VECTOR(X,0,Y,0)
Helps avoid some unnecessary shift by splat amount extensions before shuffle combining gets limited by one-use checks
2022-02-06 12:53:11 +00:00
Phoebe Wang 0b7669f333 [X86] Introduce more common modern tunings into `generic`
GCC has updated its generic `-mtune` to haswell. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
Update it to match with GCC.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D118534
2022-02-05 10:31:30 +08:00
Sanjay Patel fff3e1dbaa [x86] enable fast sqrtss/sqrtps tuning for AMD Zen cores
As discussed in D118534, all of the recent AMD CPUs have
relatively fast (<14 cycle latency) "sqrtss" and "sqrtps"
instructions:
https://uops.info/table.html?search=sqrtps&cb_lat=on&cb_tp=on&cb_SNB=on&cb_SKL=on&cb_ZENp=on&cb_ZEN2=on&cb_ZEN3=on&cb_measurements=on&cb_avx=on&cb_sse=on

So we should set this tuning flag to alter codegen of plain
"sqrt(X)" expansion (as opposed to reciprocal-sqrt - there
is other test coverage for that pattern). The expansion is
both slower and less accurate than the hardware instruction.

Differential Revision: https://reviews.llvm.org/D119001
2022-02-04 13:59:20 -05:00
Sanjay Patel 7b03725097 Revert "[x86] try harder to scalarize a vector load with extracted integer op uses"
This reverts commit b4b97ec813.

As discussed in post-commit feedback at:
https://reviews.llvm.org/D118376
...there's a stage 2 failure on a Mac running a clang-refactor tool test.
2022-02-04 07:45:57 -05:00
Simon Pilgrim ea7a3e6a6a [X86] simplifyX86varShift - use KnownBits.getMaxValue().ult() to check for out of bounds shift amounts
This is easier to grok than MaskedValueIsZero for high bits.
2022-02-03 16:02:45 +00:00
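The idea behind the `getMaxValue().ult()` check can be sketched with a simplified known-bits record: the largest value a shift amount can take is the one where every bit not known to be zero is set, and the shift is in bounds if that maximum is below the bit width. `KnownBits32` and `shiftAmountInBounds` are hypothetical stand-ins limited to 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a known-bits record, limited to 32-bit values.
struct KnownBits32 {
  std::uint32_t Zero = 0; // bits known to be 0
  std::uint32_t One = 0;  // bits known to be 1
  // Largest possible value: every bit not known to be zero is assumed set.
  std::uint32_t maxValue() const { return ~Zero; }
};

// The shift amount is provably in bounds if its maximum is below BitWidth.
inline bool shiftAmountInBounds(const KnownBits32 &Amt, unsigned BitWidth) {
  assert((Amt.Zero & Amt.One) == 0 && "a bit cannot be known both 0 and 1");
  return Amt.maxValue() < BitWidth;
}
```

Comparing the maximum directly against the bit width states the intent in one expression, where the mask-based form requires working out which high bits must be zero.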
Fangrui Song de88c1aba2 [asan][X86] Change some std::string variables to StringRef. NFC
2022-02-02 16:34:35 -08:00
Sanjay Patel f523e83b20 [x86] make helper function to create sbb with zero operands; NFC
As noted in D116804, we want to effectively invert that patch
for CPUs (Intel) that don't break the false dependency on
sbb %eax, %eax

So we will likely want to create that here in the
X86DAGToDAGISel::Select() case for X86::SETCC_CARRY.
2022-02-02 16:56:10 -05:00