Commit Graph

1018 Commits

Author SHA1 Message Date
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Craig Topper cc6d302c91 [X86] Fix a bug in TEST with immediate creation
This code tries to form a TEST from CMP+AND with an optional
truncate in between. If we looked through the truncate, we may
have extra bits in the AND mask that shouldn't participate in
the checks. Normally SimplifyDemandedBits takes care of this, but
the AND may have another user. So manually mask out any extra bits.
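
A minimal C++ sketch of the masking step, assuming the AND mask is available as a plain 64-bit integer and the truncate narrows the value to `TruncBits` bits. The helper name `maskForTruncatedTest` is hypothetical; this is not the code in X86ISelDAGToDAG.cpp, only an illustration of why bits above the truncated width have to be dropped from the TEST immediate.

```cpp
#include <cstdint>

// Hypothetical helper: when a compare looks through (trunc (and X, Mask)),
// only the low TruncBits bits of Mask can affect the compared value, so any
// higher bits must be cleared before the mask is reused as a TEST immediate.
constexpr std::uint64_t maskForTruncatedTest(std::uint64_t AndMask,
                                             unsigned TruncBits) {
  // Mask of the low TruncBits bits. The 64-bit case is special-cased because
  // shifting a 64-bit value by 64 is undefined behavior in C++.
  std::uint64_t LowBits = TruncBits >= 64
                              ? ~std::uint64_t(0)
                              : ((std::uint64_t(1) << TruncBits) - 1);
  return AndMask & LowBits;
}
```

For example, an i64 AND with 0x1FF that feeds an i8 truncate can only affect the low 8 bits, so the sketch reduces the mask to 0xFF.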

Fixes PR51175.

Differential Revision: https://reviews.llvm.org/D106634
2021-07-23 09:03:53 -07:00
Craig Topper 0f3bc00a7d [X86] Simplify part of the isel for X86ISD::FCMP/STRICT_FCMP/STRICT_FCMPS.
We don't need to have the compare output a value and then copy it
to FPSW for use by FNSTSW. Instead we can just have the compare
output Glue and glue the FNSTSW to it. InstrEmitter effectively
performed this optimization when emitting the Machine IR. Doing
it directly simplifies the code and reduces the work in
InstrEmitter. There's no change in the machine IR at the end of
isel before and after this change.
2021-06-25 11:39:01 -07:00
Bing1 Yu 56d5c46b49 [X86] Support __tile_stream_loadd intrinsic for new AMX interface
Adding support for the __tile_stream_loadd intrinsic.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D103784
2021-06-11 17:28:43 +08:00
Craig Topper 3c0735c6d8 [X86] Call insertDAGNode on trunc/zext created in tryShiftAmountMod.
This puts the new nodes in the proper place in the topologically
sorted list of nodes.

Fixes PR50431, which was introduced recently in D101944.
2021-05-24 10:23:22 -07:00
Roman Lebedev 5f78ba001c [X86][Codegen] Shift amount mod: sh? i64 x, (32-y) --> sh? i64 x, -(y+32)
I've seen this in RawSpeed's BitPumpMSB*::push() hotpath,
after fixing the buffer abstraction to a saner one,
when looking into a +5% runtime regression.
I was hoping that this would fix it, but it does not look like it does.

This seems to be at least no worse than the original pattern.
But I'm actually mainly interested in the case where we already
compute `(y+32)` (see last test).

https://alive2.llvm.org/ce/z/ZCzJio
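
The rewrite is sound because a 64-bit x86 shift uses only the low 6 bits of its count, and (32-y) and -(y+32) differ by exactly 64, so they agree modulo 64. A small C++ check of that arithmetic, where `effectiveShiftAmount` is a hypothetical helper that models the hardware masking as `& 63`:

```cpp
#include <cstdint>

// Models the count actually used by a 64-bit x86 shift (SHL/SHR/SAR),
// which reads only the low 6 bits of the shift amount.
constexpr std::uint64_t effectiveShiftAmount(std::uint64_t Amt) {
  return Amt & 63;
}

// Checks, for every y in [0, Limit), that (32 - y) and -(y + 32) select the
// same effective shift count. Unsigned arithmetic wraps modulo 2^64, so the
// subtraction and negation are well defined.
constexpr bool shiftAmountsAgree(std::uint64_t Limit) {
  for (std::uint64_t Y = 0; Y < Limit; ++Y) {
    std::uint64_t Original = 32 - Y;        // sh? i64 x, (32-y)
    std::uint64_t Rewritten = 0 - (Y + 32); // sh? i64 x, -(y+32)
    if (effectiveShiftAmount(Original) != effectiveShiftAmount(Rewritten))
      return false;
  }
  return true;
}
```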

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101944
2021-05-11 19:39:41 +03:00
Simon Pilgrim 759b97e55a [X86] Replace repeated isa/cast<ConstantSDNode> calls with a single dyn_cast<>. NFCI.
Noticed while looking at D101944
2021-05-11 14:18:45 +01:00
Harald van Dijk 1b788607f5 [X32][CET] Fix handling of indirect branches
As X32 uses 32-bit pointers without having 32-bit indirect branch
instructions, we need to fix up indirect branches by extending the
branch targets to 64 bits. This was already done for BRIND but not yet
for NT_BRIND. The same logic works for both, so this applies that
existing logic to NT_BRIND as well.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101499
2021-04-29 08:33:22 +01:00
Liu, Chen3 b70e02a7e7 [X86][NFC] Move instruction selection of the x86_tdpb[s,u]d_internal and x86_tilezero_internal to X86InstrAMX.td
Differential Revision: https://reviews.llvm.org/D97997
2021-03-09 21:27:39 +08:00
Simon Pilgrim 87d5b34c24 [X86] X86ISelDAGToDAG.cpp - include cstdint instead of stdint.h NFCI.
Fixes clang-tidy warning
2021-03-05 15:58:20 +00:00
Simon Pilgrim f11f86c114 [X86] X86DAGToDAGISel::Select - merge X86::TEST load bitsize checks. NFCI. 2021-03-05 15:58:20 +00:00
Liu, Chen3 4bc7c8631a [X86] Support amx-bf16 intrinsic.
Adding support for AMX-BF16 intrinsics.
This patch also fixes a bug where AMX-INT8 instructions were selected with the wrong
predicate.

Differential Revision: https://reviews.llvm.org/D97358
2021-02-25 09:06:48 +08:00
Liu, Chen3 f8b9035aae [X86] Support amx-int8 intrinsic.
Adding support for the TDPBSUD/TDPBUSD/TDPBUUD intrinsics.

Differential Revision: https://reviews.llvm.org/D97259
2021-02-23 17:08:05 +08:00
Wang, Pengfei a5d9e0c79b [X86] Fix tile config register spill issue.
This is an optimized approach for D94155.

The previous code modeled the tile config register as a use of every AMX
instruction. This causes a problem when the tile config register is spilled:
across function calls, an ldtilecfg instruction may be inserted at each AMX
instruction that uses the tile config register, which clobbers all tile data
registers.

To fix this issue, we no longer model the tile config register. Instead,
we analyze the AMX instructions between one call and the next, and insert
ldtilecfg after the first call if we find any AMX instructions.
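
A simplified sketch of the call-to-call analysis on a single straight-line block. The `InstKind` encoding and the function `callsNeedingTileConfig` are hypothetical; the actual pass operates on machine instructions across the CFG, and this only shows the rule of one ldtilecfg after a call whose region contains AMX instructions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified instruction kinds in one straight-line block.
enum class InstKind { Call, Amx, Other };

// Returns a bitmask in which bit I is set when an ldtilecfg would be
// inserted right after instruction I: I is a call and at least one AMX
// instruction runs after it and before the next call (or the block end).
template <std::size_t N>
constexpr std::uint64_t callsNeedingTileConfig(const InstKind (&Insts)[N]) {
  static_assert(N <= 64, "bitmask result holds at most 64 instructions");
  std::uint64_t Result = 0;
  bool PendingCall = false;   // a call whose region has no AMX yet
  std::size_t CallIdx = 0;
  for (std::size_t I = 0; I < N; ++I) {
    if (Insts[I] == InstKind::Call) {
      PendingCall = true;     // a new call-to-call region starts here
      CallIdx = I;
    } else if (Insts[I] == InstKind::Amx && PendingCall) {
      Result |= std::uint64_t(1) << CallIdx; // one reload per region
      PendingCall = false;
    }
  }
  return Result;
}
```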

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D95136
2021-01-30 12:53:57 +08:00
Craig Topper 74784a5aa4 [X86] In shrinkAndImmediate, place the new constant into the topological sort.
Revert the change to use APInt::isSignedIntN from
5ff5cf8e05.

It's clear that the games we were playing to avoid the topological
sort aren't working. So just fix it once and for all.

Fixes PR48888.
2021-01-26 13:18:04 -08:00
Luo, Yuanke 64132f541e Revert "[X86][AMX] Fix tile config register spill issue."
This reverts commit 20013d02f3.
2021-01-21 18:11:43 +08:00
Luo, Yuanke 20013d02f3 [X86][AMX] Fix tile config register spill issue.
The previous code modeled the tile config register as a use of every AMX
instruction. This causes a problem when the tile config register is spilled:
across function calls, an ldtilecfg instruction may be inserted at each AMX
instruction that uses the tile config register, which clobbers all tile data
registers.
To fix this issue, we no longer model the tile config register. We
analyze the regmask of each call instruction and insert ldtilecfg if any
tile data register is live across the call. Inserting sttilecfg before
the call is unnecessary, because the tile config doesn't change and we
can just reload it.
Besides, we also need to check tile config register interference. Since we
don't model the config register, we should check interference from the
ldtilecfg to each tile data register def.
             ldtilecfg
             /       \
            BB1      BB2
            /         \
           call       BB3
           /           \
       %1=tileload   %2=tilezero
We can start from each tile def instruction and walk backward to the
ldtilecfg. If there is any call instruction that does not preserve the tile
data registers, we should insert ldtilecfg after that call instruction.
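
A simplified sketch of the backward walk along one path of the diagram (the left branch: ldtilecfg, BB1, call, tileload). The `PathInst` kinds and `findReloadPoint` are hypothetical names; the real patch walks the CFG and inspects call regmasks rather than a precomputed "clobbers tiles" flag.

```cpp
#include <cstddef>

// Hypothetical instruction kinds on one path from an ldtilecfg to a tile def.
enum class PathInst { LdTileCfg, CallClobbersTiles, CallPreservesTiles, Other, TileDef };

// Walk backward from the tile def at DefIdx toward the ldtilecfg. Returns the
// index of the closest call that does not preserve the tile data registers
// (an ldtilecfg would be inserted right after it), or -1 if no such call
// lies between the def and the ldtilecfg.
template <std::size_t N>
constexpr long findReloadPoint(const PathInst (&Path)[N], std::size_t DefIdx) {
  if (DefIdx >= N)
    return -1;                       // invalid def index in this sketch
  for (std::size_t I = DefIdx; I-- > 0;) {
    if (Path[I] == PathInst::LdTileCfg)
      return -1;                     // config still valid at the def
    if (Path[I] == PathInst::CallClobbersTiles)
      return static_cast<long>(I);   // reload needed after this call
  }
  return -1;                         // reached the start of the path
}
```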

Differential Revision: https://reviews.llvm.org/D94155
2021-01-21 16:01:50 +08:00
Luo, Yuanke 08665b1805 Support tilezero intrinsic and c interface for AMX.
Differential Revision: https://reviews.llvm.org/D92837
2020-12-31 13:24:57 +08:00
Luo, Yuanke 981a0bd858 [X86] Add x86_amx type for intel AMX.
The x86_amx type is used for AMX intrinsics. <256 x i32> is bitcast to x86_amx when
it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it
is used by load/store instructions. So AMX intrinsics only operate on the x86_amx type.
This helps separate AMX intrinsics from LLVM IR instructions (+-*/).
Thanks to Craig for the idea. This patch depends on https://reviews.llvm.org/D87981.

Differential Revision: https://reviews.llvm.org/D91927
2020-12-30 13:52:13 +08:00
Harald van Dijk 09d0e7a7c1 [X86] Avoid %fs:(%eax) references in x32 mode
The ABI explains that %fs:(%eax) zero-extends %eax to 64 bits and adds
the TLS base address to it. However, the TLS base address need not be
at the start of the TLS block, so TLS references may use negative offsets.
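
The arithmetic behind the problem, as a C++ sketch: if %eax holds a negative offset, zero-extending it yields a value just below 2^32, so %fs:(%eax) lands roughly 4 GiB above the TLS base instead of just below it. The function names and the example base address used with them are made up for illustration.

```cpp
#include <cstdint>

// Effective address of %fs:(%eax) in x32 mode: %eax is zero-extended to
// 64 bits before being added to the %fs base.
constexpr std::uint64_t fsEaxAddress(std::uint64_t FsBase, std::int32_t Eax) {
  return FsBase + static_cast<std::uint64_t>(static_cast<std::uint32_t>(Eax));
}

// Address the code actually wants: the base plus a signed (sign-extended)
// offset. Unsigned wraparound makes a negative offset subtract from the base.
constexpr std::uint64_t intendedAddress(std::uint64_t FsBase,
                                        std::int32_t Offset) {
  return FsBase + static_cast<std::uint64_t>(static_cast<std::int64_t>(Offset));
}
```

The two agree for non-negative offsets and diverge by 2^32 for negative ones.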

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D93158
2020-12-16 22:39:57 +00:00
Luo, Yuanke e52bc1d2bb [X86] Add chain in ISel for x86_tdpbssd_internal intrinsic. 2020-12-12 21:14:38 +08:00
Luo, Yuanke f80b29878b [X86] AMX programming model.
This patch implements the AMX programming model that was discussed on llvm-dev
(http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html).
Thanks to Hal for the good suggestion on the RA. The fast RA is not in this patch yet.
This patch implements 7 components.

1. The c interface to end user.
2. The AMX intrinsics in LLVM IR.
3. Transform load/store <256 x i32> to AMX intrinsics or split the
   type into two <128 x i32>.
4. The lowering from AMX intrinsics to AMX pseudo instructions.
5. Insert pseudo ldtilecfg and build the def-use between ldtilecfg and AMX
   instructions.
6. The register allocation for tile registers.
7. Morph AMX pseudo instructions into real AMX instructions.

Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0

Differential Revision: https://reviews.llvm.org/D87981
2020-12-10 17:01:54 +08:00
Craig Topper 5ff5cf8e05 [X86] Use APInt::isSignedIntN instead of isIntN for 64-bit ANDs in X86DAGToDAGISel::IsProfitableToFold
Pretty sure we meant to be checking signed 32-bit immediates here
rather than unsigned 32-bit ones. I suspect I messed this up because
in MathExtras.h we have isIntN and isUIntN so isIntN differs in
signedness depending on whether you're using APInt or plain integers.

This fixes a case where we didn't fold a constant created
by shrinkAndImmediate. Since shrinkAndImmediate doesn't topologically
sort constants it creates, we can fail to convert the Constant
to a TargetConstant. This leads to very strange behavior later.
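
A C++ sketch of the two checks on 64-bit AND immediates. `fitsSigned32` and `fitsUnsigned32` are hypothetical stand-ins for APInt::isSignedIntN(32) and APInt::isIntN(32); a 64-bit AND sign-extends its imm32, so only the signed check describes what can actually be encoded.

```cpp
#include <cstdint>

// True if V survives truncation to 32 bits followed by sign extension back
// to 64 bits, i.e. it is encodable as the sign-extended imm32 of a 64-bit AND.
constexpr bool fitsSigned32(std::uint64_t V) {
  return V <= 0x7FFFFFFFULL || V >= 0xFFFFFFFF80000000ULL;
}

// True if V has no set bits above bit 31 (the unsigned 32-bit check).
constexpr bool fitsUnsigned32(std::uint64_t V) { return V <= 0xFFFFFFFFULL; }
```

0x80000000 passes the unsigned check but cannot be encoded, while 0xFFFFFFFF80000000 fails the unsigned check yet is a valid sign-extended imm32.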

Fixes PR48458.
2020-12-09 13:39:07 -08:00
H.J. Lu 18ce612353 Use PC-relative address for x32 TLS address
Since x32 supports PC-relative address, it shouldn't use EBX for TLS
address.  Instead of checking N.getValueType(), we should check
Subtarget->is32Bit().  This fixes PR 22676.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D16474
2020-12-02 22:20:36 +00:00
Craig Topper f385823e04 [X86] Alternate implementation of D88194.
This uses PreprocessISelDAG to replace the constant before
instruction selection instead of matching opcodes after.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D89178
2020-10-27 00:20:03 -07:00
Wei Wang d602e79a81 [X86] Encode global address in small code model
In the small code model, the program and its symbols are linked in the lower 2 GB of
the address space. In that case, try encoding the global address even when the range
is unknown.

Differential Revision: https://reviews.llvm.org/D89341
2020-10-26 23:14:06 -07:00
Craig Topper a7e45ea30d [X86] Add memory operand to AESENC/AESDEC Key Locker instructions.
This removes FIXMEs from selectAddr.
2020-10-03 21:42:16 -07:00
Craig Topper 39fc4a0b0a [X86] Move ENCODEKEY128/256 handling from lowering to selection.
We should avoid emitting MachineSDNodes from lowering.

We can use the implicit def handling in InstrEmitter to avoid
manually copying from each xmm result register. We only need to
manually emit the copies for the implicit uses.
2020-10-03 18:44:53 -07:00
Craig Topper adccc0bfa3 [X86] Add X86ISD opcodes for the Key Locker AESENC*KL and AESDEC*KL instructions
Instead of emitting MachineSDNodes during lowering, emit X86ISD
opcodes. These opcodes will either be selected by tablegen
patterns or custom selection code.

Emitting MachineSDNodes during lowering is uncommon so this makes
things more consistent. It also allows selectAddr to be called to
perform address matching during instruction selection.

I had trouble getting tablegen to accept XMM0-XMM7 as results in
an isel pattern for the WIDE instructions so I had to use custom
instruction selection.
2020-10-03 16:55:19 -07:00
Craig Topper 46673763fe [X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract
Fixes PR47482
2020-09-14 16:59:04 -07:00
Craig Topper da1aaa0b70 Revert "[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract."
I got the bug number wrong.

This reverts commit 3251593890.
2020-09-14 16:58:57 -07:00
Craig Topper 3251593890 [X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract.
Fixes PR47525
2020-09-14 16:28:37 -07:00
Simon Pilgrim 23d9f4b958 [X86] Fix llvm-qualified-auto warning by using auto*. NFC. 2020-09-03 14:21:17 +01:00
Craig Topper ab7151f1cf [X86] Make PreprocessISelDAG create X86ISD::VRNDSCALE nodes with i32 constants instead of i8.
This is the type declared in X86InstrFragmentsSIMD.td. ISel pattern
matching doesn't check so it doesn't matter in practice. Maybe for
SelectionDAG CSE it would matter.
2020-08-17 17:25:51 -07:00
Craig Topper 9201efb3b9 [X86] Custom match X86ISD::VPTERNLOG in X86ISelDAGToDAG in order to reduce isel patterns.
By factoring out the end of tryVPTERNLOG, we can use the same code
to directly match X86ISD::VPTERNLOG. This allows us to remove
around 3-4K worth of X86GenDAGISel.inc.
2020-08-10 23:15:58 -07:00
Wang, Pengfei 9512525947 [X86][FPEnv] Teach X86 mask compare intrinsics to respect strict FP semantics.
When we use mask compare intrinsics under the strict FP option, the masked-off
elements shouldn't raise any exceptions. So we can't replace the
intrinsic with a full compare + "and" operation.
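
A small simulation of why that replacement is invalid under strict FP, assuming a 4-lane ordered, signaling less-than compare that raises an invalid-operation exception on any NaN operand. `CmpResult`, `maskedCmpLT` and `fullCmpThenAnd` are hypothetical, and exceptions are modeled as a flag rather than real FP status bits. Both versions produce the same result mask, but only the full compare evaluates the masked-off lanes.

```cpp
#include <cstddef>
#include <limits>

// Example NaN value for exercising the sketch.
constexpr double kQuietNaN = std::numeric_limits<double>::quiet_NaN();

// Simulated result of a 4-lane compare: result bits plus a flag standing in
// for the invalid-operation exception.
struct CmpResult {
  unsigned Mask;
  bool RaisedInvalid;
};

constexpr bool isNaN(double X) { return X != X; }

// Masked compare: lanes whose bit in Active is clear are never evaluated,
// so they cannot raise an exception and their result bit stays 0.
constexpr CmpResult maskedCmpLT(const double (&A)[4], const double (&B)[4],
                                unsigned Active) {
  CmpResult R{0, false};
  for (std::size_t I = 0; I < 4; ++I) {
    if (!(Active & (1u << I)))
      continue;                // masked-off lane: skipped entirely
    if (isNaN(A[I]) || isNaN(B[I]))
      R.RaisedInvalid = true;  // signaling compare on NaN: invalid, false
    else if (A[I] < B[I])
      R.Mask |= 1u << I;
  }
  return R;
}

// Full compare on every lane, then AND with the mask: same result bits,
// but masked-off lanes are still evaluated and may raise the exception.
constexpr CmpResult fullCmpThenAnd(const double (&A)[4], const double (&B)[4],
                                   unsigned Active) {
  CmpResult R = maskedCmpLT(A, B, 0xFu);
  R.Mask &= Active;
  return R;
}
```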

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D85385
2020-08-11 10:28:41 +08:00
Craig Topper 966a58e329 [X86] Support matching VPTERNLOG when the root node is X86ISD::ANDNP. 2020-08-08 13:11:47 -07:00
Craig Topper 75f134eec1 [X86] Refactor the broadcast and load folding in tryVPTESTM to reduce some code.
Now we try to load and broadcast together for operand 1. Followed
by load and broadcast for operand 1. Previously we tried load
operand 1, load operand 1, broadcast operand 0, broadcast operand 1.

Now we have a single helper that tries load and broadcast for
one operand that we can just call twice.
2020-07-31 23:57:13 -07:00
Craig Topper 1bd7046e4c [X86] Use TargetLowering::getRegClassFor to simplify some code in tryVPTESTM. NFCI 2020-07-31 21:39:10 -07:00
Craig Topper 93c678a79b [X86] Simplify vpternlog immediate selection.
Rather than hardcoding immediate values for 12 different combinations
in a nested pair of switches, we can perform the matched logic
operation on 3 magic constants to calculate the immediate.
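
The magic-constant trick as a C++ sketch: VPTERNLOG's three inputs correspond to the truth-table columns 0xF0, 0xCC and 0xAA, so evaluating the matched logic expression on those constants bitwise produces the 8-bit immediate directly. The names `ternlogImm`, `andThenOr`, `xor3` and `notAAndB` are hypothetical examples.

```cpp
#include <cstdint>

// A three-input bitwise logic function, evaluated on whole words.
using LogicFn = unsigned (*)(unsigned, unsigned, unsigned);

// Apply the logic function to the truth-table columns of the three
// operands; the low 8 bits of the result are the VPTERNLOG immediate.
constexpr std::uint8_t ternlogImm(LogicFn Fn) {
  return static_cast<std::uint8_t>(Fn(0xF0, 0xCC, 0xAA) & 0xFF);
}

// Example logic functions.
constexpr unsigned andThenOr(unsigned A, unsigned B, unsigned C) { return (A & B) | C; }
constexpr unsigned xor3(unsigned A, unsigned B, unsigned C) { return A ^ B ^ C; }
constexpr unsigned notAAndB(unsigned A, unsigned B, unsigned) { return ~A & B; }
```

The `& 0xFF` matters for functions that use NOT, since `~` also sets the bits above the truth table.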

Special thanks to this tweet https://twitter.com/rygorous/status/1187034321992871936
for making me realize I could do this.
2020-07-31 17:16:27 -07:00
Craig Topper c4823b24a4 [X86] Add custom lowering for llvm.roundeven with sse4.1.
We can use the roundss/sd/ps/pd instructions like we do for
ceil/floor/trunc/rint/nearbyint.
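
llvm.roundeven rounds halfway cases to the nearest even integer, which is the IEEE round-to-nearest-even mode. A portable C++ sketch of those semantics, using std::nearbyint under FE_TONEAREST as a stand-in for ROUNDSS/ROUNDSD with a round-to-nearest immediate (an assumption for illustration; this code does not emit those instructions). std::round differs because it rounds halves away from zero.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Round-half-to-even using the current rounding mode, which must be
// round-to-nearest (the default) for this to match llvm.roundeven.
double roundEvenSketch(double X) {
  assert(std::fegetround() == FE_TONEAREST && "expects default rounding mode");
  return std::nearbyint(X);
}
```

With the default rounding mode, roundEvenSketch(2.5) and roundEvenSketch(3.5) give 2.0 and 4.0, while std::round(2.5) gives 3.0.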

Differential Revision: https://reviews.llvm.org/D84592
2020-07-29 10:23:08 -07:00
Craig Topper df12524e6b [X86] Turn X86DAGToDAGISel::tryVPTERNLOG into a fully custom instruction selector that can handle bitcasts between logic ops
Previously we just matched the logic ops and replaced with an
X86ISD::VPTERNLOG node that we would send through the normal
pattern match. But that approach couldn't handle a bitcast
between the logic ops. Extending that approach would require us
to peek through the bitcasts and emit new bitcasts to match
the types. Those new bitcasts would then have to be properly
topologically sorted.

This patch instead switches to directly emitting the
MachineSDNode and skips the normal tablegen pattern matching.
We do have to handle load folding and broadcast load folding
ourselves now, which also means commuting the immediate control.
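
When operands are swapped, for example so that a foldable load ends up in the last source (memory) operand, the VPTERNLOG immediate has to be permuted to keep the same logic function. A C++ sketch of that permutation for swapping the first and third operands, using the 0xF0/0xCC/0xAA convention; `ternlogIndex` and `commuteImmAC` are hypothetical names, and the real selector may commute other operand pairs as well.

```cpp
#include <cstdint>

// Truth-table index of inputs (A, B, C) in the 0xF0/0xCC/0xAA convention:
// A selects bit 2, B bit 1 and C bit 0 of the index.
constexpr unsigned ternlogIndex(unsigned A, unsigned B, unsigned C) {
  return (A << 2) | (B << 1) | C;
}

// Immediate to use after swapping operands A and C: the new truth table
// must satisfy NewImm(a, b, c) == Imm(c, b, a) for every input triple.
constexpr std::uint8_t commuteImmAC(std::uint8_t Imm) {
  unsigned NewImm = 0;
  for (unsigned A = 0; A < 2; ++A)
    for (unsigned B = 0; B < 2; ++B)
      for (unsigned C = 0; C < 2; ++C)
        if (Imm & (1u << ternlogIndex(C, B, A)))
          NewImm |= 1u << ternlogIndex(A, B, C);
  return static_cast<std::uint8_t>(NewImm);
}
```

For example, A & ~C (0x50) becomes C & ~A (0x0A), while a symmetric function such as A ^ B ^ C (0x96) is unchanged.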

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D83630
2020-07-26 12:19:08 -07:00
Eric Christopher e958379581 Fold the opt size check into the assert to silence an unused variable warning. 2020-07-13 16:05:24 -07:00
Hiroshi Yamauchi fb558ccae7 [PGO][PGSO] Add profile guided size optimization to X86ISelDAGToDAG.
Differential Revision: https://reviews.llvm.org/D83331
2020-07-13 10:28:09 -07:00
Xiang1 Zhang 939d8309db [X86-64] Support Intel AMX Intrinsic
INTEL ADVANCED MATRIX EXTENSIONS (AMX).
AMX is a new programming paradigm: it has a set of 2-dimensional registers
(TILES) representing sub-arrays from a larger 2-dimensional memory image,
and instructions that operate on TILES.

These intrinsics take the TMM register number directly as a parameter.

Spec can be found in Chapter 3 here https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D83111
2020-07-07 10:13:40 +08:00
Craig Topper e75f2d5a8c [X86] Add matching support for X86ISD::ANDNP to X86DAGToDAGISel::tryVPTERNLOG. 2020-07-03 17:50:35 -07:00
Craig Topper 52855ed099 [X86] Add back support for matching VPTERNLOG from back to back logic ops.
I think this mostly looks OK. The only weird thing I noticed was that
a couple of rotate vXi8 tests picked up an extra logic op where we have
(and (or (and), (andn)), X). Previously we matched the (or (and), (andn))
to vpternlog, but now we match the (and (or), X) and leave the and/andn
unmatched.
2020-07-02 22:11:52 -07:00
Craig Topper 1df1186ab1 [X86] Use some preprocessor macros to reduce the very similar repeated code in getVPTESTMOpc. NFCI
This function picks the X86 opcode name based on type, masking,
and whether or not a load or broadcast has been folded, using multiple
switch statements. The contents of the switches mostly vary in only
a few characters of the instruction name, so use some macros to
build the instruction names and reduce the repetitiveness.
2020-06-30 14:38:22 -07:00
Guillaume Chatelet a976ea3209 [Alignment][NFC] Migrate PPC, X86 and XCore backends to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82779
2020-06-30 08:08:45 +00:00
Craig Topper d72cb4ce21 Recommit "[X86] Separate imm from relocImm handling."
Fix the copy/paste mistake that caused it to fail previously
2020-06-15 10:59:43 -07:00