Commit Graph

69244 Commits

Author SHA1 Message Date
Philip Reames 79f0413e5e [RISCV] Use branchless form for selects with -1 in either arm
We can lower these as an or with the negative of the condition value. This appears to result in significantly less branch-y code on multiple common idioms (as seen in tests).

Differential Revision: https://reviews.llvm.org/D135316
2022-10-06 15:18:43 -07:00
Philip Reames 027516553d [RISCV] Verify that policy operands only exist on instructions with tied passthru operands
This is a non-trivial property relied upon by D135396. I wrote this to convince myself it was true.

Differential Revision: https://reviews.llvm.org/D135403
2022-10-06 15:18:43 -07:00
Philip Reames 04bb32e58a [DAG] Extract helper for (neg x) [nfc]
This is a frequently reoccurring pattern, let's factor it out.

Differential Revision: https://reviews.llvm.org/D135301
2022-10-06 13:23:52 -07:00
Joao Moreira eac3e5c3fb [X86] Do not emit JCC to __x86_indirect_thunk
Clang may optimize conditional tailcall blocks with the following layout:

cmp <condition>
je  tailcall_target
ret

When retpoline is in place, indirect calls are converted into direct calls to a retpoline thunk. When these indirect calls are tail calls, they may be subject to the above described optimization (there is no indirect JCC, but since now the jump is direct it can be made conditional). The above layout is non-ideal for the Linux kernel scenario because the branches into thunks may be patched back into indirect branches during runtime depending on the underlying CPU features, what would not be feasible if the binary is emitted with the optimized layout above.

Thus, prevent clang from emitting this it if CodeModel is Kernel.

Feature request from the respective kernel mailing list: https://lore.kernel.org/llvm/Yv3uI%2FMoJVctmBCh@worktop.programming.kicks-ass.net/

Reviewed By: nickdesaulniers, pengfei

Differential Revision: https://reviews.llvm.org/D134915
2022-10-06 11:09:24 -07:00
Ivan Tetyushkin 0e6c1576e6 [RISCV] Optimization for using compressed beqz and bnez PR#56391
Optimization for using compressed beqz and bnez

If there is pattern
```
br_cc val1 constval eq/neq place
select_cc val1 constval eq/neq trueval falseval
```
and constval does not fit in compressed imm format(6 bit), but fit in
imm format(12 bit), we can replace by non compress sub and compress
c.beqz/c.bneqz:

```
addi val val -constval
c.beqz val place
```

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132839
2022-10-06 09:33:32 -07:00
Philip Reames d89d45ca9a [RISCV][InsertVSETVLI] Default to MA not MU
This changes the default value used for mask policy from mask undisturbed to mask agnostic. In hardware, there may be a minor preference for ta/ma, but since this is only going to apply to instructions which don't use the mask policy bit, this is functionally mostly a nop. The main value is to make future changes to using MA when legal for masked instructions easier to review by reducing test churn.

The prior code was motivated by a desire to minimize state transitions between masked and unmasked code. This patch achieves the same effect using the demanded field logic (landed in afb45ff), and there are no regressions I spotted in the test diffs. (Given the size, I have only been able to skim.) I do want to call out that regressions are possible here; the demanded analysis only works on a block local scope right now, so e.g. a tight loop mixing masked and unmasked computation might see an extra vsetvli or two.

Differential Revision: https://reviews.llvm.org/D133803
2022-10-06 07:59:39 -07:00
Philip Reames afb45ffce7 [RISCV][InsertVSETVLI] Treat mask policy as undemanded if usesMaskPolicy is false
Differential Revision: https://reviews.llvm.org/D135327
2022-10-06 07:20:16 -07:00
Nikita Popov 3d0b5f019e [AA] Remove unused template argument from AAResultBase (NFC)
After D94363, there is no more need to use CRTP here.
2022-10-06 10:21:17 +02:00
Pierre van Houtryve bb71079e30 [AMDGPU][GISel] Add missing V2S16 BUILD_VECTOR_TRUNC legalization
Previously we would be unable to legalize V2S16 BUILD_VECTOR_TRUNC on GFX8 & below as the custom legalization was missing.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135149
2022-10-06 06:48:53 +00:00
Ilia Diachkov 25ee36c6b1 [SPIRV] read kernel arg attributes from fuction/module metadata
The patch introduces reading the attributes of kernel arguments both from
function-attached and module-level metadata, during kernel arguments lowering.
Two tests are added to show the improvement.

Differential Revision: https://reviews.llvm.org/D135106

Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey.tretyakov@mail.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
2022-10-06 04:43:52 +03:00
Sami Tolvanen 43f4c215a1 [AArch64][KCFI] Define Size for KCFI_CHECK
Specify the correct size for the KCFI_CHECK pseudo
instruction, which is lowered into six 4-byte instructions in
AArch64AsmPrinter::LowerKCFI_CHECK.

Link: https://github.com/ClangBuiltLinux/linux/issues/1730
2022-10-05 22:24:50 +00:00
Quentin Colombet 6e440ee2aa [RISCV][ISel] Finally fix the UBSan error
Forgot another SDValue check and a boolean initialization.
2022-10-05 21:43:09 +00:00
Quentin Colombet 6bbe7d376e [RISCV][ISel] Attempt to fix UBSan error
Explicitly check an SDValue with the invalid SDValue.

UBSan reports:
runtime error: load of value 36, which is not a valid value for type
'bool'

https://lab.llvm.org/buildbot/#/builders/85/builds/11231
2022-10-05 20:59:28 +00:00
Quentin Colombet c5c2de287e [RISCV][ISel] Fold extensions when all the users can consume them
This patch allows the combines that fold extensions in binary operations
to have more than one use.
The approach here is pretty conservative: if all the users of an
extension can fold the extension, then the folding is done, otherwise we
don't fold.
This is the first step towards avoiding the one-use limitation.

As a result, we make a decision to fold/don't fold for a web of
instructions. An instruction is part of the web of instructions as soon
as it consumes an extension that needs to be folded for all its users.

Because of how SDISel works a web of instructions can be visited over
and over. More precisely, if the folding happens, it happens for the
whole web and that's the end of it, but if the folding fails, the whole
web may be revisited when another member of the web is visited.

To avoid a compile time explosion in pathological cases, we bail out
earlier for webs that are bigger than a given threshold (arbitrarily set
at 18 for now.) This size can be changed using
`--riscv-lower-ext-max-web-size=<maxWebSize>`.

At the current time, I didn't see a better scheme for that. Assuming we
want to stick with doing that in SDISel.

Differential Revision: https://reviews.llvm.org/D133739
2022-10-05 20:49:21 +00:00
David Green 03b145480d [AArch64] Add tablegen patterns for bf16 trn/zip/uzp.
This adds some missing tablegen patterns to handle trn1/trn2/zip1/zip2/uzp1/uzp2,
similar to the Arm handling in 5e1a9d319d, but via tablegen
patterns for the AArch64 backend.
2022-10-05 21:47:36 +01:00
Quentin Colombet 4852f26acd [RISCV][ISel] Refactor the formation of VW operations
This patch centralizes all the combines of add|sub|mul with extended
operands in one "framework".
The rationale for this change is to offer a one-stop-shop for all these
transformations so that, in the future, it is easier to make combine
decisions for a web of instructions (i.e., instructions connected
through s|zext operands).

Technically this patch is not NFC because the new version is more
powerful than the previous version.
In particular, it diverges in two cases:
- VWMULSU can now also be produced from `mul(splat, zext)`, whereas
  previously only `mul(sext, splat)` were supported when `splat`s were
  involved. (As demonstrated in rvv/fixed-vectors-vwmulsu.ll)
- VWSUB(U) can now also be produced from `sub(splat, ext)`, whereas
  previously only `sub(ext, splat)` were supported when `splat`s were
  involved. (As demonstrated in rvv/fixed-vectors-vwsub.ll)

If we wanted, we could block these transformations to make this
patch really NFC. For instance, we could do something similar to
`AllowSplatInVW_W`, which prevents the combines to form vw(add|sub)(u)_w
when the RHS is a splat.

Regarding the "framework" itself, the bulk of the patch is some
boilderplate code that abstracts away the actual extensions that are
present in the DAG. This allows us to handle `vwadd_w(ext a, b)` as if
it was a regular `add(ext a, ext b)`. Since the node `ext b` doesn't
actually exist in the DAG, we have a bunch of methods (all in the
NodeExtensionHelper class) that fake all that for us.

The other half of the change is around `CombineToTry` and
`CombineResult`. These helper structures respectively:
- Represent the kind of combines that can be applied to a node, and
- Store what needs to happen to do that combine.

This can be viewed as a two step approach:
- First, check if a pattern applies, and
- Second apply it.

The checks and the materialization of the combines are decoupled so that
in the future we can perform several checks and do all the related
applies in one go.

Differential Revision: https://reviews.llvm.org/D134703
2022-10-05 17:43:48 +00:00
Joe Nash 203d0b0ee1 [AMDGPU] Fix V_CMP_CLASS_F16_t16_e64 src1 type.
For V_CMP_CLASS_F16_t16_e64 and V_CMPX_CLASS_F16_t16_e64,
https://reviews.llvm.org/D133723 changed the value type of src1 from i32 to i16.
These src1 operands are 16 bits, therefore need to be encoded as true16
operands. So the _e32 type was correctly set to VGPR_32_Lo128.
In _e64 form the operand class went from
VSrc_b32 to VSrc_b16. For some reason, we cannot encode inline literals for
VSrc_b16, see 5f5f566b26. In this phase of
the true16 implementation, VSrc_b16 and VSrc_b32 are still similar,
except from that quirk of inlines. So set the operand class to regain
that function.

Reviewed By: dp, arsenm

Differential Revision: https://reviews.llvm.org/D134897
2022-10-05 11:15:40 -04:00
Michael Maitland fbf8aa1b49 [RISCV][CodeGen] Add Scheduling for vset{i}vl{i} instruction
Differential Revision: https://reviews.llvm.org/D135188
2022-10-05 07:54:22 -07:00
Juan Manuel MARTINEZ CAAMAÑO fa2b1cb8c9 [NFC][AMDGPULowerKernelAttributes] Factorize repeated code into function
Differential Revision: https://reviews.llvm.org/D135266
2022-10-05 09:26:39 -05:00
Anton Sidorenko 3e97e94237 [NFC][RISCV] Move getSEWLMULRatio function to header
More uses of getSEWLMULRatio will be added in D130895.

Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D135086
2022-10-05 15:10:53 +01:00
Dmitry Preobrazhensky f4b1cfa1cb [AMDGPU][MC][GFX11] Correct e64_dpp variants of v_movreld and v_movrelsd
Differential Revision: https://reviews.llvm.org/D135079
2022-10-05 16:47:18 +03:00
Kerry McLaughlin f7f44f018f [AArch64][SME] Set up a lazy-save/restore around calls.
Setting up a lazy-save mechanism around calls is done during SelectionDAG
because calls to intrinsics may be expanded into an actual function call
(e.g. calls to @llvm.cos()), and maintaining an allowed-list in the SMEABI
pass is not feasible.

The approach for conditionally restoring the lazy-save based on the runtime
value of TPIDR2_EL0 is similar to how we handle conditional smstart/smstop.
We create a pseudo-node which gets expanded into a conditional branch and
expands to a call to __arm_tpidr2_restore(%tpidr2_object_ptr).

The lazy-save buffer and TPIDR2 block are only allocated once at the start
of the function. For each call, the TPIDR2 block is initialised, and at
the end of the call, a pseudo node (RestoreZA) is planted.

Patch by Sander de Smalen.

Differential Revision: https://reviews.llvm.org/D133900
2022-10-05 14:36:53 +01:00
Johannes Doerfert 477e8e10f0 [Attributor] Teach AAPointerInfo to look into aggregates
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
2022-10-05 06:19:47 -07:00
David Sherwood f0f474dfd0 [AArch64][SME] Add codegen pass to handle ZA state in arm_new_za functions.
The new pass implements the following:

* Inserts code at the start of an arm_new_za function to
    commit a lazy-save when the lazy-save mechanism is active.
* Adds a smstart intrinsic at the start of the function.
* Adds a smstop intrinsic at the end of the function.

Patch co-authored by kmclaughlin.

Differential Revision: https://reviews.llvm.org/D133896
2022-10-05 09:43:57 +00:00
David Sherwood 991a36da1b [AArch64][SME] Prevent SVE object address calculations between smstop and call
This patch introduces a new AArch64 ISD node (OBSCURE_COPY) that can
be used when we want to prevent SVE object address calculations
from being rematerialised between a smstop/smstart and a call.
At the moment we use COPY to copy the frame index to a register,
which leads to problems because the "simple register coalescing"
pass understands the COPY instruction and attempts to rematerialise
an address calculation with 'addvl' between an smstop and a call.
When in streaming mode the 'addvl' instruction may have different
behaviour because the streaming SVE vector length is not guaranteed
to equal the normal SVE vector length.

The new ISD opcode OBSCURE_COPY gets lowered to a new pseudo
instruction also called OBSCURE_COPY. This ensures it cannot be
rematerialised and we expand this into a simple move very late in
the machine instruction pipeline.

A new test is added here:

CodeGen/AArch64/sme-streaming-interface.ll

Differential Revision: https://reviews.llvm.org/D134940
2022-10-05 08:11:16 +00:00
Martin Storsjö 2f7fbf8376 [AArch64] Add missing SEH_Nop when aligning the stack
This makes sure that the instructions of the prologue matches the
SEH opcodes.

Also remove a couple redundant cases of setting HasWinCFI; it was
already set unconditionally after the conditional cases.

Differential Revision: https://reviews.llvm.org/D135101
2022-10-05 11:00:36 +03:00
Yeting Kuo 74a130af97 [RISCV] Add isel patterns for vfmacc, vfnmacc, vfmsac and vfnmsac.
The patch selects VSELECT_VL/VP_MERGE_VL that uses VF(N)M(ACC|SAC) as its
true operand and the adden of the true operand as its false operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D135080
2022-10-05 09:56:43 +08:00
Stefan Pintilie 30d639180f [PowerPC] Fix the register allocation hints for ACC registers.
The allocation hints for copies of ACC registers assumed that we would only be
copying between VSRp and UACC registers. In reality it is also possible to copy
between UACC and ACC registers.

This patch adds a new case for the ACC copy to fix that issue.
Note that the test case added with this patch will hit an assert without the
fix.

Reviewed By: lei, amyk

Differential Revision: https://reviews.llvm.org/D134501
2022-10-04 20:30:16 -05:00
Sam Clegg be758cd4a3 [WebAssembly][MC] Fix missing `else` after `return` due to type checker bug
Once we are in the `Unreachable` we want to disable type checking, but
we were unconditionally returning `true` here which means we encountered
and error.  Instead we unconditionally return false to signal no error.

Fixes: https://github.com/llvm/llvm-project/issues/56935

Differential Revision: https://reviews.llvm.org/D135195
2022-10-04 16:43:22 -07:00
Amara Emerson 8055aa8e8a [AArch64][GlobalISel] Make vector G_SEXT_INREG legal and allow combining.
As a result of making these legal, and tweaking the combine to allow vectors,
we generate vector G_SEXT_INREG during legalization.

The reason we want to make these legal in the first place is to allow for
more combine opportunities. Once those have been done, we can just lower them
back to shifts in the post-legalizer lowering.

This needs to be one commit otherwise we start causing tests to fail due to
incomplete support for selection etc.
2022-10-05 00:28:08 +01:00
Craig Topper ece4bb5ab8 [RISCV] Teach SExtWRemoval to recognize sign extended values that come from arguments.
This information is not preserved in MIR today. So this patch adds
information to RISCVMachineFunctionInfo when the vreg is created for
the argument.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D134621
2022-10-04 15:39:10 -07:00
David Green cf43154bc3 [AArch64] Ensure condition (SUBS) has no uses of value in performCONDCombine
performCONDCombine removes and 0xff in patterns of
SUBS (and (add(..), 0xff), C)
under certain complex conditions. It doesn't come up often,
but in the lowering of usub.sat where the SUBS is both used as a
condition and as a value, the And is removed where it would only be
valid for the condition.

Fixes #58109.

Differential Revision: https://reviews.llvm.org/D135043
2022-10-04 21:18:30 +01:00
Eli Friedman 76ccd1db73 [AArch64] Don't form paired loads from epilogue operations on Windows
AArch64LoadStoreOptimizer has a bunch of different guards to avoid
corrupting Windows SEH prologues/epilogues, but apparently we missed the
case of merging two instructions where the first instruction isn't part
of the epilogue, but the second instruction is.

Fixes issue discovered at https://reviews.llvm.org/D130049#3704064

Differential Revision: https://reviews.llvm.org/D134992
2022-10-04 11:41:59 -07:00
Chris Bieneman 618e5006a5 [DirectX] Generate `dx.resources` metadata entry
This code adds initial support for generating the HLSL resources
metadata entries. It has a lot of `FIXMEs` laying around because there
is a lot more work to do here, but this lays a solid groundwork and can
accurately handle some trivial cases.

I've filed a swath of issues covering the deficiencies here and left the
issues in comments so that we can easily follow them.

One big change to make sooner rather than later is to move some of this
code into a new libLLVMFrontendHLSL so that we can share it with the
Clang CodeGen layer.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D134682
2022-10-04 12:36:39 -05:00
Craig Topper a38fb90b19 [RISCV] Refactor and improve eliminateFrameIndex.
There are few changes mixed in here.

-Try to reuse the destination register from ADDI instead of always
creating a virtual register. This way we lean on the register
scavenger in fewer case.
-Explicitly reuse the primary virtual register when possible. There's
still a case where both getVLENFactoredAmount and handling large
fixed offsets can both create a secondary virtual register.
-Combine similar BuildMI calls by manipulating the Register variables.

There are still a couple early outs for ADDI, but overall I tried to
arrange the code into steps.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135009
2022-10-04 09:32:27 -07:00
Craig Topper 2734968e32 [RISCV] Restructure eliminateFrameIndex to share more code. NFC
The old code took two different paths based on whether there is
a scalable offset, but these two paths had some code in common.

The main difference between the two code paths was whether we needed
to create a GPR or not for the ADDI that gets created for RVVSpill.
If we had a scalable offset, the same GPR was used as the destination
for adding the scalable offset and the ADDI. To manage this, we now
cache the scratch register and reuse it if it has already been created.

This is a pre-patch for D135009.

Reviewed By: reames, frasercrmck

Differential Revision: https://reviews.llvm.org/D135092
2022-10-04 09:27:48 -07:00
Pierre van Houtryve 75b292cb14 [AMDGPU][DAG] Fix insert_vector_elt lowering for 8 bit elements
The bitmask used to extract the bits assumed 16 bit elements and wasn't taking the size of the elements into account.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135156
2022-10-04 14:48:15 +00:00
Pierre van Houtryve c93104073c [AMDGPU] Always lower SHUFFLE_VECTOR
Make it illegal, remove InstructionSelector logic for it

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134967
2022-10-04 14:23:17 +00:00
Amara Emerson 75b18ba14d Revert "[AArch64][GlobalISel] Fold away lowered vector sign-extend of vector compares."
This reverts commit dcd02a524b.

We should instead use the generic combine.
2022-10-04 11:03:02 +01:00
Craig Topper 05df15965b [RISCV] Use _TIED form of VFWADD(U)_WV/VFWSUB(U)_WV to avoid early clobber.
One of the sources is the same size as the destination so that source
doesn't have an overlap with the destination register. By using the _TIED
form we avoid an early clobber contraint for that source.

This matches what was already done for instrinsics. ConvertToThreeAddress
will fix it if it can't stay tied.
2022-10-03 21:44:08 -07:00
Craig Topper b41fe90dc3 [RISCV] Correct the setcc in vp.floor/ceil/round/roundeven lowering.
We want to emit a masked setcc that preserves zeros in all of the bits
where the original mask is zero. To do this we need to pass the original
mask as the passthru operand as well. Otherwise, we'll use the mask agnostic
policy and replace the zeros with 1s on some CPUs.

Differential Revision: https://reviews.llvm.org/D135122
2022-10-03 20:58:05 -07:00
Nemanja Ivanovic 4ea121c904 [PowerPC] Fix a number of inefficiencies and issues with atomic code gen
There are a few issues with the code we generate for atomic operations and the way we generate it:

- Hard coded CR0 for compares
- Order of operands for compares not conducive to
  emitting compare-immediate or for CSE of compares
- Missing MachineMemOperand for st[bhwd]cx intrinsics
- Missing intrinsic properties for the same
- Unnecessary blocks with store conditional
  instructions to clear reservation (which ends
  up hindering performance)
- Move from CR instructions just to compare the
  result of a store conditional with zero (even
  though it is a record-form)

This patch aims to resolve all of those issues.

Differential revision: https://reviews.llvm.org/D134783
2022-10-03 19:55:29 -05:00
Jessica Paquette d46751a572 [GlobalISel][AArch64] Lower G_FMAD
Noticed this falling back on CTMark at -Os (bullet).

Seems like we have no 1:1 matching for it, so match SDAG and just lower.

Add testcases for common legal cases as well.

Differential Revision: https://reviews.llvm.org/D135111
2022-10-03 15:15:17 -07:00
Andrew Savonichev d420110a1e [NVPTX] Fix constant expression initializers for global variables
Before this patch the code in printScalarConstant was unable to handle
nested constant expressions like (gep (addrspacecast ptr)) and crashed
with:

LLVM ERROR: Unsupported expression in static initializer:
  addrspacecast ([4 x i8] addrspace(1)* @ga to [4 x i8]*)

We can use lowerConstantForGV instead which is a customized version of
lowerConstant that supports generic() and nested expressions.

Differential Revision: https://reviews.llvm.org/D127878
2022-10-04 00:29:42 +03:00
Philip Reames e884324145 [RISCV] Generalize select (and (x , 0x1) == 0), y, (z ^ y) ) and select (and (x , 0x1) == 0), y, (z | y) ) transforms by removing and-clause
These transforms were recently added (by me) in D134881. Looking at the code again, I realized we don't need the (and x, 0x1) portion of the pattern, we just need to know that the result of that sub-tree is either 0 or 1. Checking for this directly allows us to match slightly more broadly. The test changes are zext i1 arguments, but this could also kick in for e.g. shifts of high bits, or any other source of known bits.

Differential Revision: https://reviews.llvm.org/D135081
2022-10-03 13:57:38 -07:00
Amara Emerson dcd02a524b [AArch64][GlobalISel] Fold away lowered vector sign-extend of vector compares.
This fixes a long standing cause of awful code generation when legalization creates
G_SEXT(G_FCMP(...)), for example due to promoting the condition of a vector G_SELECT.

Since on AArch64 vector compares sign-extend the condition value, there's no need
for this extra G_SEXT. Unfortunately by the time we get to post-legalization these
G_SEXTs have already been lowered into shifts, so this combine is a bit more
involved than I'd ideally like. Oh well.

Differential Revision: https://reviews.llvm.org/D135078
2022-10-03 21:39:53 +01:00
Amara Emerson 07ccf651b9 x[AArch64][GlobalISel] Enable vector support for G_SELECT->G_FMAXIMUM/MINIMUM.
Vector support seems to work immediately, as long as we run the combine before
legalization (so the vector SELECTs don't get lowered) and the legalizer rules
are there to enable generation.

Differential Revision: https://reviews.llvm.org/D135047
2022-10-03 21:39:52 +01:00
jeff f4e6149d82 [AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK)
If we can not prove that f16 operands of a buildvector are canonicalized, then we can not lower into a V_PACK. In this scenario, we would previously lower into some combination of and(sdwa), shr, or. This patch allows for matching into V_PERM instead.

Change-Id: Ifa4a74fdb81ef44f22ba490c7fdf81ec8aebc945
2022-10-03 12:58:29 -07:00
Philip Reames a200b0fc25 [DAG] Introduce getSplat utility for common dispatch pattern [nfc]
We have a very common pattern of dispatching between BUILD_VECTOR and SPLAT_VECTOR creation repeated in many cases in code.  Common the pattern into a utility function.
2022-10-03 12:49:39 -07:00
Craig Topper f3e87a63e5 [RISCV] Add missing VL arguments to the creation of RISCVISD::VMV_V_X_VL nodes.
VMV_V_X_VL nodes should always have a passthru, a splat, and a VL.
We were sometimes missing the VL.

This went unnoticed because these cases were all selected into the
following node to form a .vx or .vi instruction. The ComplexPattern
that does this, doesn't check the VL operand. I've added an assert
to the ComplexPattern to catch if the operand is missing.

@qcolombet spotted some of these in D134703.
2022-10-03 12:21:05 -07:00
Craig Topper 5b06ccb611 Revert "foo"
This reverts commit 2138ef354a.

Forgot to squash
2022-10-03 12:15:41 -07:00
Craig Topper a55cdcae3e Revert "[RISCV] Add missing VL arguments to the creation of RISCVISD::VMV_V_X_VL nodes."
This reverts commit 4c03c9f375.

Forgot to squash
2022-10-03 12:15:28 -07:00
Craig Topper 4c03c9f375 [RISCV] Add missing VL arguments to the creation of RISCVISD::VMV_V_X_VL nodes.
VMV_V_X_VL nodes should always have a passthru, a splat, and a VL.
We were sometimes missing the VL.

This went unnoticed because these cases were all selected into the
following node to form a .vx or .vi instruction. The ComplexPattern
that does this, doesn't check the VL operand. I've added an assert
to the ComplexPattern to catch if the operand is missing.

@qcolombet spotted some of these in D134703.
2022-10-03 12:13:21 -07:00
Craig Topper 2138ef354a foo 2022-10-03 12:13:20 -07:00
Chris Bieneman c0a76c2c71 [DirectX] Add DXIL metadata `dx.shaderModel`
This captures the target shader model and pipeline stage into the DXIL
metadata for consumption by the DirectX runtime.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D134469
2022-10-03 13:00:11 -05:00
Craig Topper 31bca38ad1 [RISCV] Pass the destination register to getVLENFactoredAmount instead of returning it. NFC
This is a refactor for another patch. For now we move the vreg
creation to the caller.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D135008
2022-10-03 10:59:35 -07:00
Zain Jaffal 966411790e
[AArch64] Add support to loop vectorization for non temporal loads
Currently, AArch64 doesn't support vectorization for non temporal loads because `isLegalNTLoad` is not implemented for the target.
This patch applies similar functionality as `D73158` but for non temporal loads

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D131964
2022-10-03 17:06:47 +01:00
Sam Clegg 664a5c6d03 [WebAssembly] Fix return type of __builtin_return_address under wasm64
Differential Revision: https://reviews.llvm.org/D135005
2022-10-03 08:31:52 -07:00
Amara Emerson 3daf7ddaef [GlobalISel] Allow prelegalizer combiners to have access to LegalizerInfo.
Before, the isPreLegalize() query in CombinerHelper only checked for the
presence of a LegalizerInfo object. This is problematic when we want to have
a combine actually check for legality in a pre-legalizer combine pass, since
if we pass a LegalizerInfo object to the constructor it causes the combines to
think that we're running *post* legalizer, which isn't true.

This change fixes it to instead check an explicit bool that passes to signal
whether the pass will be run before or after legalization.

Doing so exposed a bug in the extending loads combine, which tried to check for
legality of candidate extending loads if LegalizerInfo was present. Since we
only ran it pre-legalizer and therefore with a null LegalizerInfo, it never
actually ran. Also fixes the legality checks to keep the tests passing.

Differential Revision: https://reviews.llvm.org/D135044
2022-10-03 07:36:18 +01:00
David Green 5e1a9d319d [ARM] Add lowering for bf16 neon vtrn, vzup and vuzp.
These go via Dag2Dag, which are better based on element sizes not the
exact element types.
2022-10-02 15:34:37 +01:00
David Green f2fde99461 [ARM] More bf16 shuffle handling, including perfect shuffles. 2022-10-02 14:31:51 +01:00
David Green 8193f0d1d2 [ARM] Add tablegen patterns for bf16 vrev 2022-10-02 13:42:14 +01:00
David Green 58369c8631 [ARM] Add tablegen patterns for bf16 vext
This adds missing tablegen patterns for VEXT, identical to the fp16
patterns as they only use baseline Neon operations.
Part of fixing #57770.
2022-10-02 12:45:58 +01:00
Craig Topper 5bbc5eb55f [RISCV] Use _TIED form of VWADD(U)_WX/VWSUB(U)_WX to avoid early clobber.
One of the sources is the same size as the destination so that source
doesn't have an overlap with the destination register. By using the _TIED
form we avoid an early clobber contraint for that source.

This matches what was already done for instrinsics. ConvertToThreeAddress
will fix it if it can't stay tied.
2022-10-01 16:34:39 -07:00
Craig Topper 85db4f10e3 [RISCV] Minor tablegen formatting cleanup. NFC 2022-10-01 15:59:25 -07:00
zhongyunde 4a549be9c3 [AArch64] Lower multiplication by a negative constant to shl+sub+shl
Change the costmodel to lower a = b * C where C = -(2^n - 2^m) to
            lsl     w8, w0, m
            sub     w0, w8, w0, lsl n
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134934
2022-10-01 21:27:42 +08:00
Filipp Zhinkin 945a1468c9 [ARM] Support all versions of AND, ORR, EOR and BIC in optimizeCompareInstr
Combine cmp with zero and all versions of AND, ORR, EOR and BIC instructions into S-suffixed versions.

Related issue: https://github.com/llvm/llvm-project/issues/57122

Reviewed By: efriedma, samtebbs

Differential Revision: https://reviews.llvm.org/D131786
2022-10-01 12:41:37 +03:00
Carl Ritson a35013bec6 [AMDGPU][GFX11] Mitigate VALU mask write hazard
VALU use of an SGPR (pair) as mask followed by SALU write to the
same SGPR can cause incorrect execution of subsequent SALU reads
of the SGPR.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D134151
2022-10-01 16:21:24 +09:00
Craig Topper 9273f860c0 [RISCV] Prevent performCombineVMergeAndVOps from creating cycles in the DAG.
If True has a Chain result, the other operands of the vmerge may
depend on it through that Chain. We need to ensure it isn't a
predecessor of those operands.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D134980
2022-09-30 20:01:45 -07:00
Craig Topper de0de294eb [RISCV] Update cost of vector roundeven to match round which uses the same sequence but a different FRM value.
Reviewed By: reames, eopXD

Differential Revision: https://reviews.llvm.org/D134978
2022-09-30 20:01:35 -07:00
Yeting Kuo cefb7aab61 [VP][RISCV] Add vp.copysign and RISC-V support.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134935
2022-10-01 10:19:10 +08:00
Matthias Braun 189900eb14 X86: Stop assigning register costs for longer encodings.
This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`.
This was previously done because instruction encoding require a REX
prefix when using them resulting in longer instruction encodings. I
found that this regresses the quality of the register allocation as the
costs impose an ordering on eviction candidates. I also feel that there
is a bit of an impedance mismatch as the actual costs occure when
encoding instructions using those registers, but the order of VReg
assignments is not primarily ordered by number of Defs+Uses.

I did extensive measurements with the llvm-test-suite wiht SPEC2006 +
SPEC2017 included, internal services showed similar patterns. Generally
there are a log of improvements but also a lot of regression. But on
average the allocation quality seems to improve at a small code size
regression.

Results for measuring static and dynamic instruction counts:

Dynamic Counts (scaled by execution frequency) / Optimization Remarks:
    Spills+FoldedSpills   -5.6%
    Reloads+FoldedReloads -4.2%
    Copies                -0.1%

Static / LLVM Statistics:
    regalloc.NumSpills    mean -1.6%, geomean -2.8%
    regalloc.NumReloads   mean -1.7%, geomean -3.1%
    size..text            mean +0.4%, geomean +0.4%

Static / LLVM Statistics:
    mean -2.2%, geomean -3.1%) regalloc.NumSpills
    mean -2.6%, geomean -3.9%) regalloc.NumReloads
    mean +0.6%, geomean +0.6%) size..text

Static / LLVM Statistics:
    regalloc.NumSpills   mean -3.0%
    regalloc.NumReloads  mean -3.3%
    size..text           mean +0.3%, geomean +0.3%

Differential Revision: https://reviews.llvm.org/D133902
2022-09-30 16:01:33 -07:00
Peter Collingbourne 0caa9d4b1e AArch64: Don't use RETA[AB] when ShadowCallStack is enabled.
When returning from a function with both SCS and PAC-RET enabled, we need to
authenticate the return address from the stack and then load from the SCS,
but this was happening in the reverse order when RETA[AB] were being used.
Fix it by disabling the use of RETA[AB] when SCS is enabled.

Fixes pr58072.

Differential Revision: https://reviews.llvm.org/D134931
2022-09-30 12:33:23 -07:00
Xiang Li a80a888de5 [DirectX backend] Support global ctor for DXILBitcodeWriter.
1. Save typed pointer type for GlobalVariable/Function instead of the ObjectType.
   This will allow use GlobalVariable/Function as value.
2. Save target type for global ctors for Constant.
3. In DXILBitcodeWriter::getTypeID, check PointerMap first for Constant case.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D133283
2022-09-30 11:27:23 -07:00
Florian Hahn fe49ba84d3
[AArch64] Reflow comment in AArch64IselLowering.cpp (NFC). 2022-09-30 17:17:04 +01:00
Zain Jaffal fca8730793
[AArch64] Refactor opcode selection for LowerMUL (NFC)
Move the logic for selecting `NewOpc` out of `LowerMUL`

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D134875
2022-09-30 16:48:02 +01:00
Philip Reames 1e3c179519 [RISCV] Address post commit review comments from D134881 2022-09-30 08:31:40 -07:00
Saleem Abdulrasool 519a73111b RISCV: adjust relocation emission
Simplify and make the pair-wise relocation more precise.  If either of
the symbol references are textual, the relocation must be delayed.  If
the difference is across sections, delay it as well which partially
matches the behaviour of gas.  We unfortunately do not handle the case
where the difference references a symbol that is not yet defined.  In
such a case, we simply fail to resolve the difference, which should
hopefully not be too onerous (particularly since no other target
supports cross-section references and it is not clear if this was
intentional on the part of RISCV).

Differential Revision: https://reviews.llvm.org/D132262
Reviewed By: @MaskRay
2022-09-30 15:28:48 +00:00
Philip Reames 2b5960028e [RISCV] Branchless lowering for select (and (x , 0x1) == 0), y, (z ^ y) ) and select (and (x , 0x1) == 0), y, (z | y) )
This code is directly ported from the X86 backend which applies the same rewrite (along with several others). Planning on looking more closely at the other branchless variants from x86 to see if any are worth porting in future changes.

Motivation here is the coremark crc8 routine from https://github.com/eembc/coremark/blob/main/core_util.c#L165. This patch significantly reduces the number of unpredictable branches in the workload.

Differential Revision: https://reviews.llvm.org/D134881
2022-09-30 08:24:32 -07:00
Ray Wang 4c786c9747 [RISCV] Remove some unused var decl. NFC
Differential Revision: https://reviews.llvm.org/D134707
2022-09-30 08:08:15 -07:00
Pierre van Houtryve d8258508d4 [AMDGPU][GISel] Update `isCanonicalized`
Recognize more opcodes in the function.
Fixes some regressions introduced in D134857 for fdiv.f16 too.

Depends on D134857

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D134862
2022-09-30 14:13:35 +00:00
Pierre van Houtryve 9a67a6b72a [AMDGPU][GISel] Legalize V2S16 G_BUILD_VECTOR
Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal.
Also removes RegBankInfo's scalarization of small BUILD_VECTORs,
replacing it with InstructionSelector logic instead.

This allows for V2S16 BUILD_VECTOR instructions to survive
all the way to ISel so we can select FMA/MAD_MIX instructions
in D134354.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134433
2022-09-30 14:04:53 +00:00
Ivan Kosarev a964099ce5 [AMDGPU][SetWavePriority] Fix dealing with MBBInfo records.
Happened earlier than I anticipated. :)

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134726
2022-09-30 14:27:50 +01:00
Zain Jaffal 661403b85c
[AArch64] Add support for 128-bit non temporal loads.
Adding to the work done in `D131773` here we add support to 128-bit loads.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D132559
2022-09-30 11:04:04 +01:00
gonglingqin 853a1b7236 [LoongArch] Clean up redundant code introduced by conflict resolution. NFC 2022-09-30 16:45:21 +08:00
Yeting Kuo 1cc02b05b7 [SelectionDAG] Add helper function to check whether a SDValue is neutral element. NFC.
Using this helper makes work about neutral elements more easier. Although I only
find one case now, I think it will have more chance to be used since so many
combine works are related to neutral elements.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133866
2022-09-30 11:29:11 +08:00
Amara Emerson 7653586d88 [AArch64][GlobalISel] Implement another combine for shufflevector->AArch64 G_EXT.
This is a port of an existing optimization in AArch64 ISelLowering, handling
a case when the same input vector can be used for both ext inputs.

Differential Revision: https://reviews.llvm.org/D134891
2022-09-29 22:53:24 +01:00
Philip Reames 900364fccf [RISCV] Minor code motion in InsertVSETVLI [nfc] 2022-09-29 14:01:57 -07:00
Bjorn Pettersson 0513b0305a [X86] Avoid miscompile in combineOr (X86ISelLowering.cpp)
In combineOr (X86ISelLowering.cpp) there is a DAG combine that rewrite
a "(0 - SetCC) | C" pattern into something simpler given that a LEA
can be used. Another requirement is that C has some specific value,
for example 1 or 7. When checking those requirements the code used a
32-bit unsigned variable to store the value of C. So for a 64-bit OR
this could miscompile in case any of the 32 most significant bits in
C were non zero.

This patch adds fixes the bug by using a large enough type for the
C value.

The faulty code seem to have been introduced by commit 9bceb8981d
(D131358).

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134892
2022-09-29 21:24:31 +02:00
zhongyunde 4d15e7b21b [AArch64] Lower multiplication by a constant (NFC)
Refactor according https://reviews.llvm.org/D134706#inline-1298952
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134848
2022-09-30 01:37:28 +08:00
zhongyunde 62a51c357c [AArch64] Lower multiplication by a constant int to shl+sub+shl
Decompose the const 14 can be separated from D132322
Change the costmodel to lower a = b * C where C = 2^n - 2^m to
        lsl     w8, w0, n
        sub     w0, w8, w0, lsl m
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134706
2022-09-30 01:31:06 +08:00
Chris Bieneman 5d4dd53570 Revert "[DirectX backend] Support global ctor for DXILBitcodeWriter."
This reverts commit 26129766df.

The reverted commit broke in-tree unit tests for the DirectX backend.
2022-09-29 11:58:27 -05:00
Dmitry Preobrazhensky 485c539391 [AMDGPU][MC][GFX11] Disable non-null src0 for s_waitcnt_*cnt
Differential Revision: https://reviews.llvm.org/D134809
2022-09-29 19:56:03 +03:00
David Green 4c4e544cd8 [ARM] Add an option for disabling omitting DLS.
Useful for testing, this option disables when `DLS lr, lr` gets removed.
2022-09-29 17:42:45 +01:00
Philip Reames 02bfe2de7c [RISCV] Adjust vector immediate store materialization cost
This change updates the costs to make constant pool loads match their actual cost, and adds the broadcast special case to avoid too many regressions. We really need more information about the constants being rematerialized, but this is an incremental improvement.

Differential Revision: https://reviews.llvm.org/D134746
2022-09-29 07:37:13 -07:00
eopXD 02a982829c [RISCV] Add lowering for llvm.roundeven
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134785
2022-09-29 06:08:14 -07:00
Fangrui Song 04a65d62a0 Revert D134638 "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC"
This reverts commit b7baddc755.

Broke CodeGen/X86/callbr-asm-kill.mir
We shall pay attention when adding new constraints.
2022-09-29 00:54:56 -07:00
Weining Lu b7baddc755 [Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC
k: A memory operand whose address is formed by a base register and
(optionally scaled) index register.

m: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as st.w and ld.w.

ZB: An address that is held in a general-purpose register. The offset
is zero.

ZC: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as ll.w and sc.w.

Differential Revision: https://reviews.llvm.org/D134638
2022-09-29 15:02:08 +08:00
Abinav Puthan Purayil 3759398b4b [AMDGPU] Report minimum scratch size in code object v5 and later by default
This change sets
-amdgpu-assume-{external-call-stack-size | dynamic-stack-object-size}
options to zero by default for code object v5 and later. The runtime is
expected to adjust the scratch size if the amdhsa_uses_dynamic_stack bit
in the kernel descriptor is set.

Differential Revision: https://reviews.llvm.org/D128346
2022-09-29 09:52:45 +05:30
gonglingqin dc3c5a78f2 [LoongArch] Add fp_to_sint support for soft floating point
Differential Revision: https://reviews.llvm.org/D134692
2022-09-29 10:25:35 +08:00
WANG Xuerui 3155f6c508 [LoongArch] Expand llvm.stacksave and llvm.stackrestore
As in commit bfb00d4c1c ("[RISCV] Allow lowering of dynamic_stackalloc, stacksave, stackrestore").

Differential Revision: https://reviews.llvm.org/D134435
2022-09-29 09:07:44 +08:00
wanglei 036b170c24 [LoongArch] Produce a R_LARCH_32_PCREL relocation
LoongArchELFObjectWriter::getRelocType check IsPCRel for FK_Data_4
(which we produce a R_LARCH_32_PCREL relocation for if IsPCRel).

R_LARCH_32_PCREL is required for FDE relocation.

Differential Revision: https://reviews.llvm.org/D134715
2022-09-29 09:04:44 +08:00
wanglei 7b1bdfbeb0 [LoongArch] Override TargetSubtargetInfo::getSelectionDAGInfo
The target selection DAG lowering information is needed for
SelectionDAGBuilder to lower a call like memcmp into an optimized
form.

Differential Revision: https://reviews.llvm.org/D134712
2022-09-29 08:46:53 +08:00
Jessica Paquette 95dabac7a5 [AArch64][GlobalISel] Make G_PTRTOINT only legal for s64 + p0
A few issues:

  1. There was no legalizer test for G_PTRTOINT
  2. Same clamping issue as in many other opcodes
  3. AArch64 pointers can only be 64b, so in reality we always have to trunc or
     extend with any size other than p0 anyway.

This seems to actually produce more correct selection for narrow types as well.

Differential Revision: https://reviews.llvm.org/D107588
2022-09-28 16:20:24 -07:00
Jessica Paquette a7aaafde2e [AArch64][GlobalISel] Implement custom legalization for s32/s64 G_FCOPYSIGN
This is intended to be equivalent to the s32 + s64 cases in
AArch64TargetLowering::LowerFCOPYSIGN.

Widen everything and then use G_BIT + a mask to handle the actual copysign
operation. Then, narrow back down to s32/s64.

I wasn't sure about what the best/most canonical INSERT_SUBREG-selectable
pattern is. I chose G_INSERT_VECTOR_ELT + an undef vector because it produces
reasonably okay codegen. (It doesn't produce INSERT_SUBREG right now though.)
If there's a better way to do this then I'm happy to change it.

We also have a couple codegen deficiencies with how we emit vector constants
right now. (We need a GISel equivalent to the tryAdvSIMDModImm64 stuff)

Differential Revision: https://reviews.llvm.org/D108725
2022-09-28 16:03:22 -07:00
Florian Mayer 0401dc2913 [MTE] [HWASan] unify isInterestingAlloca
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D134779
2022-09-28 15:52:34 -07:00
Jessica Paquette 4957ee6529 [AArch64][GlobalISel] Add a target-specific G_BIT opcode.
This is necessary for custom-legalizing G_FCOPYSIGN.

This is equivalent to the BIT instruction (bitwise insert if true).

Add selection testcases for imported patterns.

Differential Revision: https://reviews.llvm.org/D108714
2022-09-28 15:48:35 -07:00
Xiang Li 26129766df [DirectX backend] Support global ctor for DXILBitcodeWriter.
1. Save typed pointer type for GlobalVariable/Function instead of the ObjectType.
   This will allow use GlobalVariable/Function as value.
2. Save target type for global ctors for Constant.
3. In DXILBitcodeWriter::getTypeID, check PointerMap first for Constant case.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D133283
2022-09-28 13:23:56 -07:00
Stanislav Mekhanoshin 5a3fe9a039 [AMDGPU] Move SIModeRegisterDefaults to SI MFI
It does not belong to a general AMDGPU MFI.

Differential Revision: https://reviews.llvm.org/D134666
2022-09-28 13:13:40 -07:00
Nico Weber 90f7f24b20 try to fix build yet more after 16544cbe64 2022-09-28 15:40:52 -04:00
Baptiste b556726ccc [AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary
One of the conditions to flush the vmcnt counter in loop preheaders is: The loop
contains a use of a vgpr that is defined out of the loop. The code currently
checks if a waitcnt is needed by looking at the score of that vgpr in the score
brackets. This is not enough and may cause the generation of an unnecessary
vmcnt flush. This patch fixes that case.

Differential Revision: https://reviews.llvm.org/D130313
2022-09-28 13:05:50 -04:00
Jon Chesterfield 35f2584ef9 [amdgpu] Error, instead of miscompile, anonymous kernels using lds
The association between kernel and struct is done by symbol name.
This doesn't work robustly for anonymous kernels as shown by the modified
test case.

An alternative association between function and struct can be constructed
if necessary, probably though metadata, but on the basis that we currently
miscompile anonymous kernels and that they are difficult to construct from
application code and difficult to call from the runtime, this patch makes
it a fatal error for now.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134741
2022-09-28 16:30:04 +01:00
Matt Arsenault 7a84624079 AMDGPU: Make various vector undefs legal
Surprisingly these were getting legalized to something
zero initialized.

This fixes an infinite loop when combining some vector types.
Also fixes zero initializing some undef values.

SimplifyDemandedVectorElts / SimplifyDemandedBits are not checking
for the legality of the output undefs they are replacing unused
operations with. This resulted in turning vectors into undefs
that were later re-legalized back into zero vectors.
2022-09-28 10:48:52 -04:00
Matt Devereau 0a4771a7e8 [AArch64][SVE] Expand gather index to 32 bits instead of 64 bits
For gathers which load in 8 and 16 bit data then use that data
as an index, the index can be extended to 32 bits instead of
64 bits

Differential Revision: https://reviews.llvm.org/D130692
2022-09-28 14:42:12 +00:00
Florian Hahn eba84971ae
Revert "[AARCH64][CostModel] Modified the cost of mask vector load/store"
This reverts commit 1c62af3e23.

The commit causes the test below to fail. Revert for now to get the bots
back to green.

Failing test:
lvm/test/Transforms/LoopVectorize/AArch64/masked-op-cost.ll
2022-09-28 15:35:13 +01:00
Florian Hahn 2d3c260362
[AArch64] break non-temporal loads over 256 into 256-loads and a smaller load
Currently over 256 non-temporal loads are broken inefficently. For example, `v17i32` gets broken into 2 128-bit loads. It is better if we can use
256-bit loads instead.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133421
2022-09-28 15:20:26 +01:00
Jon Chesterfield 80ba432821 [amdgpu][nfc] Allocate kernel-specific LDS struct deterministically
A kernel may have an associated struct for laying out LDS variables.
This patch puts that instance, if present, at a deterministic address by
allocating it at the same time as the module scope instance.

This is relatively likely to be where the instance was allocated anyway (~NFC)
but will allow later patches to calculate where a given field can be found,
which means a function which is only reachable from a single kernel will be
able to access a LDS variable with zero overhead. That will be particularly
helpful for applications that instantiate a function template containing LDS
variables once per kernel.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127052
2022-09-28 14:55:16 +01:00
Archibald Elliott ff4027d152 [ARM] Support fp16/bf16 using t constraint
fp16 and bf16 values can be used in GCC's inline assembly using the "t"
constraint, which means "VFP floating-point registers s0-s31" - fp16 and
bf16 values are stored in S registers too.

This change ensures that LLVM is compatible with GCC for programs that
use fp16 and the 't' constraint.

Fixes #57753

Differential Revision: https://reviews.llvm.org/D134553
2022-09-28 14:48:21 +01:00
liqinweng 1c62af3e23 [AARCH64][CostModel] Modified the cost of mask vector load/store
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D134413
2022-09-28 19:40:29 +08:00
Carl Ritson 266b5dbc5d [AMDGPU] Add MIMG NSA threshold configuration attribute
Make MIMG NSA minimum addresses threshold an attribute that can
be set on a function or configured via command line.
This enables frontend tuning which allows increased NSA usage
where beneficial.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D134780
2022-09-28 20:03:18 +09:00
Simon Pilgrim 759bedade5 Fix MSVC "not all control paths return a value" warning. NFCI. 2022-09-28 10:56:37 +01:00
wanglei 983a0ae5cf [LoongArch] Specify registers used in DWARF exception handling
Defines LoongArch registers for getExceptionPointerRegister() and
getExceptionSelectorRegister().

Differential Revision: https://reviews.llvm.org/D134709
2022-09-28 17:53:16 +08:00
Cullen Rhodes 3918ef07c4 [AArch64][SVE] Remove redundant ptest after match/nmatch
These instructions are flag setting so the ptest is redundant, the
TableGen class wasn't setting the element size for the predicate causing
the checks in AArch64InstrInfo::optimizePTestInstr to fail.
2022-09-28 08:23:23 +00:00
eopXD 9677d70eb2 [VP][RISCV] Add vp.floor, vp.round, vp.roundeven and their RISC-V support
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134759
2022-09-27 19:45:58 -07:00
gonglingqin 95d2367647 [LoongArch] Expand FSIN/FCOS/FSINCOS/FPOW/FREM
Differential Revision: https://reviews.llvm.org/D134628
2022-09-28 09:42:41 +08:00
Florian Mayer 979db5343f [HWASan] [NFC] use auto* over auto& for pointers 2022-09-27 18:19:25 -07:00
Han-Kuan Chen c595c874cb [RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133688
2022-09-27 17:25:34 -07:00
Philip Reames f6d110e26f [LAA] Make getPtrStride return Option instead of overloading zero as error value [nfc]
This is purely NFC restructure in advance of a change which actually exposes zero strides.  This is mostly because I find this interface confusing each time I look at it.
2022-09-27 15:55:44 -07:00
Quentin Colombet ce35e8b426 [RISCV][ISel] Remove the commutative flag on SUB
I wasn't able to produce a testcase for that because right now VWSUB is
only generated from VWSUB_W and from there to trigger the commutative
bug we would need to grab VWSUB where the splat value is on the LHS,
which is currently not matched.

Differential Revision: https://reviews.llvm.org/D134701
2022-09-27 20:15:01 +00:00
Philip Reames b54c571a01 [RISCV] Extend strided load/store pattern matching to non-loop cases
The motivation here is to enable a change I'm exploring in vectorizer to prefer base + offset_vector addressing for scatter/gather. The form the vectorizer would end up emitting would be a gep whose vector operand is an add of the scalar IV (splated) and the index vector. This change makes sure we can recognize that pattern as well as the current code structure. As a side effect, it might improve scatter/gathers from other sources.

Differential Revision: https://reviews.llvm.org/D134755
2022-09-27 12:56:58 -07:00
eopXD 163cb33854 [VP][RISCV] Add vp.ceil and RISC-V support
Previous commit 8b00b24f85 missed to add `int_ceil` anchor for the
llvm.ceil.* section under LangRef.rst

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134586
2022-09-27 12:04:09 -07:00
eopXD 384b8b3da7 Revert "[VP][RISCV] Add vp.ceil and RISC-V support"
This reverts commit 8b00b24f85.
2022-09-27 11:12:57 -07:00
eopXD 8b00b24f85 [VP][RISCV] Add vp.ceil and RISC-V support
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134586
2022-09-27 11:08:27 -07:00
Krzysztof Parzyszek 7da2b91887 [Hexagon] Unify getSizeOfs in HexagonVectorCombine, NFC 2022-09-27 10:51:52 -07:00
Krzysztof Parzyszek 9c9e877b7e [Hexagon] Move function to a different class, NFC
"Sector" is a concept from AlignVectors, so the check for it
should be there.
2022-09-27 10:32:52 -07:00
Stefan Gränitz ed8409dfa0 [ObjC][ARC] Fix target register for call expanded from CALL_RVMARKER on Windows
Fix regression https://github.com/llvm/llvm-project/issues/56952 for Clang CodeGen on Windows. In the Windows ABI the instruction sequence that is expanded from CALL_RVMARKER should use RCX as target register and not RDI.

Reviewed By: rnk, fhahn

Differential Revision: https://reviews.llvm.org/D134441
2022-09-27 18:49:40 +02:00
David Green 401481daac [AArch64] Remove incorrect zero element insert-bitcast patterns
These two patterns are not working as intended, as shown in D134022.
They need to insert the value into the new register, not override it.
2022-09-27 17:08:17 +01:00
Philip Reames 77b202f974 [RISCV] Rename getVectorImmCost to getStoreImmCost [nfc]
My original intent had been to reuse this for arithmetic instructions as well, but due to the availability of a immediate splat encoding there, we will need different heuristics.  So specialize the existing code for the store case.
2022-09-27 08:22:13 -07:00
wanglei 823ce6ad18 [LoongArch] Add some comments for expand pseudo-inst pass. NFC
Differential Revision: https://reviews.llvm.org/D134708
2022-09-27 20:26:07 +08:00
Kazushi (Jam) Marukawa de8013201f [VE] Change to expand FPOW
VE doesn't have FPOW instruction, so this patch makes llvm expand it.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134695
2022-09-27 20:03:10 +09:00
WANG Xuerui c2a44b591e [LoongArch] Support lowering frames larger than 2048 bytes
Differential Revision: https://reviews.llvm.org/D134582
2022-09-27 18:58:33 +08:00
David Sherwood fbb119412f [AArch64] Add Neoverse V2 CPU support
Adds support for the Neoverse V2 CPU to the AArch64 backend.

Differential Revision: https://reviews.llvm.org/D134352
2022-09-27 07:56:08 +00:00
Paulo Matos 1bd1a44070 [WebAssembly] Use intrinsics for table.get/set instructions
Initial table.get/set implementation would match and lower combinations
of GEP+load/store to table.get/set instructions. However, this is error
prone due to potential combinations of GEP+load/store we don't implement,
and load/store optimizations. By changing the code to using intrinsics, we
 avoid both issues and simplify the code.

New builtins implemented:
* @llvm.wasm.table.get.externref
* @llvm.wasm.table.get.funcref
* @llvm.wasm.table.set.externref
* @llvm.wasm.table.set.funcref

Reviewed By: asb, tlively

Differential Revision: https://reviews.llvm.org/D134436
2022-09-27 09:16:30 +02:00
Yeting Kuo 04e1301f3d [VP][RISCV] Add vp.maxnum and vp.minnum intrinsics and RISC-V support.
Add vp.maxnum and vp.minnum which are vector predicted intrinsics of llvm.maxnum
and llvm.minnum.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134639
2022-09-27 13:36:45 +08:00
Vitaly Buka 20a80d60a8 Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI"
Break msan bots. Details in D134666.

This reverts commit 0ce96e06ee.
2022-09-26 22:22:09 -07:00
Paul Scoropan ce004fb4f2 [PowerPC] XCOFF exception section support on the direct assembler path
This feature implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (comprised of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review. https://reviews.llvm.org/D133030
needs to merge first for LIT tests

Reviewed By: shchenz, RKSimon

Differential Revision: https://reviews.llvm.org/D132146
2022-09-26 22:24:20 -04:00
Stanislav Mekhanoshin 0ce96e06ee [AMDGPU] Move SIModeRegisterDefaults to SI MFI
It does not belong to a general AMDGPU MFI.

Differential Revision: https://reviews.llvm.org/D134666
2022-09-26 13:20:24 -07:00
Krzysztof Parzyszek dfaf7a2846 [Hexagon] Avoid some unnecessary sign-extend instructions
Simplify (sext_inreg (extractu ...)) -> (extract ...) where appropriate.
2022-09-26 12:30:18 -07:00
Krzysztof Parzyszek d6c0a5be7f [Hexagon] Make sure we can still shift scalar vectors by non-splats 2022-09-26 11:25:06 -07:00
Changpeng Fang dee4bc4a4e AMDGPU: Handle new address pattern in LowerKernelAttributes introduced by opaque pointers
Summary:
  With opaque pointer support, the "ptr" type is introduced and thus BitCast is not necessary in some cases.
This work takes care of this change, and recognizes the new address patterns to do appropriate optimizations.

Reviewers:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D134596
2022-09-26 09:31:52 -07:00
Kazushi (Jam) Marukawa 1cef30b9d3 [VE] Disable automatic maxnum/minnum selection
Disable FMAX/FMIN selection from select_cc in VEInstrInfo.td because of
the lack of NaN consideration.  This patch removes such selection from
VEInstrInfo.td and lets llvm work on it in combineMinNumMaxNum.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134595
2022-09-26 22:04:02 +09:00
Kazushi (Jam) Marukawa 76c76e9ab4 [VE] Support smax/smin
Support smax/smin in VEInstrInfo.td.  Remove obsolete patterns for
smax/smin.  Add regression tests for smax/smin/umax/umin.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134583
2022-09-26 22:02:57 +09:00
Momchil Velikov 6602110152 [ARM] Enable and/cmp0 folding
The `CodeGenPrepare` pass can sink bitwise `and` used by compare to
zero into the basic blocks where the users are. This operation is
guarded by lowering hook, which is disabled for ARM.  In the ARM
architecture versions from v7-M up these two operations can be folded
into `tst rN, #imm` instruction. Sinking of `and` can also enable
the cmov-to-bfi DAG combiner.

This patch fixes some benchmark regressions caused
by https://reviews.llvm.org/D129370 as well scoring slightly better overall.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D134360
2022-09-26 11:31:23 +01:00
David Green bebc96956b [AArch64] Enable FeatureFuseAdrpAdd for all Arm cpus
The commit D120104 enabled FeatureFuseAdrpAdd for -mcpu=generic,
allowing the linker to relax adrp;add pairs where possible. D132075
extended that to neoverse-n1, this patch extends it to all other cortex
and neoverse cpus for the same reasons.

Differential Revision: https://reviews.llvm.org/D134521
2022-09-26 09:55:10 +01:00
gonglingqin a6d699b55d [LoongArch] Add codegen support for strict_fsetccs and any_fsetcc
Differential Revision: https://reviews.llvm.org/D134274
2022-09-26 13:05:36 +08:00
wanglei 75265c7f49 [LoongArch] Lower BlockAddress/JumpTable
This patch uses a unified interface for lower GlobalAddress ConstantPool
BlockAddress and JumpTable.

This patch allows lowering addresses by using PC-relative addressing
for DSO-local symbols, and accessing the address through the global
offset table for DSO-preemptable symbols.

Remove hardcoded `MininumJumpTableEntries` for test lower JumpTable.

Also updated some test cases using ConstantPool, due to the addition of
relocation information.

Differential Revision: https://reviews.llvm.org/D134431
2022-09-26 10:52:54 +08:00
Yeting Kuo 43c5fbdd3a [VP][RISCV] Add vp.sqrt intrinsic and RISC-V support.
The patch modeled vp.fabs patch D132793.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133690
2022-09-26 10:47:40 +08:00
WANG Xuerui ad6fe32032 [LoongArch] Support 'generic' as a valid CPU name
As the LoongArch port is largely modeled after RISCV it has the same
behavior of not accepting `generic` as a CPU name. For better
compatibility with consumers of LLVM (e.g. mesa) follow D121149's suit
and treat `generic` the same as an empty CPU name.

Differential Revision: https://reviews.llvm.org/D134412
2022-09-26 10:20:13 +08:00
Ruiling Song bf25a48985 Add override for runOnFunction()
Fix build-bot failure.
2022-09-26 10:19:35 +08:00
WANG Xuerui d2ac89b64e [LoongArch] Support fastcc and treat it as ccc
As explained in D68559 the `fastcc` calling convention may be requested
under certain conditions, hence the need for supporting it. But unlike
RISCV we actually treat it exactly like ccc, without actually inventing
any performance hack right here. And CSKY does the same thing.

This is going to fix a few more test cases with native LoongArch builds.

Differential Revision: https://reviews.llvm.org/D134443
2022-09-26 10:15:00 +08:00
WANG Xuerui f89f0990db [LoongArch] Support llvm.thread.pointer
For `__builtin_thread_pointer` to work, among other things.

Similar to D76828 for RISCV.

Differential Revision: https://reviews.llvm.org/D134368
2022-09-26 09:56:42 +08:00
Ruiling Song cf14c7caac AMDGPU: Add a pass to rewrite certain undef in PHI
For the pattern of IR (%if terminates with a divergent branch.),
divergence analysis will report %phi as uniform to help optimal code
generation.
```
  %if
  | \
  | %then
  | /
  %endif: %phi = phi [ %uniform, %if ], [ %undef, %then ]
```
In the backend, %phi and %uniform will be assigned a scalar register.
But the %undef from %then will make the scalar register dead in %then.
This will likely cause the register being over-written in %then. To fix
the issue, we will rewrite %undef as %uniform. For details, please refer
the comment in AMDGPURewriteUndefForPHI.cpp. Currently there is no test
changes shown, but this is mandatory for later changes.

Reviewed by: sameerds

Differential Revision: https://reviews.llvm.org/D133840
2022-09-26 09:54:47 +08:00
Weining Lu 394f30919a [Clang][LoongArch] Add inline asm support for constraints f/l/I/K
This patch adds support for constraints `f`, `l`, `I`, `K` according
to [1]. The remain constraints (`k`, `m`, `ZB`, `ZC`) will be added
later as they are a little more complex than the others.
f: A floating-point register (if available).
l: A signed 16-bit constant.
I: A signed 12-bit constant (for arithmetic instructions).
K: An unsigned 12-bit constant (for logic instructions).

For now, no need to support register alias (e.g. `$a0`) in llvm as
clang will correctly decode the usage of register name aliases into
their official names. And AFAIK, the not yet upstreamed `rustc` for
LoongArch will always use official register names (e.g. `$r4`).

[1] https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html

Differential Revision: https://reviews.llvm.org/D134157
2022-09-26 08:49:58 +08:00
James Y Knight 4f188ef89c [AVR] Fix useDeprecatedPositionallyEncodedOperands errors.
This is a follow-on to https://reviews.llvm.org/D134073.

It renames a few fields to have consistent names, as well as renaming
operands to match the field names.

The encoder behavior is unchanged by this cleanup, but a few
instructions were previously being disassembled incorrectly, and have
been corrected by this change. All of the affected instructions were
missing disassembly tests, which are now added.

Differential Revision: https://reviews.llvm.org/D134185
2022-09-25 17:55:09 -04:00
James Y Knight a8c59bcc01 [AMDGPU] Fix useDeprecatedPositionallyEncodedOperands errors in R600.
This is a follow-on to https://reviews.llvm.org/D134073.

It renames a couple of fields to match their operands, as well as
introducing sub-operand names where required.

This change _only_ fixes the 'R600' half of the target, not the
'AMDGPU' half. Fixing the AMDGPU half will be a significantly more
difficult change (which I've not yet attempted.)

Differential Revision: https://reviews.llvm.org/D134078
2022-09-25 17:55:09 -04:00
James Y Knight 0f99958e79 [Lanai] Fix useDeprecatedPositionallyEncodedOperands errors.
This is a follow-on to https://reviews.llvm.org/D134073.

Lanai was almost clean: the only issue is that 'bit' behaves
differently than 'bits<1>', because only the 'bits' type preserves
unresolved references via 'keepUnsetBits()' in
TableGen/Record.h. Thus, use bits instead.

This issue _would_ have caused invalid instruction emission/decoding,
except that the PQ bits were being overriden after the fact by code in
'adjustPqBits' in MCTargetDesc/LanaiMCCodeEmitter.cpp, and
'PostOperandDecodeAdjust' in Disassembler/LanaiDisassembler.cpp.

Differential Revision: https://reviews.llvm.org/D134075
2022-09-25 17:55:09 -04:00
Simon Pilgrim 196f27bb56 [CostModel][X86] Add missing cost kinds for v2i64 icmp on SLM 2022-09-25 15:12:21 +01:00
Simon Pilgrim faff990e9b [X86] Fix Icelake VPMULLQ zmm pipes and adjust AVX512DQ v8i64 mul costs to match worse case
Icelake PMULLQ throughput regressed cf SkylakeServer as its Pipe0 only

Confirmed with Intel SOM, Agner and instlatx64
2022-09-25 14:18:08 +01:00
Petar Avramovic dcc756d03e [AMDGPU] Pattern for flat atomic fadd f64 intrinsic with local addr
Fix regression from clang opencl test in builtins-fp-atomics-gfx90a.cl
test_flat_add_local_f64 caused by D130579
Revert a3becb333d.

Differential Revision: https://reviews.llvm.org/D134568
2022-09-25 13:25:41 +02:00
Philip Reames 5358968e13 [RISCV] Pattern match scalable strided load/store
Very straight forward extension of the existing pattern matching pass to handle scalable types as well as fixed length types. The only extra bit beyond removing a bailout is recognizing stepvector.

Differential Revision: https://reviews.llvm.org/D134502
2022-09-24 17:41:58 -07:00
Philip Reames 6e7c54ecaf [RISCV] Add lowering for scalable @llvm.riscv.masked.strided.load/store
The code previously assumed fixed length vectors; make the relevant code conditional.

Having the lowering in place is neccessary for an upcoming change to generalize scatter/gather matching to scalable vectors.

Differential Revision: https://reviews.llvm.org/D134489
2022-09-24 17:41:57 -07:00
James Y Knight 5351878ba1 [TableGen] Add useDeprecatedPositionallyEncodedOperands option.
Summary:
The existing undefined-bitfield-to-operand matching behavior is very
hard to understand, due to the combination of positional and named
matching. This can make it difficult to track down a bug in a target's
instruction definitions.

Over the last decade, folks have tried to work-around this in various
ways, but it's time to finally ditch the positional matching. With
https://reviews.llvm.org/D131003, there are no longer cases that
_require_ positional matching, and it's time to start removing usage
and support for it.

Therefore: add a (default-false) option, and set it to true only in
those targets that require positional matching today. Subsequent
changes will start cleaning up additional in-tree targets.

NOTE TO OUT OF TREE TARGET MAINTAINERS:

If this change breaks your build, you may restore the previous
behavior simply by adding:
  let useDeprecatedPositionallyEncodedOperands = 1;
to your target's InstrInfo tablegen definition. However, this is
temporary -- the option will be removed in the future.

If your target does not set 'decodePositionallyEncodedOperands', you
may thus start migrating to named operands. However, if you _do_
currently set that option, I recommend waiting until a subsequent
change lands, which adds decoder support for named sub-operands.

Differential Revision: https://reviews.llvm.org/D134073
2022-09-24 09:40:45 -04:00
James Y Knight a538d1f13a [TableGen][CodeEmitterGen] Allow local names for sub-operands in a operand list.
These names can then be matched by name against 'bits' fields in a
record, to populate an instruction's encoding.

This does _not_ yet change DecoderEmitter to allow by-name matching of
sub-operands. Unlike the encoder, the decoder already defaulted to not
supporting positional matching, and backends had workarounds in place
for the missing decoding support.

Additionally, use this new capability to allow the ARM and AArch64
backends not to require any positional operand matching.

Differential Revision: https://reviews.llvm.org/D131003
2022-09-24 09:40:44 -04:00
Craig Topper 4f86c5cbb7 [RISCV] Rename RISCVScheduleB.td to RISCVScheduleZb.td. NFC 2022-09-23 21:38:42 -07:00
Craig Topper 3967abcc0b [RISCV] Add missing scheduler classes to Zbkb and Zbkx instructions. 2022-09-23 21:38:42 -07:00
Craig Topper cde3de5381 [RISCV] Remove a few remnants of Zbr I misssed. 2022-09-23 21:21:51 -07:00
Craig Topper 19850cc2d8 Revert "[RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type."
This reverts commit dd53a0bb30.

We have seen crashes from this internally. Probably due to the use
of RoundingMode::Dynamic.
2022-09-23 18:41:41 -07:00
Craig Topper 90a5d8499a [RISCV] Promote f16 STRICT_FCEIL/FLOOR/TRUNC/NEARBYINT/RINT/ROUND,ROUNDEVEN to f32. 2022-09-23 14:01:51 -07:00
Jay Foad ddfa0f62d8 [AMDGPU] Add GFX11 feature for subtargets with more VGPRs
The full complement of physical VGPRs for GFX11 is 50% more than GFX10.
Some subtargets have this, others stay the same as GFX10. This affects
occupancy calculations.

Differential Revision: https://reviews.llvm.org/D134522
2022-09-23 20:18:23 +01:00
Josh Stone cb46ffdbf4 [X86] Use BuildStackAdjustment in stack probes
This has the advantage of dealing with live EFLAGS, using LEA instead of
SUB if needed to avoid clobbering. That also respects feature "lea-sp".

We could allow unrolled stack probing from blocks with live-EFLAGS, if
canUseAsEpilogue learns when emitStackProbeInlineGeneric will be used.

Differential Revision: https://reviews.llvm.org/D134495
2022-09-23 09:30:32 -07:00
Josh Stone 26c37b461a [X86] Don't allow prologue stack probing with live EFLAGS
Fixes https://github.com/llvm/llvm-project/issues/49509

Differential Revision: https://reviews.llvm.org/D134494
2022-09-23 09:30:32 -07:00
Josh Stone 4dcfb09e40 [NFC][CodeGen] Use const MF in TargetLowering stack probe functions
This makes them callable from places like canUseAsPrologue.

Differential Revision: https://reviews.llvm.org/D134492
2022-09-23 09:30:32 -07:00
Petar Avramovic 6db7921b65 AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd
Remove manual selection for atomic fadd from global-isel.
Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD
which corresponds to llvm-ir's atomicrmw fadd instruction.

global and flat atomic fadd patterns changes:
Split rtn/no-rtn patterns
Add missing patterns or fix predicates
Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors).
Patterns now check addrspace of pointer, added patterns for flat intrinsic.
with global addrspace pointer that selects into global atomic instruction.

buffer atomic fadd patterns changes:
Rdit patterns to import into global-isel.
Remove gfx6/gfx7 _addr64 and _offset patterns.
Remove patterns that can't be reached (same pattern but different feature).

Differential Revision: https://reviews.llvm.org/D130579
2022-09-23 17:52:10 +02:00
Petar Avramovic 5cee9047d5 AMDGPU: Improve atomicrmw fadd selection
Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11
as for gfx90a. Add missing globalisel legalizer support for flat
atomicrmw fadd f32 on gfx940 and gfx11.
Isel support for gfx11 will be added in D130579.

Differential Revision: https://reviews.llvm.org/D131560
2022-09-23 17:52:10 +02:00
Petar Avramovic e03d36d4ae [AMDGPU] Add FeatureFlatAtomicFaddF32Inst
Feature used by targets that have flat_atomic_add_f32 instruction
(gfx940 and gfx11). Remove isGFX940GFX11Plus.
Add hasFlatAtomicFaddF32Inst Subtarget check for codegen.

Differential Revision: https://reviews.llvm.org/D134532
2022-09-23 17:52:10 +02:00
Simon Pilgrim a6e9141505 [TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436)
The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both.

This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling.

Fixes #50778

Differential Revision: https://reviews.llvm.org/D111968
2022-09-23 14:03:18 +01:00
Hassnaa Hamdi 181f200a1c [NFC]: AArch64-SVE
modify some comments
2022-09-23 12:07:31 +00:00
Caroline Concatto 5431bf27bd [AArch64]Remove svget/svset/svcreate from llvm
This patch removes the aarch64 instrinsic svget/svset/svcreate from llvm.
It also implements the InstCombine for vector.extract that used to be in svget.

Depends on: D131547

Differential Revision: https://reviews.llvm.org/D131548
2022-09-23 10:48:43 +01:00
gonglingqin ac295597a8 [LoongArch] Add codegen support for atomicrmw add/sub/nand/and/or/xor operation
Differential Revision: https://reviews.llvm.org/D133755
2022-09-23 09:32:11 +08:00
Philip Reames 60c91fd364 [RISCV] Disallow scale for scatter/gather
RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely.

This has two effects:
* First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to.
* Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.)

The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics.

Differential Revision: https://reviews.llvm.org/D134382
2022-09-22 15:31:26 -07:00
Craig Topper 52708be182 [RISCV] Remove support for the unratified Zbe, Zbf, and Zbm extensions.
These extensions do not appear to be on their way to ratification.
2022-09-22 13:04:41 -07:00
Simon Pilgrim 98907f8685 [CostModel][X86] Tidyup sdiv/srem/udiv/urem by constant cost tables
Preparation for adding cost kinds handling

This is necessary to eventually unblock D111968
2022-09-22 20:46:33 +01:00
Chris Bieneman 4959bfa060 [NFC] Refactor dxil metadata code
DXIL relies on a whole bunch of IR metadata constructs being populated
in the right shape. Rather than just hard coding or using complicated
arrangements of constant data strings, let's make first-class objects
that reprensent the metadata and manage reading and writing the
metadata from the module.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D134397
2022-09-22 12:17:51 -05:00
Hassnaa Hamdi f2072e0ae0 [AArh64-SVE]: Improve cost model for div/udiv/mul 128-bit vector operations
Differential Revision: https://reviews.llvm.org/D132477
2022-09-22 16:50:55 +00:00
Simon Pilgrim dc93202b44 [CostModel][X86] Remove duplicate ashr v4i64 cost table entry. NFCI. 2022-09-22 17:27:26 +01:00
Fraser Cormack 92d71c615d [RISCV] Use structured bindings in common RVV lowering code
This patch uses structured bindings to simplify a couple of specific
cases when lowering RVV operations where we commonly declare two
SDValues and immediately 'tie' them to the mask and vector length.

There's also a couple places where we split vectors that structured
bindings make sense to use.

This patch tries to keep these sorts of changes minimal and to cases
where the returned types are commonly understood, rather than applying
this wholesale to the RISCV backend.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D134442
2022-09-22 16:38:40 +01:00
Philip Reames e41765aa4d [RISCV] Verify consistency of a couple TSFlags related to vector operands
Various bits of existing code assume the presence of one operand implies the presence of another.  Add verifier rules to catch violations.

Differential Revision: https://reviews.llvm.org/D133810
2022-09-22 08:35:17 -07:00
Craig Topper bf7c7696fe [RISCV] Improve support for vector fp_to_sint_sat/uint_sat.
The default fixed vector legalization is to unroll. The default
scalable vector legalization is to clamp in the FP domain. The
RVV vfcvt instructions have saturating behavior so we can use them
directly. The only difference is that RVV instruction turn nan into
the max value, but the _SAT intrinsics want 0.

I'm only supporting 1 step of narrowing for now. I think we can
support more steps by using VNCLIP to saturate and narrower.

The only case that needs 2 steps of widening is f16->i64 which we can
do as f16->f32->i64.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D134400
2022-09-22 08:13:48 -07:00
Craig Topper d6cb8f85bf [RISCV] Formatting fixes to RISCV.td NFC
Improve indentation. Fix the worst of the 80 column violations.
2022-09-22 08:12:59 -07:00
Tim Northover 677da09d02 AArch64: add support for newer Apple CPUs
They're roughly ARMv8.6. This works in the .td file, but in
AArch64TargetParser.def, marking them v8.6 brings in support for the SM4
cryptographic hash and we don't actually have that. So TargetParser side
they're marked as v8.5, with the extra features (BF16 and I8MM added manually).

Finally, A16 supports the HCX extension in addition to v8.6. This has no
TargetParser implications.
2022-09-22 11:58:51 +01:00