Commit Graph

12514 Commits

Author SHA1 Message Date
Aries bc5d0f6024 Avoiding generating VGPR to SGPR copy for CopyFromReg. 2023-06-15 09:30:22 +08:00
zhoujing 899ca9fd8e Add support for 12 bits immediate 2023-01-09 11:59:45 +08:00
Fangrui Song a996cc217c Remove unused #include "llvm/ADT/Optional.h" 2022-12-05 06:31:11 +00:00
Kazu Hirata 9f252e5567 [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 17:31:17 -08:00
Kazu Hirata 3c09ed006a [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 17:12:44 -08:00
Krzysztof Parzyszek ab672e9173 FPEnv: convert Optional to std::optional 2022-12-03 13:55:56 -06:00
Kazu Hirata 998960ee1f [CodeGen] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:08 -08:00
Sanjay Patel 0037e21f28 [SDAG] bail out of mergeTruncStores() if there's any other use in the chain
This fixes the miscompile in issue #58883.

The test demonstrates that we gave up on store merging in that example.

This change should be strictly safe (just adds another clause
to avoid the transform), and it does not prohibit any existing
valid optimizations based on regression tests. I want to believe
that it's also a sufficient fix (possibly overkill), but I'm not
sure how to prove that.

Differential Revision: https://reviews.llvm.org/D137791
2022-12-02 10:08:19 -05:00
tentzen db6a979ae8 Revert "[Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 2"
This reverts commit 1a949c871a.
2022-12-02 02:44:18 -08:00
tentzen 1a949c871a [Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 2
This patch is the Part-2 (BE LLVM) implementation of HW Exception handling.
Part-1 (FE Clang) was committed in 797ad70152.

This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).

Compiler options:
  For clang-cl.exe, the option is -EHa, the same as MSVC.
  For clang.exe, the extra option is -fasync-exceptions,
  plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.

NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.

The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow three rules:
  First, no exception can move in or out of _try region., i.e., no "potential faulty
    instruction can be moved across _try boundary.
  Second, the order of exceptions for instructions 'directly' under a _try must be preserved
    (not applied to those in callees).
  Finally, global states (local/global/heap variables) that can be read outside of _try region
    must be updated in memory (not just in register) before the subsequent exception occurs.

The impact to C++ code:
  Although SEH is a feature for C code, -EHa does have a profound effect on C++
  side. When a C++ function (in the same compilation unit with option -EHa ) is
  called by a SEH C function, a hardware exception occurs in C++ code can also
  be handled properly by an upstream SEH _try-handler or a C++ catch(...).
  As such, when that happens in the middle of an object's life scope, the dtor
  must be invoked the same way as C++ Synchronous Exception during unwinding process.

Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.

This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:

A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others.  If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:

Warning: jump bypasses variable with a non-trivial destructor

In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation (already in):
Please see commit 797ad70152).

Part-2 : LLVM implementation described below.

For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).

For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.

The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.

Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html

Differential Revision: https://reviews.llvm.org/D102817/new/
2022-12-01 23:44:25 -08:00
Krzysztof Parzyszek 864aaa21b4 TargetLowering: convert Optional to std::optional 2022-12-01 16:19:10 -08:00
Nicolai Hähnle 43b86bf992 AMDGPU: Remove BufferPseudoSourceValue
The use of a PSV for buffer intrinsics is misleading because it may be
misinterpreted as all buffer intrinsics accessing the same address in
memory, which is clearly not true.

Instead, build MachineMemOperands without a pointer value but with an
address space, so that address space-based alias analysis can still
work.

There is a lot of test churn because previously address space 4
(constant address space) was used as an address space for buffer
intrinsics. This doesn't make much sense and seems to have been an
accident -- see the change in
AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind.

Differential Revision: https://reviews.llvm.org/D138711
2022-11-29 22:15:11 +01:00
Matt Arsenault ee29a846c6 DAG: Fix assert when alloca has inconsistent pointer size
Take the type from the alloca, not the type to use for allocas.

Fixes issue 59250.
2022-11-29 11:48:46 -05:00
Philip Reames fc0efb7e78 [SDAG] Allow scalable vectors in ComputeNumSignBits (try 2)
I had reverted this before the holiday week because a problem was reported with a related change (D137140 - scalable vector known bits in DAG).  I had initially confused the two patches, and then decided to leave this reverted out an abundance of caution.  Now that we're through the holiday week, reapplying.

I also roled in fixes for several post commit review comments that hadn't landed with the original change.

Original commit message

This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines.

The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.

Differential Revision: https://reviews.llvm.org/D137141
2022-11-29 08:25:05 -08:00
Simon Pilgrim 30eff7f29f [DAG] Attempt to replace a mul node with an existing umul_lohi/smul_lohi node (PR59217)
As discussed on Issue #59217, under certain circumstances the DAG can generate duplicate MUL and MUL_LOHI nodes, often during MULO legalization.

This patch attempts to replace MUL nodes with additional uses of the LO result from the MUL_LOHI node

Differential Revision: https://reviews.llvm.org/D138790
2022-11-29 12:51:30 +00:00
Janek van Oirschot 322966f8f8 [AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp
class support and introduce GlobalISel implementation for AMDGPU

Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic
for llvm.is.fpclass
2022-11-28 16:00:36 -05:00
Kazu Hirata a5ef7bb5c1 [SelectionDAG] Use std::optional in SelectionDAGISel.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 15:07:23 -08:00
Kazu Hirata 01e998e752 [SelectionDAG] Use std::optional in SelectionDAGBuilder.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 15:05:06 -08:00
Kazu Hirata d82f7fbfce [SelectionDAG] Use std::optional in FastISel.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 15:02:45 -08:00
Kazu Hirata dd698b7777 [SelectionDAG] Use std::optional in DAGCombiner.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 15:00:23 -08:00
Benjamin Maxwell 79b5829a15 [TargetLowering][AArch64] Teach DemandedBits about SVE count intrinsics
This allows DemandedBits to see that the SVE count intrinsics (CNTB,
CNTH, CNTW, CNTD) sans multiplier will only ever produce small
positive integers. The maximum value you could get here is 256, which
is CNTB on a machine with a 2048bit vector size (the maximum for SVE).

Using this various redundant operations (zexts, sexts, ands, ors, etc)
can be eliminated.

Differential Revision: https://reviews.llvm.org/D138424
2022-11-25 10:15:14 +00:00
Phoebe Wang 2e5366ac2e [NFC] Change `dyn_cast` to `cast` to make sure no dereference on nullptr 2022-11-25 17:40:37 +08:00
chenglin.bi cdb7b804f6 [DAGCombiner] fold or (xor x, y),? patterns
or (xor x, y), x --> or x, y
or (xor x, y), y --> or x, y
or (xor x, y), (and x, y) --> or x, y
or (xor x, y), (or x, y) --> or x, y

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D138401
2022-11-23 09:28:10 +08:00
Simon Pilgrim 629f17c516 [DAG] isGuaranteedNotToBeUndefOrPoison - handle FrameIndex/TargetFrameIndex
Fixes #58904
2022-11-22 18:16:15 +00:00
Han-Kuan Chen caa9f63022 [CodeGen] Refactor visitSCALAR_TO_VECTOR. NFC.
Differential Revision: https://reviews.llvm.org/D137688
2022-11-22 01:29:04 -08:00
Phoebe Wang b39b76f2ef [X86] Allow no X87 on 32-bit
This patch is an alternative of D100091. It solved the problems in `f80` type lowering.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D137946
2022-11-22 10:47:47 +08:00
chenglin.bi ac1b999e85 [DAGCombiner] fold or (and x, y), x --> x
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D138398
2022-11-21 22:11:12 +08:00
Kazu Hirata 6ccf1d23d7 [SelectionDAG] Teach getRegistersForValue to return std::optional (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716/11
2022-11-19 15:00:19 -08:00
Philip Reames 6705d94005 Revert "[SDAG] Allow scalable vectors in ComputeKnownBits"
This reverts commit bc0fea0d55.

There was a "timeout for a Halide Hexagon test" reported.  Revert until investigation complete.
2022-11-18 15:29:14 -08:00
Philip Reames 102f05bd34 Revert "[SDAG] Allow scalable vectors in ComputeNumSignBits" and follow up
This reverts commits 3fb08d14a6 and f8c63a7fbf.

There was a "timeout for a Halide Hexagon test" reported.  Revert until investigation complete.
2022-11-18 15:25:59 -08:00
Philip Reames 3fb08d14a6 [SDAG] Address post commit review feedback from f8c63a7f
The major change is falling through to ComputeKnownBits when we don't have an implementation of ComputeNumSignBits due to conservatism over scalable vectors.  Right now, we're mostly conservative in the same cases, but this allows our results to improve when we change ComputeKnownBits without also needing to improve ComputeNumSignBits at the same time.
2022-11-18 12:30:10 -08:00
Philip Reames f8c63a7fbf [SDAG] Allow scalable vectors in ComputeNumSignBits
This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines.

The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.

Differential Revision: https://reviews.llvm.org/D137141
2022-11-18 10:50:06 -08:00
Philip Reames bc0fea0d55 [SDAG] Allow scalable vectors in ComputeKnownBits
his is the SelectionDAG equivalent of D136470, and is thus an alternate patch to D128159.

The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.

This patch also includes an implementation for SPLAT_VECTOR as without it, the lane wise reasoning has no base case. The original patch which inspired this (D128159), also included STEP_VECTOR. I plan to do that as a separate patch.

Differential Revision: https://reviews.llvm.org/D137140
2022-11-18 07:40:32 -08:00
Benjamin Maxwell 34d88cf6cf [DAG] Allow folding AND of anyext masked_load with >1 user to zext version
This now allows folding an AND of a anyext masked_load to a
zext_masked_load even if the masked load has multiple users.  Doing is
eliminates some redundant ANDs/MOVs for certain AArch64 SVE code.

I'm not sure if there's any cases where doing this could negatively the
other users of the masked_load.  Looking at other optimizations of
masked loads, most don't apply if the load is used more than once, so it
doesn't look like this would interfere.

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D137844
2022-11-18 10:38:09 +00:00
YingChi Long 7a715bf317
[VP] Add support for vp.inttoptr & vp.ptrtoint
Add vp.inttoptr & vp.ptrtoint support by lowering them into
vp.zext / vp.truncate with in SelectionDAGBuilder.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137169
2022-11-18 10:42:24 +08:00
Stanislav Mekhanoshin bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Philip Reames 4105794e66 [SDAG] Assert we don't see scalable VECTOR_SHUFFLES
It was pointed out in review of D137140 that this case should be impossible.  This patch converts an existing bailout into an assert instead.
2022-11-17 08:18:51 -08:00
zhongyunde 8fbb6f8678 [NFC] Fix typo in comment
Address comment in https://reviews.llvm.org/D137936

Differential Revision: https://reviews.llvm.org/D138124
2022-11-16 23:35:53 +08:00
Simon Pilgrim a92f5a08a1 [DAG] simplifySelect - add support for vselect(0, T, F) -> F fold
We still need to add handling for the non-zero T fold (which requires getBooleanContents handling)
2022-11-16 13:11:14 +00:00
OCHyams a1ac6efcb0 [NFC][SelectionDAG][DebugInfo] Refactor DanglingDebugInfo class
Hide the underlying DbgValueInst by adding methods to extract the necessary
information and by adding a raw_ostream &operator<< overload to print it.

Remove the DebugLoc field as this is always the same as the DbgValueInst's
DebugLoc (see D136247).

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D136249
2022-11-16 10:10:24 +00:00
OCHyams 9792744650 [NFC][SelectionDAG][DebugInfo] Remove duplicate parameter from handleDebugValue
handleDebugValue has two DebugLoc parameters that appear to always take the
same value. Remove one of the duplicate parameters. See phabricator review for
more detail.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D136247
2022-11-16 09:59:35 +00:00
Matt Arsenault 116c894d72 DAG: Fix assert on load casted to vector with attached range metadata
AMDGPU legalizes i64 loads to loads of <2 x i32>, leaving the
i64 MMO with attached range metadata alone. The known bit width
was using the scalar element type, and asserting on a mismatch.
2022-11-15 23:28:55 -08:00
Yeting Kuo ed9638c44b [VP][RISCV] Add vp.nearbyint and RISC-V support.
nearbyint has the property to execute without exception.
For not modifying fflags, the patch added new machine opcode
PseudoVFROUND_NOEXCEPT_V that expands vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137685
2022-11-16 14:05:35 +08:00
Yeting Kuo 5c3ca10b09 [VP][RISCV] Add vp.bswap and RISC-V support.
The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137928
2022-11-16 11:36:38 +08:00
Craig Topper f387918dd8 [TargetLowering][RISCV][ARM][AArch64][Mips] Reduce the number of AND mask constants used by BSWAP expansion.
We can reuse constants if we use SRL followed by AND and AND followed by SHL.
Similar was done to bitreverse previously.

Differential Revision: https://reviews.llvm.org/D138045
2022-11-15 14:36:01 -08:00
Sanjay Patel fe05a0a3dd [SDAG] avoid udiv/urem transform for vector/scalar type mismatches
This solves the crashing from issue #58994.
I don't know anything about VE, so I don't know if the output
is as expected or even correct.
2022-11-15 11:01:18 -05:00
Nick Desaulniers f2981a3bc9 [SelectDagISEL] refactor HandlePHINodesInSuccessorBlocks NFC.
While working on this code to support outputs from callbr along indirect
branches, I kept making these changes again and again. Precommit these.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137445
2022-11-10 14:34:23 -08:00
Amaury Séchet 82209fd96e [NFC] Refactor DAGCombiner::foldSelectOfConstants to reduce nesting 2.0 2022-11-05 17:10:06 +00:00
Amaury Séchet 7c05f092c9 [NFC] Refactor DAGCombiner::foldSelectOfConstants to reduce nesting 2022-11-05 16:17:58 +00:00
Nick Desaulniers 0589038a6f [StatepointLowering] remove unused parameter. NFC
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D136885
2022-11-03 15:43:20 -07:00