This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
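As a rough illustration of the rewrite, the sketch below shows a function
whose "no result" case is spelled std::nullopt, where it would previously
have been `return None;`. The helper findIndex is hypothetical and is not
taken from the patch.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical helper (not from the patch): returns the index of the first
// occurrence of Value in Values, or std::nullopt if Value is absent.
// Before the migration, the "absent" case would have been `return None;`.
std::optional<unsigned> findIndex(const std::vector<int> &Values, int Value) {
  for (unsigned I = 0, E = static_cast<unsigned>(Values.size()); I != E; ++I)
    if (Values[I] == Value)
      return I;
  return std::nullopt;
}
```

Callers test the result with has_value() or a boolean conversion, exactly as
they would for any other std::optional.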
This fixes the miscompile in issue #58883.
The test demonstrates that we gave up on store merging in that example.
This change should be strictly safe (just adds another clause
to avoid the transform), and it does not prohibit any existing
valid optimizations based on regression tests. I want to believe
that it's also a sufficient fix (possibly overkill), but I'm not
sure how to prove that.
Differential Revision: https://reviews.llvm.org/D137791
This patch is the Part-2 (BE LLVM) implementation of HW Exception handling.
Part-1 (FE Clang) was committed in 797ad70152.
This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.
NOTE: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.
The rules for C code:
For C code, one way (the MSVC approach) to achieve SEH -EHa semantics is to follow three rules:
First, no exception can move in or out of a _try region, i.e., no "potentially faulty
instruction" can be moved across a _try boundary.
Second, the order of exceptions for instructions 'directly' under a _try must be preserved
(not applied to those in callees).
Finally, global states (local/global/heap variables) that can be read outside of _try region
must be updated in memory (not just in register) before the subsequent exception occurs.
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception that occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding process.
Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph precisely. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.
This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:
A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block inherits various states from its predecessors, the lowest
State trumps the others. If some exits flow to unreachable, propagation on those
paths terminates, not affecting remaining blocks.
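The propagation rule above can be sketched on a toy control-flow graph. This
is only an illustration under simplifying assumptions, not the WinEHPrepare
code: the Edge struct, the ParentOf table (standing in for SEHUnwindMap[]),
the NoState marker, and the function name are all hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <limits>
#include <vector>

// Marker for blocks that no state has reached (hypothetical).
constexpr int NoState = std::numeric_limits<int>::max();

// Toy CFG edge. SideExit marks an edge that leaves the current scope, so the
// state flowing along it is the parent state of the source block's state.
struct Edge {
  int To;
  bool SideExit;
};

// ParentOf[S] is the parent state of S; -1 means "no enclosing scope".
// Returns the state computed for each block, starting from Entry.
std::vector<int> propagateStates(const std::vector<std::vector<Edge>> &Succs,
                                 const std::vector<int> &ParentOf,
                                 int Entry, int EntryState) {
  std::vector<int> State(Succs.size(), NoState);
  std::deque<int> Worklist;
  State[Entry] = EntryState;
  Worklist.push_back(Entry);
  while (!Worklist.empty()) {
    const int B = Worklist.front();
    Worklist.pop_front();
    const int S = State[B];
    for (const Edge &E : Succs[B]) {
      // A side exit unwinds to the parent state; other edges keep the state.
      const int Out = (E.SideExit && S >= 0) ? ParentOf[S] : S;
      // When predecessors disagree, the lowest state wins.
      if (Out < State[E.To]) {
        State[E.To] = Out;
        Worklist.push_back(E.To);
      }
    }
  }
  return State;
}
```

Blocks with no successors simply end the propagation on their path, and
blocks that are never reached keep NoState. States only ever decrease, so the
worklist loop terminates.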
For CPP code, an object lifetime region is usually a SEME, as with SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:
Warning: jump bypasses variable with a non-trivial destructor
In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject an eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation (already in):
Please see commit 797ad70152.
Part-2 : LLVM implementation described below.
For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
relies on the existing State tracking code (UnwindMap and InvokeStateMap).
For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(Note: the purpose of that is to ensure the return address of a call is in the
same scope as the call address.)
The handler for catch(...) for -EHa must handle HW exceptions, so its
'adjective' flag is reset (it cannot be IsStdDotDot (0x40), which only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/nothrow) so that HW
exceptions can be passed through.
Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html
Differential Revision: https://reviews.llvm.org/D102817/new/
The use of a PSV for buffer intrinsics is misleading because it may be
misinterpreted as all buffer intrinsics accessing the same address in
memory, which is clearly not true.
Instead, build MachineMemOperands without a pointer value but with an
address space, so that address space-based alias analysis can still
work.
There is a lot of test churn because previously address space 4
(constant address space) was used as an address space for buffer
intrinsics. This doesn't make much sense and seems to have been an
accident -- see the change in
AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind.
Differential Revision: https://reviews.llvm.org/D138711
I had reverted this before the holiday week because a problem was reported with a related change (D137140 - scalable vector known bits in DAG). I had initially confused the two patches, and then decided to leave this reverted out of an abundance of caution. Now that we're through the holiday week, reapplying.
I also rolled in fixes for several post-commit review comments that hadn't landed with the original change.
Original commit message
This is a continuation of the series of patches adding lane-wise support for scalable vectors in various knownbits-esque routines.
The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.
Differential Revision: https://reviews.llvm.org/D137141
As discussed on Issue #59217, under certain circumstances the DAG can generate duplicate MUL and MUL_LOHI nodes, often during MULO legalization.
This patch attempts to replace MUL nodes with additional uses of the LO result from the MUL_LOHI node.
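The replacement relies on the low half of a widening multiply being the same
value as the ordinary wrapping multiply. A minimal scalar sketch of that
identity, modeling the unsigned 32-bit case (the function names are
hypothetical, not DAG APIs):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Models a 32-bit MUL_LOHI: returns {low half, high half} of the full
// 64-bit product.
std::pair<uint32_t, uint32_t> mulLoHi(uint32_t A, uint32_t B) {
  const uint64_t Product = static_cast<uint64_t>(A) * B;
  return {static_cast<uint32_t>(Product),
          static_cast<uint32_t>(Product >> 32)};
}

// Models a plain 32-bit MUL, which wraps modulo 2^32.
uint32_t mul(uint32_t A, uint32_t B) { return A * B; }
```

For any inputs, mulLoHi(A, B).first equals mul(A, B), so a separate MUL of
the same operands computes nothing new.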
Differential Revision: https://reviews.llvm.org/D138790
class support and introduce GlobalISel implementation for AMDGPU
Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic
for llvm.is.fpclass
This allows DemandedBits to see that the SVE count intrinsics (CNTB,
CNTH, CNTW, CNTD) sans multiplier will only ever produce small
positive integers. The maximum value you could get here is 256, which
is CNTB on a machine with a 2048bit vector size (the maximum for SVE).
Using this, various redundant operations (zexts, sexts, ands, ors, etc.)
can be eliminated.
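The arithmetic behind the 256 bound can be checked with a small sketch. The
helper names are hypothetical; the only facts used are the 2048-bit maximum
vector size and the element widths of the four count intrinsics.

```cpp
#include <cassert>

// Maximum SVE vector size in bits, as stated above.
constexpr unsigned MaxSVEVectorBits = 2048;

// Largest value a count intrinsic without a multiplier can return for a
// given element width (8 for CNTB, 16 for CNTH, 32 for CNTW, 64 for CNTD).
constexpr unsigned maxElementCount(unsigned ElementBits) {
  return MaxSVEVectorBits / ElementBits;
}

// Number of low bits needed to represent Value (0 needs 0 bits).
constexpr unsigned bitsNeeded(unsigned Value) {
  unsigned Bits = 0;
  while (Value != 0) {
    ++Bits;
    Value >>= 1;
  }
  return Bits;
}
```

Since 256 fits in 9 bits, every bit above bit 8 of the result is zero, which
is what lets extensions and masks of the result be dropped.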
Differential Revision: https://reviews.llvm.org/D138424
or (xor x, y), x --> or x, y
or (xor x, y), y --> or x, y
or (xor x, y), (and x, y) --> or x, y
or (xor x, y), (or x, y) --> or x, y
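The four folds are bitwise identities, so they can be checked exhaustively on
a narrow type. A minimal sketch that checks every pair of 8-bit values (the
function name is hypothetical):

```cpp
#include <cassert>

// Returns true if all four folds listed above hold for every pair of 8-bit
// values. The operations are bitwise, so wider types behave the same way.
bool orXorFoldsHold() {
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned Y = 0; Y < 256; ++Y) {
      const unsigned Xor = X ^ Y;
      const unsigned Or = X | Y;
      if ((Xor | X) != Or)       // or (xor x, y), x
        return false;
      if ((Xor | Y) != Or)       // or (xor x, y), y
        return false;
      if ((Xor | (X & Y)) != Or) // or (xor x, y), (and x, y)
        return false;
      if ((Xor | Or) != Or)      // or (xor x, y), (or x, y)
        return false;
    }
  }
  return true;
}
```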
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D138401
This patch is an alternative to D100091. It solves the problems in `f80` type lowering.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D137946
The major change is falling through to ComputeKnownBits when we don't have an implementation of ComputeNumSignBits due to conservatism over scalable vectors. Right now, we're mostly conservative in the same cases, but this allows our results to improve when we change ComputeKnownBits without also needing to improve ComputeNumSignBits at the same time.
This is a continuation of the series of patches adding lane-wise support for scalable vectors in various knownbits-esque routines.
The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.
Differential Revision: https://reviews.llvm.org/D137141
This is the SelectionDAG equivalent of D136470, and is thus an alternate patch to D128159.
The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.
This patch also includes an implementation for SPLAT_VECTOR as without it, the lane wise reasoning has no base case. The original patch which inspired this (D128159), also included STEP_VECTOR. I plan to do that as a separate patch.
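To make the single-lane idea concrete, here is a toy sketch with 8-bit lanes.
It is not LLVM's KnownBits class: LaneKnownBits and the three helpers are
hypothetical names used only to show why a splat gives the reasoning a base
case.

```cpp
#include <cassert>
#include <cstdint>

// Toy known-bits record for ONE lane. For a scalable vector, this single
// lane stands for every lane that exists at runtime.
struct LaneKnownBits {
  uint8_t Zero; // bits known to be 0
  uint8_t One;  // bits known to be 1
};

// Base case: a splat of a constant makes every lane fully known.
LaneKnownBits knownBitsOfSplat(uint8_t C) {
  return {static_cast<uint8_t>(~C), C};
}

// No information, e.g. for an arbitrary vector argument.
LaneKnownBits unknownLane() { return {0, 0}; }

// Lane-wise AND: a result bit is 0 if it is 0 on either side, and 1 only if
// it is 1 on both sides.
LaneKnownBits knownBitsOfAnd(LaneKnownBits L, LaneKnownBits R) {
  return {static_cast<uint8_t>(L.Zero | R.Zero),
          static_cast<uint8_t>(L.One & R.One)};
}
```

ANDing an unknown vector with a splat of 0x0F yields a lane whose top four
bits are known zero, regardless of how many lanes there are at runtime.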
Differential Revision: https://reviews.llvm.org/D137140
This now allows folding an AND of an anyext masked_load to a
zext_masked_load even if the masked load has multiple users. Doing this
eliminates some redundant ANDs/MOVs for certain AArch64 SVE code.
I'm not sure if there are any cases where doing this could negatively affect the
other users of the masked_load. Looking at other optimizations of
masked loads, most don't apply if the load is used more than once, so it
doesn't look like this would interfere.
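The reason the fold is valid can be shown on scalars: an any-extended value
has unspecified high bits, and masking them off gives the zero-extended
value. In this sketch the HighGarbage parameter is a hypothetical stand-in
for those unspecified bits; nothing here is a DAG API.

```cpp
#include <cassert>
#include <cstdint>

// Models an any-extension from 8 to 32 bits: the low 8 bits hold the loaded
// value, and the upper 24 bits are whatever HighGarbage supplies.
uint32_t anyExtend(uint8_t Loaded, uint32_t HighGarbage) {
  return (HighGarbage & 0xFFFFFF00u) | Loaded;
}

// Models a zero-extension from 8 to 32 bits.
uint32_t zeroExtend(uint8_t Loaded) { return Loaded; }
```

For any garbage, (anyExtend(V, G) & 0xFF) equals zeroExtend(V), so the AND
can be absorbed by making the load itself zero-extending.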
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D137844
Add vp.inttoptr & vp.ptrtoint support by lowering them into
vp.zext / vp.truncate within SelectionDAGBuilder.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137169
A target can report whether a misaligned access is 'fast', as defined
by the target. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.
A subsequent patch will start using an actual speed value in
the load/store vectorizer to check that a vectorized access is going
to be not just fast, but also not slower than before.
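A rough sketch of the interface shape described above. The hook below is
hypothetical: its name, parameters, size thresholds, and speed values are
assumptions for illustration and do not reproduce the real
allowsMisalignedMemoryAccesses signatures.

```cpp
#include <cassert>

// Hypothetical target hook. Returns whether a misaligned access of the given
// size is allowed; *Fast receives a target-defined speed, where 0 means
// "not fast". The values 0 and 1 mirror the old false/true.
bool allowsMisalignedAccess(unsigned SizeInBits, unsigned *Fast) {
  const bool Allowed = SizeInBits <= 128; // assumed limit for this toy target
  if (Fast)
    *Fast = (SizeInBits <= 64) ? 1 : 0;
  return Allowed;
}

// Hypothetical client in the spirit of the follow-up: widen only if the wide
// access is allowed and is not slower than the narrow one.
bool shouldWiden(unsigned NarrowBits, unsigned WideBits) {
  unsigned NarrowSpeed = 0;
  unsigned WideSpeed = 0;
  if (!allowsMisalignedAccess(WideBits, &WideSpeed))
    return false;
  allowsMisalignedAccess(NarrowBits, &NarrowSpeed);
  return WideSpeed >= NarrowSpeed;
}
```

With an unsigned speed, a target could later return values above 1 to rank
accesses without changing the signature again.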
Differential Revision: https://reviews.llvm.org/D124217
Hide the underlying DbgValueInst by adding methods to extract the necessary
information and by adding a raw_ostream &operator<< overload to print it.
Remove the DebugLoc field as this is always the same as the DbgValueInst's
DebugLoc (see D136247).
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D136249
handleDebugValue has two DebugLoc parameters that appear to always take the
same value. Remove one of the duplicate parameters. See phabricator review for
more detail.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D136247
AMDGPU legalizes i64 loads to loads of <2 x i32>, leaving the
i64 MMO with attached range metadata alone. The known bit width
was using the scalar element type, and asserting on a mismatch.
nearbyint must execute without raising floating-point exceptions.
To avoid modifying fflags, the patch adds a new machine opcode,
PseudoVFROUND_NOEXCEPT_V, that expands to vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.
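The save/restore pattern can be sketched in portable scalar C++ with
<cfenv>. This is only an analogy for the frflags/fsflags pair, not the RISC-V
vector expansion, and it assumes the compiler does not move the rounding
across the flag accesses (the volatile temporaries are there to discourage
that). The function name is hypothetical.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Rounds X to an integral value in the current rounding mode, leaving the
// floating-point exception flags exactly as they were before the call.
double roundPreservingFlags(double X) {
  std::fexcept_t Saved;
  std::fegetexceptflag(&Saved, FE_ALL_EXCEPT); // plays the role of frflags
  volatile double In = X;
  volatile double Out = std::rint(In);         // may raise FE_INEXACT
  std::fesetexceptflag(&Saved, FE_ALL_EXCEPT); // plays the role of fsflags
  return Out;
}
```

Any flag raised by the rounding step is discarded when the saved flags are
written back, which is the behavior nearbyint requires.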
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137685
The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137928
We can reuse constants if we use SRL followed by AND and AND followed by SHL.
Something similar was done for bitreverse previously.
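A scalar sketch of the constant reuse, assuming the expansion in question is
a 32-bit byte swap (the message does not name the operation, so that is an
assumption). Ordering the middle bytes as SRL-then-AND and AND-then-SHL lets
both use the same mask.

```cpp
#include <cassert>
#include <cstdint>

// Byte swap of a 32-bit value using a single mask constant.
uint32_t bswap32(uint32_t X) {
  const uint32_t Mask = 0x0000FF00u; // shared by both middle-byte terms
  return (X << 24) |          // byte 0 moves to byte 3
         ((X & Mask) << 8) |  // AND followed by SHL: byte 1 moves to byte 2
         ((X >> 8) & Mask) |  // SRL followed by AND: byte 2 moves to byte 1
         (X >> 24);           // byte 3 moves to byte 0
}
```

Had the second middle term been written as AND followed by SRL, it would have
needed a different mask (0x00FF0000), hence an extra constant.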
Differential Revision: https://reviews.llvm.org/D138045
While working on this code to support outputs from callbr along indirect
branches, I kept making these changes again and again. Precommit these.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137445