Use the fact that `~X` is equivalent to `-1 - X`, which gives us
fully-precise answer, and we only need to special-handle the wrapped case.
This fires ~16k times for vanilla llvm test-suite + RawSpeed.
This is a continuation of 8d487668d0,
the logic is pretty much identical for SRem:
Name: pos pos
Pre: C0 >= 0 && C1 >= 0
%r = srem i8 C0, C1
=>
%r = urem i8 C0, C1
Name: pos neg
Pre: C0 >= 0 && C1 <= 0
%r = srem i8 C0, C1
=>
%r = urem i8 C0, -C1
Name: neg pos
Pre: C0 <= 0 && C1 >= 0
%r = srem i8 C0, C1
=>
%t0 = urem i8 -C0, C1
%r = sub i8 0, %t0
Name: neg neg
Pre: C0 <= 0 && C1 <= 0
%r = srem i8 C0, C1
=>
%t0 = urem i8 -C0, -C1
%r = sub i8 0, %t0
https://rise4fun.com/Alive/Vd6
Now, this new logic does not result in any new catches
as of vanilla llvm test-suite + RawSpeed.
but it should be virtually compile-time free,
and it may be important to be consistent in their handling,
because if we had a pair of sdiv-srem, and only converted one of them,
-divrempairs will no longer see them as a pair,
and thus not "merge" them.
The current code for handling pow(x, y) where y is an integer plus 0.5
is not explicitly guarded against attempting to transform the case where
abs(y) is exactly 0.5.
The latter case is meant to be handled by `replacePowWithSqrt`. Indeed,
if the pow(x, integer+0.5) case proceeds past a certain point, it will
hit an assertion by attempting to form pow(x, 0) using `getPow`.
This patch adds an explicit check to prevent attempting the
pow(x, integer+0.5) transformation on pow(x, +/-0.5) as suggested during
the review of D87877. This has the effect of retaining the shrinking of
`pow` to `powf` when the `sqrt` libcall cannot be formed.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D88066
This patch implements the vector string isolate (predicate and non-predicate
versions) builtins. The predicate builtins are custom selected within PPCISelDAGToDAG.
Differential Revision: https://reviews.llvm.org/D87671
This patch implements the 128-bit vector divide extended builtins in Clang/LLVM.
These builtins map to the vdivesq and vdiveuq instructions respectively.
Differential Revision: https://reviews.llvm.org/D87729
Just scalarize trunc stores - GenWidenVectorTruncStores does the same thing but is flawed (PR42046) and unused.
Differential Revision: https://reviews.llvm.org/D87708
Refactored __tgt_target_data_begin_mapper_<issue|wait> to receive the handle as an input/output argument.
This given the compiler warning of returning the handle as copy.
Differential Revision: https://reviews.llvm.org/D88029
Re-use an optimizition from the old LTO API (used by ld64).
This sorts modules in ascending order, based on bitcode size, so that larger modules are processed first. This allows for smaller modules to be process last, and better fill free threads 'slots', and thusly allow for better multi-thread load balancing.
In our case (on dual Intel Xeon Gold 6140, Windows 10 version 2004, two-stage build), this saves 15 sec when linking `clang.exe` with LLD & `-flto=thin`, `/opt:lldltojobs=all`, no ThinLTO cache, -DLLVM_INTEGRATED_CRT_ALLOC=d:\git\rpmalloc.
Before patch: 102 sec
After patch: 85 sec
Inspired by the work done by David Callahan in D60495.
Differential Revision: https://reviews.llvm.org/D87966
This provides a convenient way to print VPValues and recipes in a
debugger. In particular it saves the user from instantiating
VPSlotTracker to print recipes or values.
Stop combining loads and stores with PPCISD::ADD_TLS before we can merge the
node with with TLS_LOCAL_EXEC_MAT_ADDR. The issue is that
TLS_LOCAL_EXEC_MAT_ADDR cannot be selected by itself and requires the previous
ADD_TLS node that goes with it. However, we sometimes try to combine ADD_TLS
with loads and stores that come after it. If this happens then the ADD_TLS is
removed and TLS_LOCAL_EXEC_MAT_ADDR cannot be selected.
While this bug fix will address the issue it my not be ideal from a performance
perspective as we may be able to add patterns to combine TLS_LOCAL_EXEC_MAT_ADDR
with ADD_TLS with the load and store that comes after it all in one. However,
this is beyond the scope of this patch.
Reviewed By: NeHuang
Differential Revision: https://reviews.llvm.org/D88030
Currently these predicates are ignored, yet their handling is
pretty simple. I could not find a single test where it would
actually change something, but it's only because isImpliedCondOperands
is not smart enough to prove it further on. Yet the situation when
we come there with `less` predicate is pretty common.
Differential Revision: https://reviews.llvm.org/D87890
Reviewed By: fhahn
Changes TTI function getIntImmCostInst to take an additional Instruction parameter,
which enables us to be able to check it is part of a min(max())/max(min()) pattern that will match SSAT.
We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated.
Required minor changes in some non-ARM backends to allow for the optional parameter to be included.
Differential Revision: https://reviews.llvm.org/D87457
It's simpler to do this at codegen time than to do ad-hoc constant
folding of machine instructions in SIFoldOperands.
Differential Revision: https://reviews.llvm.org/D88028
The VPTBlock has been modified to track the 'global' state of the
VPR, as well as the state for each block. Each object now just holds
a list of instructions that makeup the block, while static structures
hold the predicate information. This enables global access for
querying how both a VPT block and individual instructions are
predicated. These changes now allow us, again, to handle more
complicated cases where multiple instructions build a predicate
and/or where the same predicate in used in multiple blocks.
It doesn't, however, get us back to before the tracking was 'fixed'
as some extra logic will be required to properly handle VPT
instructions. Currently a VPT could be effectively predicated because
of it's inputs, but the existing logic will not detect that and so
will refuse to perform the transformation. This can be seen in
remat-vctp.ll test where we still don't perform the transform.
Differential Revision: https://reviews.llvm.org/D87681
Remove the domain from the instructions and create a shouldInspect
helper for LowOverheadLoops which queries it or a vpr operand.
Differential Revision: https://reviews.llvm.org/D87900
When cross compiling with clang-cl, clang splits the INCLUDE env
variable around semicolons (clang/lib/Driver/ToolChains/MSVC.cpp,
MSVCToolChain::AddClangSystemIncludeArgs) and lld splits the
LIB variable similarly (lld/COFF/Driver.cpp,
LinkerDriver::addLibSearchPaths). Therefore, the consensus for
cross compilation with clang-cl and lld-link seems to be to use
semicolons, despite path lists normally being separated by colons
on unix and EnvPathSeparator being set to that.
Therefore, handle the LIB variable similarly in Clang, when
handling lib file arguments when driving linking via Clang.
This fixes commands like "clang-cl test.c -Fetest.exe kernel32.lib" in
a cross compilation setting. Normally, most users call (lld-)link
directly, but meson happens to use this command syntax for
has_function() tests.
Reapply: Change Program.h to define procid_t as ::pid_t. When included
in lldb/unittests/Host/NativeProcessProtocolTest.cpp, it is included
after an lldb namespace containing an lldb::pid_t typedef, followed
later by a "using namespace lldb;". Previously, Program.h wasn't
included in this translation unit, but now it ends up included
transitively from Process.h.
Differential Revision: https://reviews.llvm.org/D88002
For relative symbols, add its offset when computing relocation value.
Also, warn on unsupported absolute symbols.
Differential Revision: https://reviews.llvm.org/D87407
Extend the handling of memory intrinsics to also include non-
target-specific intrinsics, in particular masked loads and stores.
Invent "isHandledNonTargetIntrinsic" to distinguish between intrin-
sics that should be handled natively from intrinsics that can be
passed to TTI.
Add code that handles masked loads and stores and update the
testcase to reflect the results.
Differential Revision: https://reviews.llvm.org/D87340
SimplifyCFG's options should always be overridden by command line flags,
but they mistakenly weren't in the default constructor.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D87718
Scheduling information is of little value when they may disrupt the
pipeline. This patch allows omitting the scheduling information for CSR
instructions while still setting `SchedMachineModel::CompleteModel`. For
specific cases, any scheduling information added will be used by the
scheduler.
Differential revision: https://reviews.llvm.org/D85366
In order to select the immediate forms using the imported patterns, we need to
lower them into new G_VASHR/G_VLSHR target generic ops. Add a combine to do this
matching build_vector of constant operands.
With this, we get selection for free.
This reverts commit 4d85444b31.
This commit broke building lldb's NativeProcessProtocolTest.cpp,
with errors like these:
In file included from include/llvm/Support/Process.h:32:0,
from tools/lldb/unittests/Host/NativeProcessProtocolTest.cpp:12:
include/llvm/Support/Program.h:39:11: error: reference to ‘pid_t’ is ambiguous
typedef pid_t procid_t;
/usr/include/sched.h:38:17: note: candidates are: typedef __pid_t pid_t
typedef __pid_t pid_t;
tools/lldb/include/lldb/lldb-types.h:85:18: note: typedef uint64_t lldb::pid_t
typedef uint64_t pid_t;
1. Store intrinsic ID in ParseMemoryInst instead of a boolean flag
"IsTargetMemInst". This will make it easier to add support for
target-independent intrinsics.
2. Extract the complex multiline conditions from EarlyCSE::processNode
into a new function "getMatchingValue".
Differential Revision: https://reviews.llvm.org/D87691
The patch modifies HexagonVectorLoopCarriedReuse pass to make it compatible with both Legacy Pass Manager through HexagonVectorLoopCarriedReuseLegacyPass and with New Pass Manager through HexagonVectorLoopCarriedReusePass.
Reviewed By: pzheng
Differential Revision: https://reviews.llvm.org/D86955
If we are going to write handler data (that is written as variable
length data following after the unwind info in .xdata), we need to
emit the handler data immediately, but for cases where no such
info is going to be written, skip emitting it right away. (Unwind
info for all remaining functions that hasn't gotten it emitted
directly is emitted at the end.)
This does slightly change the ordering of sections (triggering a
bunch of updates to DebugInfo/COFF tests), but the change should be
benign.
This also matches GCC's assembly output, which doesn't output
.seh_handlerdata unless it actually is needed.
For ARM64, the unwind info can be packed into the runtime function
entry itself (leaving no data in the .xdata section at all), but
that can only be done if there's no follow-on data in the .xdata
section. If emission of the unwind info is triggered via
EmitWinEHHandlerData (or the .seh_handlerdata directive), which
implicitly switches to the .xdata section, there's a chance of the
caller wanting to pass further data there, so the packed format
can't be used in that case.
Differential Revision: https://reviews.llvm.org/D87448
When cross compiling with clang-cl, clang splits the INCLUDE env
variable around semicolons (clang/lib/Driver/ToolChains/MSVC.cpp,
MSVCToolChain::AddClangSystemIncludeArgs) and lld splits the
LIB variable similarly (lld/COFF/Driver.cpp,
LinkerDriver::addLibSearchPaths). Therefore, the consensus for
cross compilation with clang-cl and lld-link seems to be to use
semicolons, despite path lists normally being separated by colons
on unix and EnvPathSeparator being set to that.
Therefore, handle the LIB variable similarly in Clang, when
handling lib file arguments when driving linking via Clang.
This fixes commands like "clang-cl test.c -Fetest.exe kernel32.lib" in
a cross compilation setting. Normally, most users call (lld-)link
directly, but meson happens to use this command syntax for
has_function() tests.
Differential Revision: https://reviews.llvm.org/D88002
This pass is like DeadCodeEliminationPass, but only does one pass
through a function instead of iterating on users of eliminated
instructions.
DeadCodeEliminationPass should be used in all cases.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87933
The implementation of gather() should be reduced too,
but this change by itself makes things a little clearer:
we don't try to gather to a different type or
number-of-values than whatever is passed in as the value
list itself.
security boundary
It was never supported and that part was accidentally omitted when
upstreaming D76518.
Differential Revision: https://reviews.llvm.org/D86478
Change-Id: If6ba9506eb0431c87a1d42a38aa60e47ce263039
This patch adds support for the lxvp, lxvpx, plxvp, stxvp, stxvpx and pstxvp
instructions in the PowerPC backend. These instructions allow loading and
storing VSX register pairs. This patch also adds the VSRp register class
definition needed for these instructions.
Differential Revision: https://reviews.llvm.org/D84359
This matches the legacy PM name and makes all tests in
Transforms/LoopSimplifyCFG pass under NPM.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D87948
If some leaves have the same instructions to be vectorized, we may
incorrectly evaluate the best order for the root node (it is built for the
vector of instructions without repeated instructions and, thus, has less
elements than the root node). In this case we just can not try to reorder
the tree + we may calculate the wrong number of nodes that requre the
same reordering.
For example, if the root node is \<a+b, a+c, a+d, f+e\>, then the leaves
are \<a, a, a, f\> and \<b, c, d, e\>. When we try to vectorize the first
leaf, it will be shrink to \<a, b\>. If instructions in this leaf should
be reordered, the best order will be \<1, 0\>. We need to extend this
order for the root node. For the root node this order should look like
\<3, 0, 1, 2\>. This patch allows extension of the orders of the nodes
with the reused instructions.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D45263
When exporting statepoint results to virtual registers we try to avoid
generating exports for duplicated inputs. But we erroneously use
IR Value* to check if inputs are duplicated. Instead, we should use
SDValue, because even different IR values can get lowered to the same
SDValue.
I'm adding a (degenerate) test case which emphasizes importance of this
feature for invoke statepoints.
If we fail to export only unique values we will end up with something
like that:
%0 = STATEPOINT
%1 = COPY %0
landing_pad:
<use of %1>
And when exceptional path is taken, %1 is left uninitialized (COPY is never
execute).
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D87695
The current nodes, AArch64::SMAXV_PRED for example, are defined to
return a NEON vector result. This is incorrect because they modify
the complete SVE register and are thus changed to represent such.
This patch also adds nodes for UADDV_PRED and SADDV_PRED, which
unifies the handling of all SVE reductions.
NOTE: Floating-point reductions are already implemented correctly,
so this patch is essentially making everything consistent with those.
Differential Revision: https://reviews.llvm.org/D87843
This reverts commit 0345d88de6.
Google internal backend uses EntrySU, we are looking into removing
dependency on it.
Differential Revision: https://reviews.llvm.org/D88018
This commit was originally because it was suspected to cause a crash,
but a reproducer did not surface.
A crash that was exposed by this change was fixed in 1d8f2e5292.
This reverts the revert commit 0581c0b0ee.
This adds lowering for f32 values using the vmov.f16, which zeroes the
top bits whilst setting the lower bits to a pattern. This range of
values does not often come up, except where a f16 constant value has
been converted to a f32.
Differential Revision: https://reviews.llvm.org/D87790
This does not result in changes for any of the current tests, but it might
improve debug information in some cases.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D86522
SelectionDAGBuilder was inconsistently mangling values based on ABI
Calling Conventions when getting them through copyFromRegs in
SelectionDAGBuilder, causing duplicate value type convertions for
function arguments. The checking for the mangling requirement was based
on the value's originating instruction and was performed outside of, and
inspite of, the regular Calling Convention Lowering.
The issue could be observed in a scenario such as:
```
%arg1 = load half, half* %const, align 2
%arg2 = call fastcc half @someFunc()
call fastcc void @otherFunc(half %arg1, half %arg2)
; Here, %arg2 was incorrectly mangled twice, as the CallConv data from
; the call to @someFunc() was taken into consideration for the check
; when getting the value for processing the call to @otherFunc(...),
; after the proper convertion had taken place when lowering the return
; value of the first call.
```
This patch fixes the issue by disregarding the Calling Convention
information for such copyFromRegs, making sure the ABI mangling is
properly contanined in the Calling Convention Lowering.
This fixes Bugzilla #47454.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87844
LSR claims to MemorySSA, but we also have to make sure it is preserved
when splitting critical edges. This can be done by passing MSSAU to
SplitCriticalEdge.
Fixes PR47557.
This is a follow-up of D86605. For strict DAG FP node, if its FP
exception behavior metadata is ignore, it should have nofpexcept flag.
But during custom lowering, this flag isn't passed down.
This is also seen on X86 target.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87390
The scalar elements of the vXi1 build_vector will have been type legalized to i8 by padding with 0s. So we can't check for all ones. Instead we should just look at bit 0 of the constant.
Differential Revision: https://reviews.llvm.org/D87863
This adds simple constant folding for VMOVrh, to constant fold fp16
constants to integer values. It can help especially with soft calling
conventions, but some of the results are not optimal as we end up
loading using a vldr. This will be improved in a follow up patch.
Differential Revision: https://reviews.llvm.org/D87789
InstCombine likes to canonicalize comparisons of the form
X == C || X == C+1 into (X & -2) == C'. Make sure LVI can still
recover the value range from this. Can of course also be useful
for proper mask comparisons.
For the sake of clarity, the implementation goes through KnownBits
to compute the range.
Rewrite this in a way where the core logic is in a separate
function, that is invoked with swapped operands. This makes it
easier to add handling for additional icmp patterns.