The loop-based probing done for stack clash protection altered R1D which
corrupted the backchain value to be stored after the probing was done.
By using R0D instead for the loop exit value, R1D is not modified.
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D92803
FEntryInserter prepends FENTRY_CALL to the first basic block. In case
there are other instructions, PostRA Machine Instruction Scheduler can
move FENTRY_CALL call around. This actually occurs on SystemZ (see the
testcase). This is bad for the following reasons:
* FENTRY_CALL clobbers registers.
* Linux Kernel depends on whatever FENTRY_CALL expands to to be the very
first instruction in the function.
Fix by adding isCall attribute to FENTRY_CALL, which prevents reordering
by making it a scheduling boundary for PostRA Machine Instruction
Scheduler.
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D91218
Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets.
Differential Revision: https://reviews.llvm.org/D91589
They are currently implicit because TargetMachine::shouldAssumeDSOLocal implies
dso_local.
For such function declarations, clang -fno-pic emits the dso_local specifier.
Adding explicit dso_local makes these tests align with the clang behavior and
helps implementing an option to use GOT indirection when taking the address of a
function symbol in -fno-pic (to avoid a canonical PLT entry (SHN_UNDEF with
non-zero st_value)).
They are currently implicit because TargetMachine::shouldAssumeDSOLocal implies
dso_local.
For external data, clang -fno-pic emits the dso_local specifier for ELF and
non-MinGW COFF. Adding explicit dso_local makes these tests in align with the
clang behavior and helps implementing an option to use GOT indirection for
external data access in -fno-pic mode (to avoid copy relocations).
This value had the default value of 4 which caused branch relaxation to fail.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D90065
Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes.
There are still problems with unsigned i32 conversions too which I'll try to fix in another patch.
Fixes part of PR47393
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87115
For historical reasons, the R6 register is a callee-saved argument
register. This means that if it is used to pass an argument to a function
that does not clobber it, it is live throughout the function.
This patch makes sure that in this special case any kill flags of it are
removed.
Review: Ulrich Weigand, Eli Friedman
Differential Revision: https://reviews.llvm.org/D89451
In order to correctly load an all-ones FP NaN value into a floating point
register with a VGBM, the analyzed 32/64 FP bits must first be shifted left
(into element 0 of the vector register).
SystemZVectorConstantInfo has so far relied on element replication which has
bypassed the need to do this shift, but now it is clear that this must be
done in order to handle NaNs.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D89389
In the presence of packed structures (#pragma pack(1)) where elements are
referenced through pointers, there will be stores/loads with alignment values
matching the default alignments for the element types while the elements are
in fact unaligned. Strictly speaking this is incorrect source code, but is
unfortunately part of existing code and therefore now addressed.
This patch improves the pattern predicate for PC-relative loads and stores by
not only checking the alignment value of the instruction, but also making
sure that the symbol (and element) itself is aligned.
Fixes https://bugs.llvm.org/show_bug.cgi?id=44405
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D87510
This rewrites big parts of the fast register allocator. The basic
strategy of doing block-local allocation hasn't changed but I tweaked
several details:
Track register state on register units instead of physical
registers. This simplifies and speeds up handling of register aliases.
Process basic blocks in reverse order: Definitions are known to end
register livetimes when walking backwards (contrary when walking
forward then uses may or may not be a kill so we need heuristics).
Check register mask operands (calls) instead of conservatively
assuming everything is clobbered. Enhance heuristics to detect
killing uses: In case of a small number of defs/uses check if they are
all in the same basic block and if so the last one is a killing use.
Enhance heuristic for copy-coalescing through hinting: We check the
first k defs of a register for COPYs rather than relying on there just
being a single definition. When testing this on the full llvm
test-suite including SPEC externals I measured:
average 5.1% reduction in code size for X86, 4.9% reduction in code on
aarch64. (ranging between 0% and 20% depending on the test) 0.5%
faster compiletime (some analysis suggests the pass is slightly slower
than before, but we more than make up for it because later passes are
faster with the reduced instruction count)
Also adds a few testcases that were broken without this patch, in
particular bug 47278.
Patch mostly by Matthias Braun
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.
This required adding a SDNodeFlags to SelectionDAG::getSetCC.
Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.
Differential Revision: https://reviews.llvm.org/D87200
On SystemZ, a ZERO_EXTEND of an i1 vector handled by WidenVecRes_Convert()
always ended up being scalarized, because the type action of the input is
promotion which was previously an unhandled case in this method.
This fixes https://bugs.llvm.org/show_bug.cgi?id=47132.
Differential Revision: https://reviews.llvm.org/D86268
Patch by Eli Friedman.
Review: Ulrich Weigand
Previously SDNodeFlags::instersectWith(Flags) would do nothing if Flags was
in an undefined state, which is very bad given that this is the default when
getNode() is called without passing an explicit SDNodeFlags argument.
This meant that if an already existing and reused node had a flag which the
second caller to getNode() did not set, that flag would remain uncleared.
This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW
flag was incorrectly set on an add instruction (which did in fact overflow in
one of the two original contexts), so when SystemZElimCompare removed the
compare with 0 trusting that flag, wrong-code resulted.
There is more that needs to be done in this area as discussed here:
Differential Revision: https://reviews.llvm.org/D86871
Review: Ulrich Weigand, Sanjay Patel
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D85781
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D85781
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D85781
One of the callers only wants the condition, but the vselect can
be simplified by getNode making it hard or impossible to retrieve
the condition.
Instead, return the condition and make the other 2 callers
responsible for creating the vselect node using the condition.
Rename the function to WidenVSELECTMask accordingly.
Differential Revision: https://reviews.llvm.org/D85468
When passing the -vector feature to LLVM (or equivalently the
-mno-vx command line argument to clang), the intent is that
generated code must not use any vector features (in particular,
no vector registers must be used).
However, there are some cases where we still could generate
such uses; these are all related to some of the additional
vector features (like +vector-enhancements-1). Since none
of those features are actually usable with -vector, just make
sure we disable them all if -vector is given.
The knownbits.ll test case is somewhat fragile since:
- it relies on undef inputs; and
- it operates just at the limits of the MaxRecursionDepth
This means that optimization changes may easily cause the test
to spuriously fail. Rewrite the test so it still validates
the same thing, but in a less fragile manner.
Summary:
This fixes ASan and MSan tests on SystemZ after
commit 6a822e20ce ("[ASan][MSan] Remove EmptyAsm and set the CallInst
to nomerge to avoid from merging.").
Based on commit 80e107ccd0 ("Add NoMerge MIFlag to avoid MIR branch
folding").
Reviewers: uweigand, jonpa
Reviewed By: uweigand
Subscribers: hiraditya, llvm-commits, Andreas-Krebbel
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82794
Instead of doing multiple unpacks when zero extending vectors (e.g. v2i16 ->
v2i64), benchmarks have shown that it is better to do a VPERM (vector
permute) since that is only one sequential instruction on the critical path.
This patch achieves this by
1. Expand ZERO_EXTEND_VECTOR_INREG into a vector shuffle with a zero vector
instead of (multiple) unpacks.
2. Improve SystemZ::GeneralShuffle to perform a single unpack as the last
operation if Bytes matches it.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D78486
Extract the existing code from getInstructionThroughput into
TTImpl::getUserCost. The duplicated code in the AMDGPU backend has
also been removed.
Differential Revision: https://reviews.llvm.org/D81448
When we rematerialize a value as part of the coalescing, we may
widen the register class of the destination register.
When this happens, updateRegDefUses may create additional subranges
to account for the wider register class.
The created subranges are empty and if they are not defined by
the rematerialized instruction we clean them up.
However, if they are defined by the rematerialized instruction but
unused, we failed to flag them as dead definition and would leave
them as empty live-range.
This is wrong because empty live-ranges don't interfere with anything,
thus if we don't fix them, we would fail to account that the
rematerialized instruction clobbers some lanes.
E.g., let us consider the following pseudo code:
def.lane_low64:reg128 = ldimm
newdef:reg32 = COPY def.lane_low64_low32
When rematerialization happens for newdef, we end up with:
newdef.lane_low64:reg128 = ldimm
= use newdef.lane_low64_low32
Let's look at the live interval of newdef.
Before rematerialization, we would get:
newdef [defIdx, useIdx:0) 0@defIdx
Right after updateRegDefUses, newdef register class is widen to reg128
and the subrange definitions will be augmented to fill the subreg that
is used at the definition point, here lane_low64.
The resulting live interval would be:
newdef [newDefIdx, useIdx:0) 0@newDefIdx
* lane_low64_high32 EMPTY
* lane_low64_low32 [newDefIdx, useIdx:0)
Before this patch this would be the final status of the live interval.
Therefore we miss that lane_low64_high32 is actually live on the
definition point of newdef.
With this patch, after rematerializing, we check all the added subranges
and for the ones that are defined but empty, we flag them as dead def.
Thus, in that case, newdef would look like this:
newdef [newDefIdx, useIdx:0) 0@newDefIdx
* lane_low64_high32 [newDefIdx, newDefIdxDead) ; <-- instead of EMPTY
* lane_low64_low32 [newDefIdx, useIdx:0)
This fixes https://www.llvm.org/PR46154
Try to avoid creating VGBMs by reusing the permutation mask if it contains a
zero. If the first byte was into (any byte of) a zero vector, then the first
byte of the mask can become zero and reused by putting the mask also as the
first operand. If there instead was a first-byte use of the other source
operand, then that zero index can be reused if the mask is placed as the
second operand.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D79925
It's better to reuse the first source value than to use an undef second
operand, because that will make more resulting VPERMs have identical operands
and therefore MachineCSE more successful.
Review: Ulrich Weigand
Use FP-mem instructions when folding reloads into single lane (W..) vector
instructions.
Only do this when all other operands of the instruction have already been
allocated to an FP (F0-F15) register.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D76705
When using vec_load/store_len_r with an immediate length operand
of 16 or larger, LLVM will currently emit an VLRL/VSTRL instruction
with that immediate. This creates a valid encoding (which should be
supported by the assembler), but always traps at runtime. This patch
fixes this by not creating VLRL/VSTRL in those cases.
This would result in loading the length into a register and
calling VLRLR/VSTRLR instead. However, these operations with
a length of 15 or larger are in fact simply equivalent to a
full vector load or store. And in fact the same holds true for
vec_load/store_len as well.
Therefore, add a DAGCombine rule to replace those operations with
plain vector loads or stores if the length is known at compile
time and equal or larger to 15.
Remove bad kill flags fom load-and-test.mir as discovered by
https://reviews.llvm.org/D78586: "[MachineVerifier] Add more checks for
registers in live-in lists".
Review: Ulrich Weigand
This patch adds
- New arguments to getMinPrefetchStride() to let the target decide on a
per-loop basis if software prefetching should be done even with a stride
within the limit of the hw prefetcher.
- New TTI hook enableWritePrefetching() to let a target do write prefetching
by default (defaults to false).
- In LoopDataPrefetch:
- A search through the whole loop to gather information before emitting any
prefetches. This way the target can get information via new arguments to
getMinPrefetchStride() and emit prefetches more selectively. Collected
information includes: Does the loop have a call, how many memory
accesses, how many of them are strided, how many prefetches will cover
them. This is NFC to before as long as the target does not change its
definition of getMinPrefetchStride().
- If a previous access to the same exact address was 'read', and the
current one is 'write', make it a 'write' prefetch.
- If two accesses that are covered by the same prefetch do not dominate
each other, put the prefetch in a block that dominates both of them.
- If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.
- A SystemZ implementation of getMinPrefetchStride().
Review: Ulrich Weigand, Michael Kruse
Differential Revision: https://reviews.llvm.org/D70228
A spilled load of an immediate can use MVHI/MVGHI instead.
A compare of a spilled register against an immediate can use CHSI/CGHSI.
A logical compare can use CLFHSI/CLGHSI.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D76055
Replace single-lane (W... form) vector "multiply and add" and "multiply and
subtract" instructions with equivalent floating point instructions whenever
possible in SystemZShortenInst.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D76370
ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.
Differential Revision: https://reviews.llvm.org/D76201
The type legalizer will scalarize vector conversions from integer to floating
point if the source element size is less than that of the result.
This is avoided now by inserting a zero/sign-extension of the source vector
before type legalization.
Review: Ulrich Weigand
Differential revision: https://reviews.llvm.org/D75978
Swap the compare operands if LHS is spilled while updating the CCMask:s of
the CC users. This is relatively straight forward since the live-in lists for
the CC register can be assumed to be correct during register allocation
(thanks to 659efa2).
Also fold a spilled operand of an LOCR/SELR into an LOC(G).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D67437
On SystemZ there are a set of "access registers" that can be copied in and
out of 32-bit GPRs with special instructions. These instructions can only
perform the copy using low 32-bit parts of the 64-bit GPRs. However, the
default register class for 32-bit integers is GRX32, which also contains the
high 32-bit part registers.
In order to never end up with a case of such a COPY into a high reg, this
patch adds a new simple pre-RA pass that selects such COPYs into target
instructions.
This pass also handles COPYs from CC (Condition Code register), and COPYs to
CC can now also be emitted from a high reg in copyPhysReg().
Fixes: https://bugs.llvm.org/show_bug.cgi?id=44254
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D75014
The incoming back chain slot was implicitly allocated whenever a GPR was
saved in SystemZFrameLowering::getRegSpillOffset(), but in cases where no
GPRs were saved/restored this did not take effect.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D75367
Forming subtract with overflow is beneficial on SystemZ, just like additions.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D75290
In order to build the Linux kernel, the back chain must be supported with
packed-stack. The back chain is then stored topmost in the register save
area.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D74506
isKnownNonNaN() could not recognize a zero splat because that is a
ConstantAggregateZero which is-a ConstantData but not a ConstantDataVector.
Patch makes a ConstantAggregateZero return true.
Review: Thomas Lively
Differential Revision: https://reviews.llvm.org/D74263
Summary:
Backends should fold the subtraction into the comparison, but not all
seem to. Moreover, on targets where pointers are not integers, such as
CHERI, an integer subtraction is not appropriate. Instead we should just
compare the two pointers directly, as this should work everywhere and
potentially generate more efficient code.
Reviewers: bogner, lebedev.ri, efriedma, t.p.northover, uweigand, sunfish
Reviewed By: lebedev.ri
Subscribers: dschuff, sbc100, arichardson, jgravelle-google, hiraditya, aheejin, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74454
This reverts commit 80a34ae311 with fixes.
Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on
MachineVerifierPass by default on X86 and the fact that
inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll
are not expected to generate functioning machine code, this would go
down to `report_fatal_error` in MachineVerifierPass. Here passing
`-verify-machineinstrs=0` to make the intent explicit.
This reverts commit 80a34ae311 with fixes.
On bots llvm-clang-x86_64-expensive-checks-ubuntu and
llvm-clang-x86_64-expensive-checks-debian only,
llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little
bit in the hope that LIT is the culprit since this change is not in the
codepath these tests are testing.
llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll
llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll
This reverts commit rGcd5b308b828e, rGcd5b308b828e, rG8cedf0e2994c.
There are issues to be investigated for polly bots and bots turning on
EXPENSIVE_CHECKS.
When more than one SelectPseudo instruction is handled a new MBB is
returned. This must not be done if that would result in leaving an undhandled
isel pseudo behind in the original MBB.
Fixes https://bugs.llvm.org/show_bug.cgi?id=44849.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D74352
Each function is with this compiled with the SystemZSubtarget initialized
from the functions attributes.
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D74086
This change implements the llvm intrinsic llvm.read_register for
the SystemZ platform which returns the value of the specified
register
(http://llvm.org/docs/LangRef.html#llvm-read-register-and-llvm-write-register-intrinsics).
This implementation returns the value of the stack register, and
can be extended to return the value of other registers. The
implementation for this intrinsic exists on various other platforms
including Power, x86, ARM, etc. but missing on SystemZ.
Reviewers: uweigand
Differential Revision: https://reviews.llvm.org/D73378
Printing floating point number in decimal is inconvenient for humans.
Verbose asm output will print out floating point values in comments, it
helps.
But in lots of cases, users still need additional work to covert the
decimal back to hex or binary to check the bit patterns,
especially when there are small precision difference.
Hexadecimal form is one of the supported form in LLVM IR, and easier for
debugging.
This patch try to print all FP constant in hex form instead.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D73566
The "{=v0}" constraint did not result in the expected error message in the
abscence of the vector facility, because 'v0' matches as a string into the
AnyRegBitRegClass in common code.
This patch adds checks for vector support in case of "{v" and soft-float in
case of "{f" to remedy this.
Review: Ulrich Weigand.
These names have been changed from CamelCase to camelCase, but there were
many places (comments mostly) that still used the old names.
This change is NFC.
Summary:
This patch could be treated as a rebase of D33960. It also fixes PR35547.
A fix for `llvm/test/Other/close-stderr.ll` is proposed in D68164. Seems
the consensus is that the test is passing by chance and I'm not
sure how important it is for us. So it is removed like in D33960 for now.
The rest of the test fixes are just adding `--crash` flag to `not` tool.
** The reason it fixes PR35547 is
`exit` does cleanup including calling class destructor whereas `abort`
does not do any cleanup. In multithreading environment such as ThinLTO or JIT,
threads may share states which mostly are ManagedStatic<>. If faulting thread
tearing down a class when another thread is using it, there are chances of
memory corruption. This is bad 1. It will stop error reporting like pretty
stack printer; 2. The memory corruption is distracting and nondeterministic in
terms of error message, and corruption type (depending one the timing, it
could be double free, heap free after use, etc.).
Reviewers: rnk, chandlerc, zturner, sepavloff, MaskRay, espindola
Reviewed By: rnk, MaskRay
Subscribers: wuzish, jholewinski, qcolombet, dschuff, jyknight, emaste, sdardis, nemanjai, jvesely, nhaehnle, sbc100, arichardson, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, lenary, s.egerton, pzheng, cfe-commits, MaskRay, filcab, davide, MatzeB, mehdi_amini, hiraditya, steven_wu, dexonsmith, rupprecht, seiya, llvm-commits
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D67847
Code in getRoot made the assumption that every node in PendingLoads
must always itself have a dependency on the current DAG root node.
After the changes in 04a8696, it turns out that this assumption no
longer holds true, causing wrong codegen in some cases (e.g. stores
after constrained FP intrinsics might get deleted).
To fix this, we now need to make sure that the TokenFactor created
by getRoot always includes the previous root, if there is no implicit
dependency already present.
The original getControlRoot code already has exactly this check,
so this patch simply reuses that code now for getRoot as well.
This fixes the regression.
NFC if no constrained FP intrinsic is present.
We need to ensure that fpexcept.strict nodes are not optimized away even if
the result is unused. To do that, we need to chain them into the block's
terminator nodes, like already done for PendingExcepts.
This patch adds two new lists of pending chains, PendingConstrainedFP and
PendingConstrainedFPStrict to hold constrained FP intrinsic nodes without
and with fpexcept.strict markers. This allows not only to solve the above
problem, but also to relax chains a bit further by no longer flushing all
FP nodes before a store or other memory access. (They are still flushed
before nodes with other side effects.)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D72341
SystemZDAGToDAGISel::Select will attempt to split logical instruction
with a large immediate constant. This must not happen if the result
matches one of the z15 combined operations, so the code checks for
those. However, one of them was missed, causing invalid code to
be generated in the test case for PR44496.
Fix several several additional problems with the int <-> FP conversion
logic both in common code and in the X86 target. In particular:
- The STRICT_FP_TO_UINT expansion emits a floating-point compare. This
compare can raise exceptions and therefore needs to be a strict compare.
I've made it signaling (even though quiet would also be correct) as
signaling is the more usual default for an LT. This code exists both
in common code and in the X86 target.
- The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode:
it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one
of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP
that ends up not chosen. I've fixed the algorithm to use only a single
STRICT_SINT_TO_FP instead.
- The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do
the wrong thing because it calls getOperationAction using the result VT.
But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to
be called using the operand VT.
- Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate
STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.
Reviewed by: craig.topper
Differential Revision: https://reviews.llvm.org/D71840
The SELR(Mux) instructions can be converted to two-address form as LOCR(Mux)
instructions whenever one of the sources are the same reg as dest. By adding
this mapping in getTwoOperandOpcode(), we get:
- Two-address hints in getRegAllocationHints() for select register
instructions.
- No need anymore for special handling in SystemZShortenInst.cpp -
shortenSelect() removed.
The two-address hints are now added before the GRX32 hints, which should be
preferred.
Review: Ulrich Weigand
https://reviews.llvm.org/D68870
It was recently discovered that the handling of CC values was actually broken
since overflow was not properly handled ('nsw' flag not checked for).
Add and sub instructions now have a new target specific instruction flag
named SystemZII::CCIfNoSignedWrap. It means that the CC result can be used
instead of a compare with 0, but only if the instruction has the 'nsw' flag
set.
This patch also adds the improvements of conversion to logical instructions
and the analyzing of add with immediates, to be able to eliminate more
compares.
Review: Ulrich Weigand
https://reviews.llvm.org/D66868
The back-end currently has special DAGCombine code to detect
cases where two floating-point extend or truncate operations
can be combined into a single vector operation.
This patch extends that support to also handle strict FP operations.
Note that currently only the case where both operations have the
same input chain are supported. This already suffices to cover
the common case where the operations result from scalarizing a
non-legal vector type. More general cases can be supported in
the future.
Add new intrinsics
llvm.experimental.constrained.minimum
llvm.experimental.constrained.maximum
as strict versions of llvm.minimum and llvm.maximum.
Includes SystemZ back-end support.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71624
Let the "mnop-mcount" function attribute simply be present or non-present.
Update SystemZ backend as well to use hasFnAttribute() instead.
Review: Ulrich Weigand
https://reviews.llvm.org/D71669
The following intrinsics currently carry a rounding mode metadata argument:
llvm.experimental.constrained.minnum
llvm.experimental.constrained.maxnum
llvm.experimental.constrained.ceil
llvm.experimental.constrained.floor
llvm.experimental.constrained.round
llvm.experimental.constrained.trunc
This is not useful since the semantics of those intrinsics do not in any way
depend on the rounding mode. In similar cases, other constrained intrinsics
do not have the rounding mode argument. Remove it here as well.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71218
As of b1d8576 there is middle-end support for STRICT_[SU]INT_TO_FP,
so this patch adds SystemZ back-end support as well.
The patch is SystemZ target specific except for adding SD patterns
strict_[su]int_to_fp and any_[su]int_to_fp to TargetSelectionDAG.td
as usual.
Now that the machine verifier will check for cases of register/immediate
MachineOperands and their correspondence to the MC instruction descriptor,
this patch adds the operand types to the descriptors where they were
previously missing. All MCOI::OPERAND_UNKNOWN operand types have been handled
to get a known type, except for G_... (global isel) instructions.
Review: Ulrich Weigand
https://reviews.llvm.org/D71494
Any llvm function with the "packed-stack" attribute will be compiled to use
the packed stack layout which reuses unused parts of the incoming register
save area. This is needed for building the Linux kernel.
Review: Ulrich Weigand
https://reviews.llvm.org/D70821
Before z14, we did not have any FMA instruction for 128-bit
floating-point, so the @llvm.fma.f128 intrinsic needs to be
expanded to a libcall on those platforms.
This worked correctly for regular FMA, but was implemented
incorrectly for the strict version. This was not noticed
because we did not have test coverage for this case.
This patch fixes that incorrect expansion and adds the
missing test cases.
This adds support for constrained floating-point comparison intrinsics.
Specifically, we add:
declare <ty2>
@llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
declare <ty2>
@llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).
The condition code is implemented as a metadata string. The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).
These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations. Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes. The patch includes support
for the common legalization operations for those nodes.
The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instruction to
fully implement strict semantics (scalar and vector).
Differential Revision: https://reviews.llvm.org/D69281
D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:
// Sel = Src < 0x8000000000000000
// Val = select Sel, Src, Src - 0x8000000000000000
// Ofs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val) ^ Ofs
The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)
Instead, I'd suggest to use the following sequence:
// Sel = Src < 0x8000000000000000
// FltOfs = select Sel, 0, 0x8000000000000000
// IntOfs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val - FltOfs) ^ IntOfs
In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway).
In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.
There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)
Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper
Differential Revision: https://reviews.llvm.org/D67105
This patch implements the following changes:
1) SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats
each constrained intrinsic like a global barrier (e.g. a function call)
and fully serializes all pending chains. This is actually not required;
it is allowed for constrained intrinsics to be reordered w.r.t one
another or (nonvolatile) memory accesses. The MI-level scheduler already
allows for that flexibility, so it makes sense to allow it at the DAG
level as well.
This patch therefore changes the way chains for constrained intrisincs
are created, and handles them basically like load operations are handled.
This has the effect that constrained intrinsics are no longer serialized
against one another or (nonvolatile) loads. They are still serialized
against stores, but that seems hard to change with the current DAG chain
setup, and it also doesn't seem to be a big problem preventing DAG
2) The OPC_CheckFoldableChainNode check requires that each of the
intermediate nodes in a multi-node pattern match only has a single use.
This check tends to fail if those intermediate nodes are strict operations
as those have a chain output that typically indeed has another use.
However, we don't really need to consider chains here at all, since they
will all be rewritten anyway by UpdateChains later. Other parts of the
matcher therefore already ignore chains, but this hasOneUse check doesn't.
This patch replaces hasOneUse by a custom test that verifies there is no
more than one use of any non-chain output value.
In theory, this change could affect code unrelated to strict FP nodes,
but at least on SystemZ I could not find any single instance of that
happening
3) The SystemZ back-end currently does not allow matching multiply-and-
extend operations (32x32 -> 64bit or 64x64 -> 128bit FP multiply) for
strict FP operations. This was not possible in the past due to the
problems described under 1) and 2) above.
With those issues fixed, it is now possible to fully support those
instructions in strict mode as well, and this patch does so.
Differential Revision: https://reviews.llvm.org/D70913
* Context *
During register coalescing, we use rematerialization when coalescing is not
possible. That means we may rematerialize a super register when only a smaller
register is actually used.
E.g.,
0B v1 = ldimm 0xFF
1B v2 = COPY v1.low8bits
2B = v2
=>
0B v1 = ldimm 0xFF
1B v2 = ldimm 0xFF
2B = v2.low8bits
Where xB are the slot indexes.
Here v2 grew from a 8-bit register to a 16-bit register.
When that happens and subregister liveness is enabled, we create subranges for
the newly created value.
E.g., before remat, the live range of v2 looked like:
main range: [1r, 2r)
(Reads v2 is defined at index 1 slot register and used before the slot register
of index 2)
After remat, it should look like:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 1d) <-- dead def
I.e., the unsused lanes of v2 should be marked as dead definition.
* The Problem *
Prior to this patch, the live-ranges from the previous exampel, would have the
full live-range for all subranges:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 2r) <-- too long
* The Fix *
Technically, the code that this patch changes is not wrong:
When we create the subranges for the newly rematerialized value, we create only
one subrange for the whole bit mask.
In other words, at this point v2 live-range looks like this:
main range: [1r, 2r)
low & high: [1r, 2r)
Then, it gets wrong when we call LiveInterval::refineSubRanges on low 8 bits:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 2r) <-- too long
Ideally, we would like LiveInterval::refineSubRanges to be able to do the right
thing and mark the dead lanes as such. However, this is not possible, because by
the time we update / refine the live ranges, the IR hasn't been updated yet,
therefore we actually don't have enough information to do the right thing.
Another option to fix the problem would have been to call
LiveIntervals::shrinkToUses after the IR is updated. This is not desirable as
this may have a noticeable impact on compile time.
Instead, what this patch does is when we create the subranges for the
rematerialized value, we explicitly create one subrange for the lanes that were
used before rematerialization and one for the lanes that were not used. The used
one inherits the live range of the main range and the unused one is just created
empty. The existing rematerialization code then detects that the unused one are
not live and it correctly sets dead def intervals for them.
https://llvm.org/PR41372
InstCombine may synthesize FMINNUM/FMAXNUM nodes from fcmp+select
sequences (where the fcmp is marked nnan). Currently, if the
target does not otherwise handle these nodes, they'll get expanded
to libcalls to fmin/fmax. However, these functions may reside in
libm, which may introduce a library dependency that was not originally
present in the source code, potentially resulting in link failures.
To fix this problem, add code to TargetLowering::expandFMINNUM_FMAXNUM
to expand FMINNUM/FMAXNUM to a compare+select sequence instead of the
libcall. This is done only if the node is marked as "nnan"; in this case,
the expansion to compare+select is always correct. This also suffices to
catch all cases where FMINNUM/FMAXNUM was synthesized as above.
Differential Revision: https://reviews.llvm.org/D70965
// Due to the SystemZ ABI, the DWARF CFA (Canonical Frame Address) is not
// equal to the incoming stack pointer, but to incoming stack pointer plus
// 160. The getOffsetOfLocalArea() returned value is interpreted as "the
// offset of the local area from the CFA".
The immediate offsets into the Register save area returned by
getCalleeSavedSpillSlots() should take this offset into account, which this
patch makes sure of.
Patch and review by Ulrich Weigand.
https://reviews.llvm.org/D70427
Previously we mutated the node and then converted it to a libcall. But this loses the chain information.
This patch keeps the chain, but unfortunately breaks tail call optimization as the functions involved in deciding if a node is in tail call position can't handle the chain. But correct ordering seems more important to be right.
Somehow the SystemZ tests improved. I looked at one of them and it seemed that we're handling the split vector elements in a different order and that made the copies work better.
Differential Revision: https://reviews.llvm.org/D70334
This is a special calling convention to be used by the GHC compiler.
Author: Stefan Schulze Frielinghaus
Differential Revision: https://reviews.llvm.org/D69024
Demand that an immediate offset to a PC relative address fits in 32 bits, or
else load it into a register and perform a separate add.
Verify in the assembler that such immediate offsets fit the bitwidth.
Even though the final address of a Load Address Relative Long may fit in 32
bits even with a >32 bit offset (depending on where the symbol lives relative
to PC), the GNU toolchain demands the offset by itself to be in range. This
patch adapts the same behavior for llvm.
Review: Ulrich Weigand
https://reviews.llvm.org/D69749
A set of function attributes is required in any function that uses constrained
floating point intrinsics. None of our tests use these attributes.
This patch fixes this.
These tests have been tested against the IR verifier changes in D68233.
Reviewed by: andrew.w.kaylor, cameron.mcinally, uweigand
Approved by: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D67925
llvm-svn: 373761
SystemZPostRewrite needs to be run before (it may emit COPYs) the Post-RA
pseudo pass also at -O0, so it should be added in addPostRegAlloc().
Review: Ulrich Weigand
llvm-svn: 373182
These two test cases use -march=systemz instead of a triple. In
particular, the used file format is then based on the default host
triple. This leads to different behaviour on different platforms.
The SystemZ implementation uses the integrated assembler for a
long time now. The mature-mc-support test can be fully enabled.
Differential Revision: https://reviews.llvm.org/D68129
llvm-svn: 373098
With -pg -mfentry -mnop-mcount, a nop is emitted instead of the call to
fentry.
Review: Ulrich Weigand
https://reviews.llvm.org/D67765
llvm-svn: 372950
Merge more Select pseudo instructions in emitSelect() by allowing other
instructions between them as long as they do not clobber CC.
Debug value instructions are now moved down to below the new PHIs instead of
erasing them.
Review: Ulrich Weigand
https://reviews.llvm.org/D67619
llvm-svn: 372873
The recently announced IBM z15 processor implements the architecture
already supported as "arch13" in LLVM. This patch adds support for
"z15" as an alternate architecture name for arch13.
The patch also uses z15 in a number of places where we used arch13
as long as the official name was not yet announced.
llvm-svn: 372435
Summary:
This catches malformed mir files which specify alignment as log2 instead of pow2.
See https://reviews.llvm.org/D65945 for reference,
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67433
llvm-svn: 371608
Since NaN is very rare in normal programs, so the probability for floating point unordered comparison should be extremely small. Current probability is 3/8, it is too large, this patch changes it to a tiny number.
Differential Revision: https://reviews.llvm.org/D65303
llvm-svn: 371541
Handle the remaining cases also by handling asm goto in
SystemZInstrInfo::getBranchInfo().
Review: Ulrich Weigand
https://reviews.llvm.org/D67151
llvm-svn: 371048
Now that constrained fpto[su]i intrinsic are available,
add codegen support to the SystemZ backend.
In addition to pure back-end changes, I've also needed
to add the strict_fp_to_[su]int and any_fp_to_[su]int
pattern fragments in the obvious way.
llvm-svn: 370674
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.
The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.
We do see a couple of regressions that need to be addressed:
AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.
The code owners have confirmed its ok for these cases to fixed up in future patches.
Differential Revision: https://reviews.llvm.org/D63281
llvm-svn: 366799
Reimplement scheduling constraints for strict FP instructions in
ScheduleDAGInstrs::buildSchedGraph to allow for more relaxed
scheduling. Specifially, allow one strict FP instruction to
be scheduled across another, as long as it is not moved across
any global barrier.
Differential Revision: https://reviews.llvm.org/D64412
Reviewed By: cameron.mcinally
llvm-svn: 366222
This fixes https://bugs.llvm.org/show_bug.cgi?id=42606 by extending
D64213. Instead of only checking if the carry comes from a matching
operation, we now check the full chain of carries. Otherwise we might
custom lower the outermost addcarry, but then generically legalize
an inner addcarry.
Differential Revision: https://reviews.llvm.org/D64658
llvm-svn: 365949
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Assembler/disassembler support for new instructions.
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of arch13 as host processor.
Note: No currently available Z system supports the arch13
architecture. Once new systems become available, the
official system name will be added as supported -march name.
llvm-svn: 365932
Although removeCopyByCommutingDef deals with full copies, it is still
possible to copy undef lanes and thus, we wouldn't have any a value
number for these lanes.
This fixes PR40215.
llvm-svn: 365256
Only custom lower uaddo+addcarry or usubo+subcarry chains and leave
mixtures like usubo+addcarry or uaddo+subcarry to the generic
legalizer. Otherwise we run into issues because SystemZ uses
different CC values for carries and borrows.
Fixes https://bugs.llvm.org/show_bug.cgi?id=42512.
Differential Revision: https://reviews.llvm.org/D64213
llvm-svn: 365242
This implements a small enhancement to https://reviews.llvm.org/D55506
Specifically, while we were able to match strict FP nodes for
floating-point extend operations with a register as source, this
did not work for operations with memory as source.
That is because from regular operations, this is represented as
a combined "extload" node (which is a variant of a load SD node);
but there is no equivalent using a strict FP operation.
However, it turns out that even in the absence of an extload
node, we can still just match the operations explicitly, e.g.
(strict_fpextend (f32 (load node:$ptr))
This patch implements that method to match the LDEB/LXEB/LXDB
SystemZ instructions even when the extend uses a strict-FP node.
llvm-svn: 364450
Vector load/store instructions support an optional alignment field
that the compiler can use to provide known alignment info to the
hardware. If the field is used (and the information is correct),
the hardware may be able (on some models) to perform faster memory
accesses than otherwise.
This patch adds support for alignment hints in the assembler and
disassembler, and fills in known alignment during codegen.
llvm-svn: 363806
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.
Patch by Matthias Braun
llvm-svn: 363757
Some GEPs were not being split, presumably because that split would just be
undone by the DAGCombiner. Not performing those splits can prevent important
optimizations, such as preventing the element indices / member offsets from
being (partially) folded into load/store instruction immediates. This patch:
- Makes the splits also occur in the cases where the base address and the GEP
are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.
Differential Revision: https://reviews.llvm.org/D60294
llvm-svn: 363544
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.
This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.
Reviewers: arsenm, thegameg, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60137
llvm-svn: 363533
Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:
* a latch block
* it has two successors, one is loop header, another is exit
* it has more than one predecessors
If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.
Differential Revision: https://reviews.llvm.org/D43256
llvm-svn: 363471
This behavior was added in r130928 for both FastISel and SD, and then
disabled in r131156 for FastISel.
This re-enables it for FastISel with the corresponding fix.
This is triggered only when FastISel can't lower the arguments and falls
back to SelectionDAG for it.
FastISel contains a map of "register fixups" where at the end of the
selection phase it replaces all uses of a register with another
register that FastISel sometimes pre-assigned. Code at the end of
SelectionDAGISel::runOnMachineFunction is doing the replacement at the
very end of the function, while other pieces that come in before that
look through the MachineFunction and assume everything is done. In this
case, the real issue is that the code emitting COPY instructions for the
liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg
assigned to the physreg is used, and if it's not, it will skip the COPY.
If a register wasn't replaced with its assigned fixup yet, the copy will
be skipped and we'll end up with uses of undefined registers.
This fix moves the replacement of registers before the emission of
copies for the live-ins.
The initial motivation for this fix is to enable tail calls for
swiftself functions, which were blocked because we couldn't prove that
the swiftself argument (which is callee-save) comes from a function
argument (live-in), because there was an extra copy (vreg to vreg).
A few tests are affected by this:
* llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21
(callee-save) but never reload it because it's attached to the return.
We now don't even spill it anymore.
* llvm/test/CodeGen/*/swiftself.ll: we tail-call now.
* llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this
test was not really testing the right thing, but it worked because the
same registers were re-used.
* llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes
* llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy
* llvm/test/CodeGen/Mips/*: get rid of spills and copies
* llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack
* llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack
* llvm/test/CodeGen/X86/swifterror.ll: same as AArch64
* llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed
Differential Revision: https://reviews.llvm.org/D62361
llvm-svn: 362963
This patch aims to reduce spilling and register moves by using the 3-address
versions of instructions per default instead of the 2-address equivalent
ones. It seems that both spilling and register moves are improved noticeably
generally.
Regalloc hints are passed to increase conversions to 2-address instructions
which are done in SystemZShortenInst.cpp (after regalloc).
Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
source register since the reg/reg instruction is now 3-address. In order to
remedy this, new 3-address pseudo memory instructions are used to perform the
folding only when the dst and lhs virtual registers are known to be allocated
to the same physreg. In order to not let MachineCopyPropagation run and
change registers on these transformed instructions (making it 3-address), a
new target pass called SystemZPostRewrite.cpp is run just after
VirtRegRewriter, that immediately lowers the pseudo to a target instruction.
If it would have been possibe to insert a COPY instruction and change a
register operand (convert to 2-address) in foldMemoryOperandImpl() while
trusting that the caller (e.g. InlineSpiller) would update/repair the
involved LiveIntervals, the solution involving pseudo instructions would not
have been needed. This is perhaps a potential improvement (see Phabricator
post).
Common code changes:
* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a
target pass immediately before MachineCopyPropagation.
* VirtRegMap is passed as an argument to foldMemoryOperand().
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D60888
llvm-svn: 362868
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).
This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.
To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:
- A new MCID flag "mayRaiseFPException" that the target should set on any
instruction that possibly can raise FP exception according to the
architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
instruction resulting from expansion of any constrained FP intrinsic.
Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).
Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.
The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.
This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.
Reviewed By: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D55506
llvm-svn: 362663
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Cameron McInally, Kevin P. Neal
Approved by: Cameron McInally
Differential Revision: https://reviews.llvm.org/D62546
llvm-svn: 362241
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 362143