Commit Graph

806 Commits

Author SHA1 Message Date
Jonas Paulsson 42f628c842 Reapply "[SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing."
Fixed to properly compute the live-in lists of new blocks.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D92803
2020-12-11 18:25:47 -06:00
Jonas Paulsson bc7a61b703 Revert "[SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing."
Temporarily reverted.

This reverts commit ea475c77ff.
2020-12-10 18:05:51 -06:00
Jonas Paulsson ea475c77ff [SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing.
The loop-based probing done for stack clash protection altered R1D which
corrupted the backchain value to be stored after the probing was done.

By using R0D instead for the loop exit value, R1D is not modified.

Review: Ulrich Weigand.

Differential Revision: https://reviews.llvm.org/D92803
2020-12-10 15:06:18 -06:00
Ilya Leoshkevich d58f112ce0 Prevent FENTRY_CALL reordering
FEntryInserter prepends FENTRY_CALL to the first basic block. In case
there are other instructions, PostRA Machine Instruction Scheduler can
move FENTRY_CALL call around. This actually occurs on SystemZ (see the
testcase). This is bad for the following reasons:

* FENTRY_CALL clobbers registers.
* Linux Kernel depends on whatever FENTRY_CALL expands to to be the very
  first instruction in the function.

Fix by adding isCall attribute to FENTRY_CALL, which prevents reordering
by making it a scheduling boundary for PostRA Machine Instruction
Scheduler.

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D91218
2020-12-09 00:59:01 +01:00
Layton Kifer ac522f8700 [DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1)
Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets.

Differential Revision: https://reviews.llvm.org/D91589
2020-12-06 11:52:10 -05:00
Fangrui Song 6b6c3aaeac [test] Add explicit dso_local to function declarations in static relocation model tests
They are currently implicit because TargetMachine::shouldAssumeDSOLocal implies
dso_local.

For such function declarations, clang -fno-pic emits the dso_local specifier.
Adding explicit dso_local makes these tests align with the clang behavior and
helps implementing an option to use GOT indirection when taking the address of a
function symbol in -fno-pic (to avoid a canonical PLT entry (SHN_UNDEF with
non-zero st_value)).
2020-12-05 14:54:37 -08:00
Fangrui Song 2262b04cab [test] Add explicit dso_local to constant/global variable declarations
They are currently implicit because TargetMachine::shouldAssumeDSOLocal implies
dso_local.

For external data, clang -fno-pic emits the dso_local specifier for ELF and
non-MinGW COFF. Adding explicit dso_local makes these tests in align with the
clang behavior and helps implementing an option to use GOT indirection for
external data access in -fno-pic mode (to avoid copy relocations).
2020-12-04 13:51:01 -08:00
Matt Arsenault 20c43d6bd5 OpaquePtr: Bulk update tests to use typed sret 2020-11-20 17:58:26 -05:00
Simon Pilgrim 7a8b2f692e [DAGCombiner] Precommit Sext Tests for D91589
Patch by: @laytonio (Layton Kifer)

Differential Revision: https://reviews.llvm.org/D91671
2020-11-18 15:56:16 +00:00
Simon Pilgrim 5a7be094e3 [SystemZ] Regenerate some fp tests + remove unused check prefixes
Just use default CHECK
2020-11-11 18:38:22 +00:00
Jonas Paulsson 7c026a83ee [SystemZ] Define MaxInstLength to have the value of 6.
This value had the default value of 4 which caused branch relaxation to fail.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D90065
2020-10-24 09:19:34 +02:00
Craig Topper 9e884169a2 [FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity
Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes.

There are still problems with unsigned i32 conversions too which I'll try to fix in another patch.

Fixes part of PR47393

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87115
2020-10-21 18:12:54 -07:00
Jonas Paulsson 1606755da0 [SystemZ] Mark unsaved argument R6 as live throughout function.
For historical reasons, the R6 register is a callee-saved argument
register. This means that if it is used to pass an argument to a function
that does not clobber it, it is live throughout the function.

This patch makes sure that in this special case any kill flags of it are
removed.

Review: Ulrich Weigand, Eli Friedman

Differential Revision: https://reviews.llvm.org/D89451
2020-10-21 14:38:59 +02:00
Jonas Paulsson 6756d43af9 [SystemZ] Bugfix in SystemZVectorConstantInfo
In order to correctly load an all-ones FP NaN value into a floating point
register with a VGBM, the analyzed 32/64 FP bits must first be shifted left
(into element 0 of the vector register).

SystemZVectorConstantInfo has so far relied on element replication which has
bypassed the need to do this shift, but now it is clear that this must be
done in order to handle NaNs.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D89389
2020-10-14 15:34:40 +02:00
Jonas Paulsson d851495f2f [SystemZ] Use LA instead of AGR in eliminateFrameIndex().
Since AGR clobbers CC it should not be used here.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47736.

Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D89034
2020-10-09 13:06:33 +02:00
Matt Arsenault 89baeaef2f Reapply "RegAllocFast: Rewrite and improve"
This reverts commit 73a6a164b8.
2020-09-30 10:35:25 -04:00
Jonas Paulsson 75a5febe31 [SystemZ] Don't emit PC-relative memory accesses to unaligned symbols.
In the presence of packed structures (#pragma pack(1)) where elements are
referenced through pointers, there will be stores/loads with alignment values
matching the default alignments for the element types while the elements are
in fact unaligned. Strictly speaking this is incorrect source code, but is
unfortunately part of existing code and therefore now addressed.

This patch improves the pattern predicate for PC-relative loads and stores by
not only checking the alignment value of the instruction, but also making
sure that the symbol (and element) itself is aligned.

Fixes https://bugs.llvm.org/show_bug.cgi?id=44405

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D87510
2020-09-29 14:51:13 +02:00
Dávid Bolvanský 179e15d53a [SystemZ] Optimize bcmp calls (PR47420)
Solves https://bugs.llvm.org/show_bug.cgi?id=47420

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87988
2020-09-25 17:55:39 +02:00
Muhammad Omair Javaid 73a6a164b8 Revert "Reapply Revert "RegAllocFast: Rewrite and improve""
This reverts commit 55f9f87da2.

Breaks following buildbots:
http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4306
http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu/builds/9154
2020-09-22 14:40:06 +05:00
Matt Arsenault 55f9f87da2 Reapply Revert "RegAllocFast: Rewrite and improve"
This reverts commit dbd53a1f0c.

Needed lldb test updates
2020-09-21 15:45:27 -04:00
Eric Christopher dbd53a1f0c Temporarily Revert "RegAllocFast: Rewrite and improve"
as it's breaking a few tests in the lldb test suite.

Bot: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4226/steps/test/logs/stdio

This reverts commit c8757ff3aa.
2020-09-18 18:11:21 -07:00
Matt Arsenault c8757ff3aa RegAllocFast: Rewrite and improve
This rewrites big parts of the fast register allocator. The basic
strategy of doing block-local allocation hasn't changed but I tweaked
several details:

Track register state on register units instead of physical
registers. This simplifies and speeds up handling of register aliases.
Process basic blocks in reverse order: Definitions are known to end
register livetimes when walking backwards (contrary when walking
forward then uses may or may not be a kill so we need heuristics).

Check register mask operands (calls) instead of conservatively
assuming everything is clobbered.  Enhance heuristics to detect
killing uses: In case of a small number of defs/uses check if they are
all in the same basic block and if so the last one is a killing use.
Enhance heuristic for copy-coalescing through hinting: We check the
first k defs of a register for COPYs rather than relying on there just
being a single definition.  When testing this on the full llvm
test-suite including SPEC externals I measured:

average 5.1% reduction in code size for X86, 4.9% reduction in code on
aarch64. (ranging between 0% and 20% depending on the test) 0.5%
faster compiletime (some analysis suggests the pass is slightly slower
than before, but we more than make up for it because later passes are
faster with the reduced instruction count)

Also adds a few testcases that were broken without this patch, in
particular bug 47278.

Patch mostly by Matthias Braun
2020-09-18 14:05:18 -04:00
Craig Topper b1e68f885b [SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.:
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.

This required adding a SDNodeFlags to SelectionDAG::getSetCC.

Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.

Differential Revision: https://reviews.llvm.org/D87200
2020-09-08 15:27:21 -07:00
Jonas Paulsson 6dc3e22b57 [DAGTypeLegalizer] Handle ZERO_EXTEND of promoted type in WidenVecRes_Convert.
On SystemZ, a ZERO_EXTEND of an i1 vector handled by WidenVecRes_Convert()
always ended up being scalarized, because the type action of the input is
promotion which was previously an unhandled case in this method.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47132.

Differential Revision: https://reviews.llvm.org/D86268

Patch by Eli Friedman.
Review: Ulrich Weigand
2020-09-08 16:49:51 +02:00
Jonas Paulsson 714ceefad9 [SelectionDAG] Always intersect SDNode flags during getNode() node memoization.
Previously SDNodeFlags::instersectWith(Flags) would do nothing if Flags was
in an undefined state, which is very bad given that this is the default when
getNode() is called without passing an explicit SDNodeFlags argument.

This meant that if an already existing and reused node had a flag which the
second caller to getNode() did not set, that flag would remain uncleared.

This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW
flag was incorrectly set on an add instruction (which did in fact overflow in
one of the two original contexts), so when SystemZElimCompare removed the
compare with 0 trusting that flag, wrong-code resulted.

There is more that needs to be done in this area as discussed here:

Differential Revision: https://reviews.llvm.org/D86871

Review: Ulrich Weigand, Sanjay Patel
2020-09-05 10:30:38 +02:00
Dávid Bolvanský 0f14b2e6cb Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 50c743fa71. Patch will be split to smaller ones.
2020-08-17 20:44:33 +02:00
Dávid Bolvanský 50c743fa71 [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 19:54:27 +02:00
Dávid Bolvanský f9264995a6 Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 44587e2f7e. Sanitizer tests need to be updated.
2020-08-13 14:37:40 +02:00
Dávid Bolvanský 44587e2f7e [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 14:23:58 +02:00
Dávid Bolvanský a0485421d2 Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 385c9d673f.
2020-08-13 12:59:15 +02:00
Dávid Bolvanský 385c9d673f [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 12:45:40 +02:00
Craig Topper ffc248f3b8 [LegalTypes] Move VSELECT node creation out of WidenVSELECTAndMask and push to 2 of the 3 callers.
One of the callers only wants the condition, but the vselect can
be simplified by getNode making it hard or impossible to retrieve
the condition.

Instead, return the condition and make the other 2 callers
responsible for creating the vselect node using the condition.
Rename the function to WidenVSELECTMask accordingly.

Differential Revision: https://reviews.llvm.org/D85468
2020-08-06 13:18:16 -07:00
Ulrich Weigand 68a80a4436 [SystemZ] Ensure -mno-vx disables any use of vector features
When passing the -vector feature to LLVM (or equivalently the
-mno-vx command line argument to clang), the intent is that
generated code must not use any vector features (in particular,
no vector registers must be used).

However, there are some cases where we still could generate
such uses; these are all related to some of the additional
vector features (like +vector-enhancements-1).  Since none
of those features are actually usable with -vector, just make
sure we disable them all if -vector is given.
2020-07-23 15:34:59 +02:00
Ulrich Weigand e9c6b63d4a [SystemZ] Simplify knownbits.ll test
The knownbits.ll test case is somewhat fragile since:
- it relies on undef inputs; and
- it operates just at the limits of the MaxRecursionDepth

This means that optimization changes may easily cause the test
to spuriously fail.  Rewrite the test so it still validates
the same thing, but in a less fragile manner.
2020-06-30 16:31:59 +02:00
Ilya Leoshkevich 6764869548 [SystemZ] Add NoMerge MIFlag
Summary:
This fixes ASan and MSan tests on SystemZ after
commit 6a822e20ce ("[ASan][MSan] Remove EmptyAsm and set the CallInst
to nomerge to avoid from merging.").

Based on commit 80e107ccd0 ("Add NoMerge MIFlag to avoid MIR branch
folding").

Reviewers: uweigand, jonpa

Reviewed By: uweigand

Subscribers: hiraditya, llvm-commits, Andreas-Krebbel

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82794
2020-06-30 12:44:45 +02:00
Jonas Paulsson ef7aad0db4 [SystemZ] Improve handling of ZERO_EXTEND_VECTOR_INREG.
Instead of doing multiple unpacks when zero extending vectors (e.g. v2i16 ->
v2i64), benchmarks have shown that it is better to do a VPERM (vector
permute) since that is only one sequential instruction on the critical path.

This patch achieves this by

1. Expand ZERO_EXTEND_VECTOR_INREG into a vector shuffle with a zero vector
   instead of (multiple) unpacks.

2. Improve SystemZ::GeneralShuffle to perform a single unpack as the last
   operation if Bytes matches it.

Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D78486
2020-06-30 09:08:10 +02:00
Fangrui Song 4cd19a6e15 [BasicAA] Rename -disable-basicaa to -disable-basic-aa to be consistent with the canonical name "basic-aa" 2020-06-26 20:55:44 -07:00
Jonas Paulsson d3f7448e3c [SystemZ] Bugfix in storeLoadCanUseBlockBinary().
Check that the MemoryVT of LoadA matches that of LoadB.

This fixes https://bugs.llvm.org/show_bug.cgi?id=46239.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D81671
2020-06-17 09:49:31 +02:00
Sam Parker 09d30cb977 [CostModel] Unify Shuffle and InsertElement Costs
Extract the existing code from getInstructionThroughput into
TTImpl::getUserCost. The duplicated code in the AMDGPU backend has
also been removed.

Differential Revision: https://reviews.llvm.org/D81448
2020-06-10 09:13:34 +01:00
Jonas Paulsson 515bfc66ea [SystemZ] Implement -fstack-clash-protection
Probing of allocated stack space is now done when this option is passed. The
purpose is to protect against the stack clash attack (see
https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D78717
2020-06-06 18:38:36 +02:00
Hans Wennborg fcc199d696 Make regcoal_remat_empty_subrange.ll test require asserts build.
The -stress-sched flag is only available when asserts are enabled.
2020-06-04 19:46:22 +02:00
Quentin Colombet ccb3c8e861 [RegisterCoalescer] Update empty subranges when rematerializing
When we rematerialize a value as part of the coalescing, we may
widen the register class of the destination register.
When this happens, updateRegDefUses may create additional subranges
to account for the wider register class.
The created subranges are empty and if they are not defined by
the rematerialized instruction we clean them up.
However, if they are defined by the rematerialized instruction but
unused, we failed to flag them as dead definition and would leave
them as empty live-range.
This is wrong because empty live-ranges don't interfere with anything,
thus if we don't fix them, we would fail to account that the
rematerialized instruction clobbers some lanes.

E.g., let us consider the following pseudo code:
def.lane_low64:reg128 = ldimm
newdef:reg32 = COPY def.lane_low64_low32

When rematerialization happens for newdef, we end up with:
newdef.lane_low64:reg128 = ldimm
 = use newdef.lane_low64_low32

Let's look at the live interval of newdef.
Before rematerialization, we would get:
newdef [defIdx, useIdx:0) 0@defIdx

Right after updateRegDefUses, newdef register class is widen to reg128
and the subrange definitions will be augmented to fill the subreg that
is used at the definition point, here lane_low64.
The resulting live interval would be:
newdef [newDefIdx, useIdx:0) 0@newDefIdx
 * lane_low64_high32 EMPTY
 * lane_low64_low32 [newDefIdx, useIdx:0)

Before this patch this would be the final status of the live interval.
Therefore we miss that lane_low64_high32 is actually live on the
definition point of newdef.

With this patch, after rematerializing, we check all the added subranges
and for the ones that are defined but empty, we flag them as dead def.
Thus, in that case, newdef would look like this:
newdef [newDefIdx, useIdx:0) 0@newDefIdx
 * lane_low64_high32 [newDefIdx, newDefIdxDead) ; <-- instead of EMPTY
 * lane_low64_low32 [newDefIdx, useIdx:0)

This fixes https://www.llvm.org/PR46154
2020-06-03 17:10:55 -07:00
Kevin P. Neal c21a4f84b0 Fix errors in use of strictfp attribute.
Errors spotted with use of: https://reviews.llvm.org/D68233
2020-05-29 12:28:14 -04:00
Jonas Paulsson b3bd0c37ec [SystemZ] Eliminate the need to create a zero vector by reusing the VPERM mask.
Try to avoid creating VGBMs by reusing the permutation mask if it contains a
zero. If the first byte was into (any byte of) a zero vector, then the first
byte of the mask can become zero and reused by putting the mask also as the
first operand. If there instead was a first-byte use of the other source
operand, then that zero index can be reused if the mask is placed as the
second operand.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D79925
2020-05-19 09:37:19 +02:00
Jonas Paulsson 31ecef7627 [SystemZ] Don't create PERMUTE nodes with an undef operand.
It's better to reuse the first source value than to use an undef second
operand, because that will make more resulting VPERMs have identical operands
and therefore MachineCSE more successful.

Review: Ulrich Weigand
2020-05-18 19:42:14 +02:00
Jonas Paulsson 57feff93a8 [SystemZ] Improve foldMemoryOperandImpl: vec->FP conversions
Use FP-mem instructions when folding reloads into single lane (W..) vector
instructions.

Only do this when all other operands of the instruction have already been
allocated to an FP (F0-F15) register.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D76705
2020-05-12 09:21:24 +02:00
Ulrich Weigand 947f78ac27 [SystemZ] Fix/optimize vec_load_len and related intrinsics
When using vec_load/store_len_r with an immediate length operand
of 16 or larger, LLVM will currently emit an VLRL/VSTRL instruction
with that immediate.  This creates a valid encoding (which should be
supported by the assembler), but always traps at runtime.  This patch
fixes this by not creating VLRL/VSTRL in those cases.

This would result in loading the length into a register and
calling VLRLR/VSTRLR instead.  However, these operations with
a length of 15 or larger are in fact simply equivalent to a
full vector load or store.  And in fact the same holds true for
vec_load/store_len as well.

Therefore, add a DAGCombine rule to replace those operations with
plain vector loads or stores if the length is known at compile
time and equal or larger to 15.
2020-05-06 21:15:58 +02:00
Jonas Paulsson c84461ba8d [SystemZ] Fix test case.
Remove bad kill flags fom load-and-test.mir as discovered by
https://reviews.llvm.org/D78586: "[MachineVerifier] Add more checks for
registers in live-in lists".

Review: Ulrich Weigand
2020-04-28 09:43:03 +02:00
Jonas Paulsson 036242b868 [SystemZ] Bugfix in adjustSubwordCmp()
adjustSubwordCmp() should not optimize a load of an i1 value. This is
achieved by checking that the size and store-size of the MemoryVT are the
same.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45511.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D78187
2020-04-15 12:58:39 +02:00
Jonas Paulsson 36d4421f50 [LoopDataPrefetch + SystemZ] Let target decide on prefetching for each loop.
This patch adds

- New arguments to getMinPrefetchStride() to let the target decide on a
  per-loop basis if software prefetching should be done even with a stride
  within the limit of the hw prefetcher.

- New TTI hook enableWritePrefetching() to let a target do write prefetching
  by default (defaults to false).

- In LoopDataPrefetch:

  - A search through the whole loop to gather information before emitting any
    prefetches. This way the target can get information via new arguments to
    getMinPrefetchStride() and emit prefetches more selectively. Collected
    information includes: Does the loop have a call, how many memory
    accesses, how many of them are strided, how many prefetches will cover
    them. This is NFC to before as long as the target does not change its
    definition of getMinPrefetchStride().

  - If a previous access to the same exact address was 'read', and the
    current one is 'write', make it a 'write' prefetch.

  - If two accesses that are covered by the same prefetch do not dominate
    each other, put the prefetch in a block that dominates both of them.

  - If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.

- A SystemZ implementation of getMinPrefetchStride().

Review: Ulrich Weigand, Michael Kruse

Differential Revision: https://reviews.llvm.org/D70228
2020-04-02 14:57:46 +02:00
Jonas Paulsson f481d48893 [SystemZ] Improve foldMemoryOperandImpl().
Fold MS(G)RKC -> MS(G)C.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D76771
2020-03-31 17:17:51 +02:00
Jonas Paulsson f09b891d4a [SystemZ] Improve foldMemoryOperandImpl()
A spilled load of an immediate can use MVHI/MVGHI instead.
A compare of a spilled register against an immediate can use CHSI/CGHSI.
A logical compare can use CLFHSI/CLGHSI.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D76055
2020-03-25 16:21:08 +01:00
Jinsong Ji 816ad48c82 [NFC][RUIP] Small debug output refine
Add a new line, so that we always print MI in a new line,
before and after UpdateRegMask, for easier check..
2020-03-24 03:29:45 +00:00
Jonas Paulsson 9adc7fc3cd [SystemZ] Perform instruction shortening for fused fp ops.
Replace single-lane (W... form) vector "multiply and add" and "multiply and
subtract" instructions with equivalent floating point instructions whenever
possible in SystemZShortenInst.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D76370
2020-03-23 14:12:13 +01:00
Simon Pilgrim 68224c1952 [TargetLowering] Only demand a rotation's modulo amount bits
ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.

Differential Revision: https://reviews.llvm.org/D76201
2020-03-17 21:23:46 +00:00
Jonas Paulsson 132f25bcca [SystemZ] Avoid scalarization of [SU]INT_TO_FP ISD-nodes.
The type legalizer will scalarize vector conversions from integer to floating
point if the source element size is less than that of the result.

This is avoided now by inserting a zero/sign-extension of the source vector
before type legalization.

Review: Ulrich Weigand

Differential revision: https://reviews.llvm.org/D75978
2020-03-16 13:07:42 +01:00
Simon Pilgrim 775bf62698 [SystemZ] Regenerate rotate/shift tests 2020-03-15 16:42:46 +00:00
Jonas Paulsson 62ff9960d3 [SystemZ] Improve foldMemoryOperandImpl().
Swap the compare operands if LHS is spilled while updating the CCMask:s of
the CC users. This is relatively straight forward since the live-in lists for
the CC register can be assumed to be correct during register allocation
(thanks to 659efa2).

Also fold a spilled operand of an LOCR/SELR into an LOC(G).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D67437
2020-03-10 15:54:47 +01:00
Jonas Paulsson ae4d39c9e4 [SystemZ] Copy Access registers and CC with the correct register class.
On SystemZ there are a set of "access registers" that can be copied in and
out of 32-bit GPRs with special instructions. These instructions can only
perform the copy using low 32-bit parts of the 64-bit GPRs. However, the
default register class for 32-bit integers is GRX32, which also contains the
high 32-bit part registers.

In order to never end up with a case of such a COPY into a high reg, this
patch adds a new simple pre-RA pass that selects such COPYs into target
instructions.

This pass also handles COPYs from CC (Condition Code register), and COPYs to
CC can now also be emitted from a high reg in copyPhysReg().

Fixes: https://bugs.llvm.org/show_bug.cgi?id=44254

Review: Ulrich Weigand.

Differential Revision: https://reviews.llvm.org/D75014
2020-03-03 16:41:09 +01:00
Jonas Paulsson 237625757a [SystemZ] Bugfix for backchain with packed-stack
The incoming back chain slot was implicitly allocated whenever a GPR was
saved in SystemZFrameLowering::getRegSpillOffset(), but in cases where no
GPRs were saved/restored this did not take effect.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D75367
2020-03-03 15:03:01 +01:00
Jonas Paulsson cdcce3cabf [SystemZ] Also accept ISD::USUBO in shouldFormOverflowOp().
Forming subtract with overflow is beneficial on SystemZ, just like additions.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D75290
2020-03-03 14:38:57 +01:00
Craig Topper a5fa778882 [LegalizeTypes] Scalarize non-byte sized loads in WidenRecRes_Load and SplitVecResLoad
Should fix PR42803 and PR44902

Differential Revision: https://reviews.llvm.org/D74590
2020-02-24 15:14:33 -08:00
Jonas Paulsson 82879c2913 [SystemZ] Support the kernel back chain.
In order to build the Linux kernel, the back chain must be supported with
packed-stack. The back chain is then stored topmost in the register save
area.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D74506
2020-02-23 13:42:36 -08:00
Jonas Paulsson 0eddeeab29 [ValueTracking] Improve isKnownNonNaN() to recognize zero splats.
isKnownNonNaN() could not recognize a zero splat because that is a
ConstantAggregateZero which is-a ConstantData but not a ConstantDataVector.

Patch makes a ConstantAggregateZero return true.

Review: Thomas Lively

Differential Revision: https://reviews.llvm.org/D74263
2020-02-19 09:35:36 -08:00
Simon Pilgrim 7a554270c0 [SystemZ] Regenerate risbg tests. NFCI.
Pre-commit for some upcoming SimplifyDemandedBits bitrotate handling.
2020-02-19 16:39:28 +00:00
James Clarke b3cd44f80b Use SETNE directly rather than SUB/SETNE 0 for stack guard check
Summary:
Backends should fold the subtraction into the comparison, but not all
seem to. Moreover, on targets where pointers are not integers, such as
CHERI, an integer subtraction is not appropriate. Instead we should just
compare the two pointers directly, as this should work everywhere and
potentially generate more efficient code.

Reviewers: bogner, lebedev.ri, efriedma, t.p.northover, uweigand, sunfish

Reviewed By: lebedev.ri

Subscribers: dschuff, sbc100, arichardson, jgravelle-google, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74454
2020-02-18 13:21:26 +00:00
Fangrui Song 6b14814e10 [AsmPrinter] Omit unique ID for .stack_sizes
Follow-up for D74006.
2020-02-14 21:25:06 -08:00
Yuanfang Chen 4ad7685258 Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""
This reverts commit 80a34ae311 with fixes.

Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on
MachineVerifierPass by default on X86 and the fact that
inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll
are not expected to generate functioning machine code, this would go
down to `report_fatal_error` in MachineVerifierPass. Here passing
`-verify-machineinstrs=0` to make the intent explicit.
2020-02-13 10:16:06 -08:00
Yuanfang Chen 17122ec10a Revert "Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""""
This reverts commit bb51d24330.
2020-02-13 10:08:05 -08:00
Yuanfang Chen bb51d24330 Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""
This reverts commit 80a34ae311 with fixes.

On bots llvm-clang-x86_64-expensive-checks-ubuntu and
llvm-clang-x86_64-expensive-checks-debian only,
llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little
bit in the hope that LIT is the culprit since this change is not in the
codepath these tests are testing.
llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll
llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll
2020-02-13 10:02:53 -08:00
Yuanfang Chen 80a34ae311 Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""
This reverts commit rGcd5b308b828e, rGcd5b308b828e, rG8cedf0e2994c.

There are issues to be investigated for polly bots and bots turning on
EXPENSIVE_CHECKS.
2020-02-11 20:41:53 -08:00
Yuanfang Chen 8cedf0e299 Reland "[Support] make report_fatal_error `abort` instead of `exit`"
Summary:
Reland D67847 after D73742 is committed. Replace `sys::Process::Exit(1)`
with `abort` in `report_fatal_error`.

After this patch, for tools turning on `CrashRecoveryContext`,
crash handler installed by `CrashRecoveryContext` is called unless
they installed a non-returning handler using `llvm::install_fatal_error_handler`
like `cc1_main` currently does.

Reviewers: rnk, MaskRay, aganea, hans, espindola, jhenderson

Subscribers: jholewinski, qcolombet, dschuff, jyknight, emaste, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, rupprecht, jocewei, jsji, Jim, dmgreen, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74456
2020-02-11 18:20:40 -08:00
Jonas Paulsson 509bac030a [SystemZ] Fix new test case for expensive checks.
It needs 'tracksRegLiveness: true' to pass the machine verifier.
2020-02-11 11:33:41 -05:00
Jonas Paulsson 0311e28e9c [SystemZ] Bugfix in emitSelect()
When more than one SelectPseudo instruction is handled a new MBB is
returned. This must not be done if that would result in leaving an undhandled
isel pseudo behind in the original MBB.

Fixes https://bugs.llvm.org/show_bug.cgi?id=44849.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D74352
2020-02-11 10:41:01 -05:00
Jonas Paulsson fcdb99e0b5 [SystemZ] Add a subtarget cache like some other targets already have.
Each function is with this compiled with the SystemZSubtarget initialized
from the functions attributes.

Review: Ulrich Weigand.

Differential Revision: https://reviews.llvm.org/D74086
2020-02-10 13:10:58 -05:00
Kai Nacke 34946dfd79 [SystemZ] Add implementation for the intrinsic llvm.read_register
This change implements the llvm intrinsic llvm.read_register for
the SystemZ platform which returns the value of the specified
register
(http://llvm.org/docs/LangRef.html#llvm-read-register-and-llvm-write-register-intrinsics).
This implementation returns the value of the stack register, and
can be extended to return the value of other registers. The
implementation for this intrinsic exists on various other platforms
including Power, x86, ARM, etc. but missing on SystemZ.

Reviewers: uweigand

Differential Revision: https://reviews.llvm.org/D73378
2020-02-10 08:19:10 -05:00
Jinsong Ji 01edae1271 [AsmPrinter] Print FP constant in hexadecimal form instead
Printing floating point number in decimal is inconvenient for humans.
Verbose asm output will print out floating point values in comments, it
helps.

But in lots of cases, users still need additional work to covert the
decimal back to hex or binary to check the bit patterns,
especially when there are small precision difference.

Hexadecimal form is one of the supported form in LLVM IR, and easier for
debugging.

This patch try to print all FP constant in hex form instead.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D73566
2020-02-07 16:00:55 +00:00
Jonas Paulsson 4a3760d2ba [SystemZ] Improve handling of inline asm constraints.
The "{=v0}" constraint did not result in the expected error message in the
abscence of the vector facility, because 'v0' matches as a string into the
AnyRegBitRegClass in common code.

This patch adds checks for vector support in case of "{v" and soft-float in
case of "{f" to remedy this.

Review: Ulrich Weigand.
2020-02-05 17:04:16 -05:00
Jonas Paulsson e943329ba0 [SystemZ] Add 'REQUIRES:' or '-mtriple' to some newly added tests.
Needed to fix buildbots.
2020-02-04 10:52:10 -05:00
Jonas Paulsson 563e84790f [SystemZ] Support -msoft-float
This is needed when building the Linux kernel.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D72189
2020-02-04 10:32:45 -05:00
Krzysztof Parzyszek 020041d99b Update spelling of {analyze,insert,remove}Branch in strings and comments
These names have been changed from CamelCase to camelCase, but there were
many places (comments mostly) that still used the old names.

This change is NFC.
2020-01-21 10:15:38 -06:00
Yuanfang Chen 6e24c6037f Revert "[Support] make report_fatal_error `abort` instead of `exit`"
This reverts commit 647c3f4e47.

Got bots failure from sanitizer-windows and maybe others.
2020-01-15 17:52:25 -08:00
Yuanfang Chen 647c3f4e47 [Support] make report_fatal_error `abort` instead of `exit`
Summary:
This patch could be treated as a rebase of D33960. It also fixes PR35547.
A fix for `llvm/test/Other/close-stderr.ll` is proposed in D68164. Seems
the consensus is that the test is passing by chance and I'm not
sure how important it is for us. So it is removed like in D33960 for now.
The rest of the test fixes are just adding `--crash` flag to `not` tool.

** The reason it fixes PR35547 is

`exit` does cleanup including calling class destructor whereas `abort`
does not do any cleanup. In multithreading environment such as ThinLTO or JIT,
threads may share states which mostly are ManagedStatic<>. If faulting thread
tearing down a class when another thread is using it, there are chances of
memory corruption. This is bad 1. It will stop error reporting like pretty
stack printer; 2. The memory corruption is distracting and nondeterministic in
terms of error message, and corruption type (depending one the timing, it
could be double free, heap free after use, etc.).

Reviewers: rnk, chandlerc, zturner, sepavloff, MaskRay, espindola

Reviewed By: rnk, MaskRay

Subscribers: wuzish, jholewinski, qcolombet, dschuff, jyknight, emaste, sdardis, nemanjai, jvesely, nhaehnle, sbc100, arichardson, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, lenary, s.egerton, pzheng, cfe-commits, MaskRay, filcab, davide, MatzeB, mehdi_amini, hiraditya, steven_wu, dexonsmith, rupprecht, seiya, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D67847
2020-01-15 17:05:13 -08:00
Ulrich Weigand 81ee484484 [FPEnv] Fix chain handling regression after 04a8696
Code in getRoot made the assumption that every node in PendingLoads
must always itself have a dependency on the current DAG root node.

After the changes in 04a8696, it turns out that this assumption no
longer holds true, causing wrong codegen in some cases (e.g. stores
after constrained FP intrinsics might get deleted).

To fix this, we now need to make sure that the TokenFactor created
by getRoot always includes the previous root, if there is no implicit
dependency already present.

The original getControlRoot code already has exactly this check,
so this patch simply reuses that code now for getRoot as well.
This fixes the regression.

NFC if no constrained FP intrinsic is present.
2020-01-14 14:10:57 +01:00
Ulrich Weigand 04a86966fb [FPEnv] Fix chain handling for fpexcept.strict nodes
We need to ensure that fpexcept.strict nodes are not optimized away even if
the result is unused. To do that, we need to chain them into the block's
terminator nodes, like already done for PendingExcepts.

This patch adds two new lists of pending chains, PendingConstrainedFP and
PendingConstrainedFPStrict to hold constrained FP intrinsic nodes without
and with fpexcept.strict markers. This allows not only to solve the above
problem, but also to relax chains a bit further by no longer flushing all
FP nodes before a store or other memory access. (They are still flushed
before nodes with other side effects.)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D72341
2020-01-13 14:38:49 +01:00
Ulrich Weigand b51fa8670f [SystemZ] Fix matching another pattern for nxgrk (PR44496)
SystemZDAGToDAGISel::Select will attempt to split logical instruction
with a large immediate constant.  This must not happen if the result
matches one of the z15 combined operations, so the code checks for
those.  However, one of them was missed, causing invalid code to
be generated in the test case for PR44496.
2020-01-09 19:06:22 +01:00
Ulrich Weigand 14cd4a5b32 [SystemZ] Extend fp-strict-alias test case
Explicitly add test for fpexcept.maytrap intrinsics.
2020-01-07 12:44:51 +01:00
Ulrich Weigand 4814b68b7a [SystemZ] Fix python failure in test case
With recent Python the Large/spill-02.py test failed with an error:
TypeError: can't multiply sequence by non-int of type 'float'
2020-01-07 10:26:37 +01:00
Jonas Paulsson 982695c069 [SystemZ] Create brcl 0,0 instead of brcl 0,3 in EmitNop for 6 bytes.
For consistency with GCC, the target label is moved to the brcl itself
instead of the next instruction.

Review: Ulrich Weigand
2020-01-02 13:21:04 -08:00
Fangrui Song a36ddf0aa9 Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351 2019-12-24 16:27:51 -08:00
Fangrui Song 502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Ulrich Weigand 0d3f782e41 [FPEnv][X86] More strict int <-> FP conversion fixes
Fix several several additional problems with the int <-> FP conversion
logic both in common code and in the X86 target. In particular:

- The STRICT_FP_TO_UINT expansion emits a floating-point compare. This
  compare can raise exceptions and therefore needs to be a strict compare.
  I've made it signaling (even though quiet would also be correct) as
  signaling is the more usual default for an LT. This code exists both
  in common code and in the X86 target.

- The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode:
  it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one
  of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP
  that ends up not chosen. I've fixed the algorithm to use only a single
  STRICT_SINT_TO_FP instead.

- The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do
  the wrong thing because it calls getOperationAction using the result VT.
  But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to
  be called using the operand VT.

- Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate
  STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.

Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D71840
2019-12-23 21:11:45 +01:00
Jonas Paulsson 9fcebad5e5 [SystemZ] Add a mapping from "select register" to "load on condition" (2-addr).
The SELR(Mux) instructions can be converted to two-address form as LOCR(Mux)
instructions whenever one of the sources are the same reg as dest. By adding
this mapping in getTwoOperandOpcode(), we get:

- Two-address hints in getRegAllocationHints() for select register
  instructions.

- No need anymore for special handling in SystemZShortenInst.cpp -
  shortenSelect() removed.

The two-address hints are now added before the GRX32 hints, which should be
preferred.

Review: Ulrich Weigand
https://reviews.llvm.org/D68870
2019-12-20 10:44:58 -08:00
Jonas Paulsson 3174683e21 [SystemZ] Bugfix and improve the handling of CC values.
It was recently discovered that the handling of CC values was actually broken
since overflow was not properly handled ('nsw' flag not checked for).

Add and sub instructions now have a new target specific instruction flag
named SystemZII::CCIfNoSignedWrap. It means that the CC result can be used
instead of a compare with 0, but only if the instruction has the 'nsw' flag
set.

This patch also adds the improvements of conversion to logical instructions
and the analyzing of add with immediates, to be able to eliminate more
compares.

Review: Ulrich Weigand
https://reviews.llvm.org/D66868
2019-12-20 10:20:23 -08:00
Ulrich Weigand ede8293d7d [SystemZ][FPEnv] Enable strict vector FP extends/truncations
The back-end currently has special DAGCombine code to detect
cases where two floating-point extend or truncate operations
can be combined into a single vector operation.

This patch extends that support to also handle strict FP operations.

Note that currently only the case where both operations have the
same input chain are supported.  This already suffices to cover
the common case where the operations result from scalarizing a
non-legal vector type.  More general cases can be supported in
the future.
2019-12-20 15:36:56 +01:00
Jonas Paulsson 6be1578895 [SystemZ] Recognize mrecord-mcount in backend
Emit the __mcount_loc section for all fentry calls.

Review: Ulrich Weigand
https://reviews.llvm.org/D71629
2019-12-19 09:06:18 -08:00
Ulrich Weigand 1946461344 [FPEnv] Strict versions of llvm.minimum/llvm.maximum
Add new intrinsics
   llvm.experimental.constrained.minimum
   llvm.experimental.constrained.maximum
as strict versions of llvm.minimum and llvm.maximum.

Includes SystemZ back-end support.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D71624
2019-12-18 21:35:28 +01:00
Jonas Paulsson ca520592c0 [Clang FE, SystemZ] Don't add "true" value for the "mnop-mcount" attribute.
Let the "mnop-mcount" function attribute simply be present or non-present.
Update SystemZ backend as well to use hasFnAttribute() instead.

Review: Ulrich Weigand
https://reviews.llvm.org/D71669
2019-12-18 11:04:13 -08:00
Ulrich Weigand 1e89188d35 [FPEnv] Remove unnecessary rounding mode argument for constrained intrinsics
The following intrinsics currently carry a rounding mode metadata argument:

    llvm.experimental.constrained.minnum
    llvm.experimental.constrained.maxnum
    llvm.experimental.constrained.ceil
    llvm.experimental.constrained.floor
    llvm.experimental.constrained.round
    llvm.experimental.constrained.trunc

This is not useful since the semantics of those intrinsics do not in any way
depend on the rounding mode. In similar cases, other constrained intrinsics
do not have the rounding mode argument. Remove it here as well.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D71218
2019-12-17 21:10:36 +01:00
Ulrich Weigand d1c0f14be8 [SystemZ][FPEnv] Back-end support for STRICT_[SU]INT_TO_FP
As of b1d8576 there is middle-end support for STRICT_[SU]INT_TO_FP,
so this patch adds SystemZ back-end support as well.

The patch is SystemZ target specific except for adding SD patterns
strict_[su]int_to_fp and any_[su]int_to_fp to TargetSelectionDAG.td
as usual.
2019-12-17 18:24:05 +01:00
Jonas Paulsson 49f55dda01 [SystemZ] Improve verification of MachineOperands.
Now that the machine verifier will check for cases of register/immediate
MachineOperands and their correspondence to the MC instruction descriptor,
this patch adds the operand types to the descriptors where they were
previously missing. All MCOI::OPERAND_UNKNOWN operand types have been handled
to get a known type, except for G_... (global isel) instructions.

Review: Ulrich Weigand
https://reviews.llvm.org/D71494
2019-12-16 09:51:54 -08:00
Jonas Paulsson 61f5ba5c32 [SystemZ] Implement the packed stack layout
Any llvm function with the "packed-stack" attribute will be compiled to use
the packed stack layout which reuses unused parts of the incoming register
save area. This is needed for building the Linux kernel.

Review: Ulrich Weigand
https://reviews.llvm.org/D70821
2019-12-12 10:26:03 -08:00
Ulrich Weigand 5ad67df988 [SystemZ] Add llvm.minimum / llvm.maximum tests
The backend already supports the @llvm.minimum and @llvm.maximum
intrinsics, but we had no test cases for those.  Add tests.
2019-12-11 17:01:13 +01:00
Ulrich Weigand ac473394ff [SystemZ] Fix 128-bit strict FMA expansion pre-z14
Before z14, we did not have any FMA instruction for 128-bit
floating-point, so the @llvm.fma.f128 intrinsic needs to be
expanded to a libcall on those platforms.

This worked correctly for regular FMA, but was implemented
incorrectly for the strict version.  This was not noticed
because we did not have test coverage for this case.

This patch fixes that incorrect expansion and adds the
missing test cases.
2019-12-11 16:32:08 +01:00
Ulrich Weigand 9db13b5a7d [FPEnv] Constrained FCmp intrinsics
This adds support for constrained floating-point comparison intrinsics.

Specifically, we add:

      declare <ty2>
      @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
                                          metadata <condition code>,
                                          metadata <exception behavior>)
      declare <ty2>
      @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
                                           metadata <condition code>,
                                           metadata <exception behavior>)

The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).

The condition code is implemented as a metadata string.  The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).

These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations.  Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes.  The patch includes support
for the common legalization operations for those nodes.

The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instruction to
fully implement strict semantics (scalar and vector).

Differential Revision: https://reviews.llvm.org/D69281
2019-12-07 11:28:39 +01:00
Craig Topper 28b573d249 [TargetLowering] Fix another potential FPE in expandFP_TO_UINT
D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:

// Sel = Src < 0x8000000000000000
// Val = select Sel, Src, Src - 0x8000000000000000
// Ofs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val) ^ Ofs
The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)

Instead, I'd suggest to use the following sequence:

// Sel = Src < 0x8000000000000000
// FltOfs = select Sel, 0, 0x8000000000000000
// IntOfs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val - FltOfs) ^ IntOfs
In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway).

In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.

There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)

Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper

Differential Revision: https://reviews.llvm.org/D67105
2019-12-06 14:11:04 -08:00
Ulrich Weigand daee549b17 [FPEnv][SelectionDAG] Relax chain requirements
This patch implements the following changes:

1) SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats
each constrained intrinsic like a global barrier (e.g. a function call)
and fully serializes all pending chains. This is actually not required;
it is allowed for constrained intrinsics to be reordered w.r.t one
another or (nonvolatile) memory accesses. The MI-level scheduler already
allows for that flexibility, so it makes sense to allow it at the DAG
level as well.

This patch therefore changes the way chains for constrained intrisincs
are created, and handles them basically like load operations are handled.
This has the effect that constrained intrinsics are no longer serialized
against one another or (nonvolatile) loads. They are still serialized
against stores, but that seems hard to change with the current DAG chain
setup, and it also doesn't seem to be a big problem preventing DAG

2) The OPC_CheckFoldableChainNode check requires that each of the
intermediate nodes in a multi-node pattern match only has a single use.
This check tends to fail if those intermediate nodes are strict operations
as those have a chain output that typically indeed has another use.
However, we don't really need to consider chains here at all, since they
will all be rewritten anyway by UpdateChains later. Other parts of the
matcher therefore already ignore chains, but this hasOneUse check doesn't.

This patch replaces hasOneUse by a custom test that verifies there is no
more than one use of any non-chain output value.

In theory, this change could affect code unrelated to strict FP nodes,
but at least on SystemZ I could not find any single instance of that
happening

3) The SystemZ back-end currently does not allow matching multiply-and-
extend operations (32x32 -> 64bit or 64x64 -> 128bit FP multiply) for
strict FP operations.  This was not possible in the past due to the
problems described under 1) and 2) above.

With those issues fixed, it is now possible to fully support those
instructions in strict mode as well, and this patch does so.

Differential Revision: https://reviews.llvm.org/D70913
2019-12-06 11:02:11 +01:00
Quentin Colombet 2ec71ea7c7 [RegisterCoalescer] Fix the creation of subranges when rematerialization is used
* Context *

During register coalescing, we use rematerialization when coalescing is not
possible. That means we may rematerialize a super register when only a smaller
register is actually used.
E.g.,
0B v1 = ldimm 0xFF
1B v2 = COPY v1.low8bits
2B   = v2
=>
0B v1 = ldimm 0xFF
1B v2 = ldimm 0xFF
2B   = v2.low8bits

Where xB are the slot indexes.
Here v2 grew from a 8-bit register to a 16-bit register.

When that happens and subregister liveness is enabled, we create subranges for
the newly created value.
E.g., before remat, the live range of v2 looked like:
main range: [1r, 2r)
(Reads v2 is defined at index 1 slot register and used before the slot register
of index 2)

After remat, it should look like:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 1d) <-- dead def

I.e., the unsused lanes of v2 should be marked as dead definition.

* The Problem *

Prior to this patch, the live-ranges from the previous exampel, would have the
full live-range for all subranges:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 2r) <-- too long

* The Fix *

Technically, the code that this patch changes is not wrong:
When we create the subranges for the newly rematerialized value, we create only
one subrange for the whole bit mask.
In other words, at this point v2 live-range looks like this:
main range: [1r, 2r)
low & high: [1r, 2r)

Then, it gets wrong when we call LiveInterval::refineSubRanges on low 8 bits:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 2r) <-- too long

Ideally, we would like LiveInterval::refineSubRanges to be able to do the right
thing and mark the dead lanes as such. However, this is not possible, because by
the time we update / refine the live ranges, the IR hasn't been updated yet,
therefore we actually don't have enough information to do the right thing.

Another option to fix the problem would have been to call
LiveIntervals::shrinkToUses after the IR is updated. This is not desirable as
this may have a noticeable impact on compile time.

Instead, what this patch does is when we create the subranges for the
rematerialized value, we explicitly create one subrange for the lanes that were
used before rematerialization and one for the lanes that were not used. The used
one inherits the live range of the main range and the unused one is just created
empty. The existing rematerialization code then detects that the unused one are
not live and it correctly sets dead def intervals for them.

https://llvm.org/PR41372
2019-12-05 16:32:30 -08:00
Ulrich Weigand c3d05c1b52 [SelectionDAG] Expand nnan FMINNUM/FMAXNUM to select sequence
InstCombine may synthesize FMINNUM/FMAXNUM nodes from fcmp+select
sequences (where the fcmp is marked nnan).  Currently, if the
target does not otherwise handle these nodes, they'll get expanded
to libcalls to fmin/fmax.  However, these functions may reside in
libm, which may introduce a library dependency that was not originally
present in the source code, potentially resulting in link failures.

To fix this problem, add code to TargetLowering::expandFMINNUM_FMAXNUM
to expand FMINNUM/FMAXNUM to a compare+select sequence instead of the
libcall. This is done only if the node is marked as "nnan"; in this case,
the expansion to compare+select is always correct. This also suffices to
catch all cases where FMINNUM/FMAXNUM was synthesized as above.

Differential Revision: https://reviews.llvm.org/D70965
2019-12-04 10:32:35 +01:00
Jonas Paulsson a7d3f6933d [SystemZ] Return the right offsets from getCalleeSavedSpillSlots().
// Due to the SystemZ ABI, the DWARF CFA (Canonical Frame Address) is not
// equal to the incoming stack pointer, but to incoming stack pointer plus
// 160.  The getOffsetOfLocalArea() returned value is interpreted as "the
// offset of the local area from the CFA".

The immediate offsets into the Register save area returned by
getCalleeSavedSpillSlots() should take this offset into account, which this
patch makes sure of.

Patch and review by Ulrich Weigand.
https://reviews.llvm.org/D70427
2019-11-25 19:03:05 +01:00
Ulrich Weigand 97743089bf [SystemZ] Avoid mixing strict and non-strict FP operations in tests
This is to prepare for having the IR verifier reject mixed functions.
Note that fp-strict-mul-02.ll and fp-strict-mul-04.ll still remain
to be fixed.
2019-11-20 19:51:30 +01:00
Ulrich Weigand ac37755c60 [SystemZ] Use fneg in test cases
Now that we have fneg, prefer using it over "fsub -0.0, ...".
This helps in particular with strict FP tests, as fneg does
not raise any exceptions.
2019-11-20 19:08:27 +01:00
Craig Topper 6e20d70a69 [LegalizeDAG] Convert strict fp nodes to libcalls without losing the chain.
Previously we mutated the node and then converted it to a libcall. But this loses the chain information.

This patch keeps the chain, but unfortunately breaks tail call optimization as the functions involved in deciding if a node is in tail call position can't handle the chain. But correct ordering seems more important to be right.

Somehow the SystemZ tests improved. I looked at one of them and it seemed that we're handling the split vector elements in a different order and that made the copies work better.

Differential Revision: https://reviews.llvm.org/D70334
2019-11-18 11:24:08 -08:00
David Zarzycki a9018fddf9 [X86] Add more add/sub carry tests
Preparation for: https://reviews.llvm.org/D70079

https://reviews.llvm.org/D70077
2019-11-12 11:36:59 +02:00
Ulrich Weigand 22f9429149 [SystemZ] Add GHC calling convention
This is a special calling convention to be used by the GHC compiler.

Author: Stefan Schulze Frielinghaus
Differential Revision: https://reviews.llvm.org/D69024
2019-11-04 13:45:51 +01:00
Jonas Paulsson 580310ff0c [SystemZ] Improve handling of huge PC relative immediate offsets.
Demand that an immediate offset to a PC relative address fits in 32 bits, or
else load it into a register and perform a separate add.

Verify in the assembler that such immediate offsets fit the bitwidth.

Even though the final address of a Load Address Relative Long may fit in 32
bits even with a >32 bit offset (depending on where the symbol lives relative
to PC), the GNU toolchain demands the offset by itself to be in range. This
patch adapts the same behavior for llvm.

Review: Ulrich Weigand
https://reviews.llvm.org/D69749
2019-11-04 10:38:18 +01:00
Kevin P. Neal 68b8052121 [FPEnv] Strict FP tests should use the requisite function attributes.
A set of function attributes is required in any function that uses constrained
floating point intrinsics. None of our tests use these attributes.

This patch fixes this.

These tests have been tested against the IR verifier changes in D68233.

Reviewed by:	andrew.w.kaylor, cameron.mcinally, uweigand
Approved by:	andrew.w.kaylor
Differential Revision:	https://reviews.llvm.org/D67925

llvm-svn: 373761
2019-10-04 17:03:46 +00:00
Jonas Paulsson e794c049b3 [SystemZ] Add SystemZPostRewrite in addPostRegAlloc() instead at -O0.
SystemZPostRewrite needs to be run before (it may emit COPYs) the Post-RA
pseudo pass also at -O0, so it should be added in addPostRegAlloc().

Review: Ulrich Weigand
llvm-svn: 373182
2019-09-30 07:29:54 +00:00
Kai Nacke d8e38b9b88 Change -march=systemz to triple and fix test
These two test cases use -march=systemz instead of a triple. In
particular, the used file format is then based on the default host
triple. This leads to different behaviour on different platforms.

The SystemZ implementation uses the integrated assembler for a
long time now. The mature-mc-support test can be fully enabled.

Differential Revision: https://reviews.llvm.org/D68129

llvm-svn: 373098
2019-09-27 16:19:15 +00:00
Jonas Paulsson 6e504d7706 [SystemZ] Recognize mnop-mcount in backend
With -pg -mfentry -mnop-mcount, a nop is emitted instead of the call to
fentry.

Review: Ulrich Weigand
https://reviews.llvm.org/D67765

llvm-svn: 372950
2019-09-26 08:38:07 +00:00
Jonas Paulsson c5d90e4b5c [SystemZ] Improve emitSelect()
Merge more Select pseudo instructions in emitSelect() by allowing other
instructions between them as long as they do not clobber CC.

Debug value instructions are now moved down to below the new PHIs instead of
erasing them.

Review: Ulrich Weigand
https://reviews.llvm.org/D67619

llvm-svn: 372873
2019-09-25 14:00:33 +00:00
Ulrich Weigand 819c1651f7 [SystemZ] Support z15 processor name
The recently announced IBM z15 processor implements the architecture
already supported as "arch13" in LLVM.  This patch adds support for
"z15" as an alternate architecture name for arch13.

The patch also uses z15 in a number of places where we used arch13
as long as the official name was not yet announced.

llvm-svn: 372435
2019-09-20 23:04:45 +00:00
Guillaume Chatelet 48904e9452 [Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing
Summary:
This catches malformed mir files which specify alignment as log2 instead of pow2.
See https://reviews.llvm.org/D65945 for reference,

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67433

llvm-svn: 371608
2019-09-11 11:16:48 +00:00
Guozhi Wei b329e0728b [BPI] Adjust the probability for floating point unordered comparison
Since NaN is very rare in normal programs, so the probability for floating point unordered comparison should be extremely small. Current probability is 3/8, it is too large, this patch changes it to a tiny number.

Differential Revision: https://reviews.llvm.org/D65303

llvm-svn: 371541
2019-09-10 17:25:11 +00:00
Jonas Paulsson 821858780e [SystemZ] Recognize INLINEASM_BR in backend
Handle the remaining cases also by handling asm goto in
SystemZInstrInfo::getBranchInfo().

Review: Ulrich Weigand
https://reviews.llvm.org/D67151

llvm-svn: 371048
2019-09-05 10:20:05 +00:00
Jonas Paulsson a0a811739d [SystemZ] Recognize INLINEASM_BR in backend.
SystemZInstrInfo::analyzeBranch() needs to check for INLINEASM_BR
instructions, or it will crash.

Review: Ulrich Weigand
llvm-svn: 370753
2019-09-03 13:31:22 +00:00
Jonas Paulsson f12415812c [SystemZ] Add support for fentry.
SystemZAsmPrinter now properly emits function calls to __fentry__.

Review: Ulrich Weigand
llvm-svn: 370743
2019-09-03 11:21:12 +00:00
Ulrich Weigand b21e245711 [SystemZ] Support constrained fpto[su]i intrinsics
Now that constrained fpto[su]i intrinsic are available,
add codegen support to the SystemZ backend.

In addition to pure back-end changes, I've also needed
to add the strict_fp_to_[su]int and any_fp_to_[su]int
pattern fragments in the obvious way.

llvm-svn: 370674
2019-09-02 16:49:29 +00:00
Simon Pilgrim a4f08dded7 [SystemZ] Regenerate <8 x i31> store test
To help show the diffs from an upcoming SimplifyDemandedBits patch.

llvm-svn: 367216
2019-07-29 09:49:23 +00:00
Simon Pilgrim 743d45ee25 [TargetLowering] Add SimplifyMultipleUseDemandedBits
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.

The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.

We do see a couple of regressions that need to be addressed:

    AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
	
    X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.

The code owners have confirmed its ok for these cases to fixed up in future patches.

Differential Revision: https://reviews.llvm.org/D63281

llvm-svn: 366799
2019-07-23 12:39:08 +00:00
Ulrich Weigand 450c62e33e [Strict FP] Allow more relaxed scheduling
Reimplement scheduling constraints for strict FP instructions in
ScheduleDAGInstrs::buildSchedGraph to allow for more relaxed
scheduling.  Specifially, allow one strict FP instruction to
be scheduled across another, as long as it is not moved across
any global barrier.

Differential Revision: https://reviews.llvm.org/D64412

Reviewed By: cameron.mcinally

llvm-svn: 366222
2019-07-16 15:55:45 +00:00
Nikita Popov 411fa4c0df [SystemZ] Fix addcarry of addcarry of const carry (PR42606)
This fixes https://bugs.llvm.org/show_bug.cgi?id=42606 by extending
D64213. Instead of only checking if the carry comes from a matching
operation, we now check the full chain of carries. Otherwise we might
custom lower the outermost addcarry, but then generically legalize
an inner addcarry.

Differential Revision: https://reviews.llvm.org/D64658

llvm-svn: 365949
2019-07-12 20:03:34 +00:00
Ulrich Weigand 0f0a8b7784 [SystemZ] Add support for new cpu architecture - arch13
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.

This includes:
- Basic support for the new processor and its features.
- Assembler/disassembler support for new instructions.
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of arch13 as host processor.

Note: No currently available Z system supports the arch13
architecture.  Once new systems become available, the
official system name will be added as supported -march name.

llvm-svn: 365932
2019-07-12 18:13:16 +00:00
Quentin Colombet 0ffe0db6fa [RegisterCoalescer] Fix an overzealous assert
Although removeCopyByCommutingDef deals with full copies, it is still
possible to copy undef lanes and thus, we wouldn't have any a value
number for these lanes.

This fixes PR40215.

llvm-svn: 365256
2019-07-06 00:34:54 +00:00
Nikita Popov a2a09cb606 [SystemZ] Fix addcarry of usubo (PR42512)
Only custom lower uaddo+addcarry or usubo+subcarry chains and leave
mixtures like usubo+addcarry or uaddo+subcarry to the generic
legalizer. Otherwise we run into issues because SystemZ uses
different CC values for carries and borrows.

Fixes https://bugs.llvm.org/show_bug.cgi?id=42512.

Differential Revision: https://reviews.llvm.org/D64213

llvm-svn: 365242
2019-07-05 20:35:11 +00:00
Ulrich Weigand 4c86dd9032 Allow matching extend-from-memory with strict FP nodes
This implements a small enhancement to https://reviews.llvm.org/D55506

Specifically, while we were able to match strict FP nodes for
floating-point extend operations with a register as source, this
did not work for operations with memory as source.

That is because from regular operations, this is represented as
a combined "extload" node (which is a variant of a load SD node);
but there is no equivalent using a strict FP operation.

However, it turns out that even in the absence of an extload
node, we can still just match the operations explicitly, e.g.
   (strict_fpextend (f32 (load node:$ptr))

This patch implements that method to match the LDEB/LXEB/LXDB
SystemZ instructions even when the extend uses a strict-FP node.

llvm-svn: 364450
2019-06-26 17:19:12 +00:00
Ulrich Weigand 3641b10f3d [SystemZ] Support vector load/store alignment hints
Vector load/store instructions support an optional alignment field
that the compiler can use to provide known alignment info to the
hardware.  If the field is used (and the information is correct),
the hardware may be able (on some models) to perform faster memory
accesses than otherwise.

This patch adds support for alignment hints in the assembler and
disassembler, and fills in known alignment during codegen.

llvm-svn: 363806
2019-06-19 14:20:00 +00:00
Matt Arsenault 9cac4e6d14 Rename ExpandISelPseudo->FinalizeISel, delay register reservation
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.

Patch by Matthias Braun

llvm-svn: 363757
2019-06-19 00:25:39 +00:00
Jonas Paulsson 5c64a8c4c6 [SystemZ] Fix AHIMuxK pseudo expansion.
Do not emit a copy if the source and destination registers are the same.

Review: Ulrich Weigand
llvm-svn: 363665
2019-06-18 12:10:02 +00:00
Luis Marques 2e46312ffd [DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting
Some GEPs were not being split, presumably because that split would just be 
undone by the DAGCombiner. Not performing those splits can prevent important 
optimizations, such as preventing the element indices / member offsets from 
being (partially) folded into load/store instruction immediates. This patch:

- Makes the splits also occur in the cases where the base address and the GEP 
  are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.

Differential Revision: https://reviews.llvm.org/D60294

llvm-svn: 363544
2019-06-17 10:54:12 +00:00
Fangrui Song ac14f7b10c [lit] Delete empty lines at the end of lit.local.cfg NFC
llvm-svn: 363538
2019-06-17 09:51:07 +00:00
Sander de Smalen 5d6ee76c16 Describe stack-id as an enum
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.

This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.

Reviewers: arsenm, thegameg, qcolombet

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D60137

llvm-svn: 363533
2019-06-17 09:13:29 +00:00
Guozhi Wei d2210af332 [MBP] Move a latch block with conditional exit and multi predecessors to top of loop
Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:

    * a latch block
    * it has two successors, one is loop header, another is exit
    * it has more than one predecessors

If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.

Differential Revision: https://reviews.llvm.org/D43256

llvm-svn: 363471
2019-06-14 23:08:59 +00:00
Francis Visoiu Mistrih a438432acc [FastISel] Skip creating unnecessary vregs for arguments
This behavior was added in r130928 for both FastISel and SD, and then
disabled in r131156 for FastISel.

This re-enables it for FastISel with the corresponding fix.

This is triggered only when FastISel can't lower the arguments and falls
back to SelectionDAG for it.

FastISel contains a map of "register fixups" where at the end of the
selection phase it replaces all uses of a register with another
register that FastISel sometimes pre-assigned. Code at the end of
SelectionDAGISel::runOnMachineFunction is doing the replacement at the
very end of the function, while other pieces that come in before that
look through the MachineFunction and assume everything is done. In this
case, the real issue is that the code emitting COPY instructions for the
liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg
assigned to the physreg is used, and if it's not, it will skip the COPY.
If a register wasn't replaced with its assigned fixup yet, the copy will
be skipped and we'll end up with uses of undefined registers.

This fix moves the replacement of registers before the emission of
copies for the live-ins.

The initial motivation for this fix is to enable tail calls for
swiftself functions, which were blocked because we couldn't prove that
the swiftself argument (which is callee-save) comes from a function
argument (live-in), because there was an extra copy (vreg to vreg).

A few tests are affected by this:

* llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21
(callee-save) but never reload it because it's attached to the return.
We now don't even spill it anymore.
* llvm/test/CodeGen/*/swiftself.ll: we tail-call now.
* llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this
test was not really testing the right thing, but it worked because the
same registers were re-used.
* llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes
* llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy
* llvm/test/CodeGen/Mips/*: get rid of spills and copies
* llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack
* llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack
* llvm/test/CodeGen/X86/swifterror.ll: same as AArch64
* llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed

Differential Revision: https://reviews.llvm.org/D62361

llvm-svn: 362963
2019-06-10 16:53:37 +00:00
Jonas Paulsson fdc4ea34e3 [SystemZ, RegAlloc] Favor 3-address instructions during instruction selection.
This patch aims to reduce spilling and register moves by using the 3-address
versions of instructions per default instead of the 2-address equivalent
ones. It seems that both spilling and register moves are improved noticeably
generally.

Regalloc hints are passed to increase conversions to 2-address instructions
which are done in SystemZShortenInst.cpp (after regalloc).

Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
source register since the reg/reg instruction is now 3-address. In order to
remedy this, new 3-address pseudo memory instructions are used to perform the
folding only when the dst and lhs virtual registers are known to be allocated
to the same physreg. In order to not let MachineCopyPropagation run and
change registers on these transformed instructions (making it 3-address), a
new target pass called SystemZPostRewrite.cpp is run just after
VirtRegRewriter, that immediately lowers the pseudo to a target instruction.

If it would have been possibe to insert a COPY instruction and change a
register operand (convert to 2-address) in foldMemoryOperandImpl() while
trusting that the caller (e.g. InlineSpiller) would update/repair the
involved LiveIntervals, the solution involving pseudo instructions would not
have been needed. This is perhaps a potential improvement (see Phabricator
post).

Common code changes:

* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a
target pass immediately before MachineCopyPropagation.

* VirtRegMap is passed as an argument to foldMemoryOperand().

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D60888

llvm-svn: 362868
2019-06-08 06:19:15 +00:00
Ulrich Weigand 6c5d5ce551 Allow target to handle STRICT floating-point nodes
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).

This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.

To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:

- A new MCID flag "mayRaiseFPException" that the target should set on any
  instruction that possibly can raise FP exception according to the
  architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
  instruction resulting from expansion of any constrained FP intrinsic.

Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).

Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.

The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.

This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.

Reviewed By: andrew.w.kaylor

Differential Revision: https://reviews.llvm.org/D55506

llvm-svn: 362663
2019-06-05 22:33:10 +00:00
QingShan Zhang 72667b4e48 [NFC] Update the test to check the endianness after the CodeGenPrepare instead of checking the assembly instructions.
llvm-svn: 362471
2019-06-04 08:45:07 +00:00
Simon Pilgrim 74467814f2 [SystemZ] Remove sitofp(undef) from reduced test case.
Pre-commit for D62807 - which adds DAG [us]itofp(undef) --> 0 constant fold

llvm-svn: 362396
2019-06-03 12:58:36 +00:00
Kevin P. Neal ac79007205 Revert revert of r362112 with minor SystemZ test file corrections.
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes

This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Cameron McInally, Kevin P. Neal
Approved by:	Cameron McInally
Differential Revision:	https://reviews.llvm.org/D62546

llvm-svn: 362241
2019-05-31 16:32:12 +00:00
Roman Lebedev 05ad5fd213 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 3
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 362143
2019-05-30 20:37:18 +00:00