Better use isZero() and isIntN() in SystemZTargetTransformInfo rather than
calling getZExtValue() since the immediate operand may be wider than 64 bits,
which is not allowed with getZExtValue().
Fixes https://bugs.llvm.org/show_bug.cgi?id=47600
Review: Simon Pilgrim
Changes TTI function getIntImmCostInst to take an additional Instruction parameter,
which enables us to be able to check it is part of a min(max())/max(min()) pattern that will match SSAT.
We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated.
Required minor changes in some non-ARM backends to allow for the optional parameter to be included.
Differential Revision: https://reviews.llvm.org/D87457
The versions that take 'unsigned' will be removed in the future.
I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D87592
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.
So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.
This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.
Differential Revision: https://reviews.llvm.org/D86744
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.
This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.
One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.
I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.
Differential Revision: https://reviews.llvm.org/D85165
Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types. Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization. Thus, the current system can get the
cost of certain casts incorrect as the correct cost can vary greatly
based on the context in which it's used.
For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.
To fix that, this path adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.
Original patch by Pierre van Houtryve
Differential Revision: https://reviews.llvm.org/D79162
When passing the -vector feature to LLVM (or equivalently the
-mno-vx command line argument to clang), the intent is that
generated code must not use any vector features (in particular,
no vector registers must be used).
However, there are some cases where we still could generate
such uses; these are all related to some of the additional
vector features (like +vector-enhancements-1). Since none
of those features are actually usable with -vector, just make
sure we disable them all if -vector is given.
Summary:
This patch separates the peeling specific parameters from the UnrollingPreferences,
and creates a new struct called PeelingPreferences. Functions which used the
UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct.
Author: sidbav (Sidharth Baveja)
Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov)
Reviewed By: Meinersbur (Michael Kruse)
Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80580
Revision e1de2773a5 provided support for
accepting integer registers in inline asm i.e.
__asm("lhi %r0, 5") -> lhi %r0, 5
__asm("lhi 0, 5") -> lhi 0,5
This patch aims to extend this support to instructions which compute
addresses as well. (i.e instructions of type BDMem and BD[X|R|V|L]Mem)
Author: anirudhp
Differential Revision: https://reviews.llvm.org/D83251
Summary:
This fixes ASan and MSan tests on SystemZ after
commit 6a822e20ce ("[ASan][MSan] Remove EmptyAsm and set the CallInst
to nomerge to avoid from merging.").
Based on commit 80e107ccd0 ("Add NoMerge MIFlag to avoid MIR branch
folding").
Reviewers: uweigand, jonpa
Reviewed By: uweigand
Subscribers: hiraditya, llvm-commits, Andreas-Krebbel
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82794
Instead of doing multiple unpacks when zero extending vectors (e.g. v2i16 ->
v2i64), benchmarks have shown that it is better to do a VPERM (vector
permute) since that is only one sequential instruction on the critical path.
This patch achieves this by
1. Expand ZERO_EXTEND_VECTOR_INREG into a vector shuffle with a zero vector
instead of (multiple) unpacks.
2. Improve SystemZ::GeneralShuffle to perform a single unpack as the last
operation if Bytes matches it.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D78486
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value. If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.
This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.
Differential Revision: https://reviews.llvm.org/D80368
Following on from this RFC[0] from a while back, this is the first patch towards
implementing variadic debug values.
This patch specifically adds a set of functions to MachineInstr for performing
operations specific to debug values, and replacing uses of the more general
functions where appropriate. The most prevalent of these is replacing
getOperand(0) with getDebugOperand(0) for debug-value-specific code, as the
operands corresponding to values will no longer be at index 0, but index 2 and
upwards: getDebugOperand(x) == getOperand(x+2). Similar replacements have been
added for the other operands, along with some helper functions to replace
oft-repeated code and operate on a variable number of value operands.
[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html<Paste>
Differential Revision: https://reviews.llvm.org/D81852
Add the remaining arithmetic opcodes into the generic implementation
of getUserCost and then call this from getInstructionThroughput. Most
of the backends have been modified to return the base implementation
for cost kinds other RecipThroughput. The outlier here is AMDGPU
which already uses getArithmeticInstrCost for all the cost kinds.
This change means that most of the opcodes can be removed from that
backends implementation of getUserCost.
Differential Revision: https://reviews.llvm.org/D80992
Add cases for icmp, fcmp and select into the switch statement of the
generic getUserCost implementation with getInstructionThroughput then
calling into it. The BasicTTI and backend implementations have be set
to return a default value (1) when a cost other than throughput is
being queried.
Differential Revision: https://reviews.llvm.org/D80550
Use getMemoryOpCost from the generic implementation of getUserCost
and have getInstructionThroughput return the result of that for loads
and stores.
This also means that the X86 implementation of getUserCost can be
removed with the functionality folded into its getMemoryOpCost.
Differential Revision: https://reviews.llvm.org/D80984
Add the remaining cast instruction opcodes to the base implementation
of getUserCost and directly return the result. This allows
getInstructionThroughput to return getUserCost for the casts. This
has required changes to PPC and SystemZ because they implement
getUserCost and/or getCastInstrCost with adjustments for vector
operations. Adjusts have also been made in the remaining backends
that implement the method so that they still produce a cost of zero
or one for cost kinds other than throughput.
Differential Revision: https://reviews.llvm.org/D79848
Replace with forward declaration and move dependency down to source files that actually need it.
Both TargetLowering.h and TargetMachine.h are 2 of the most expensive headers (top 10) in the ClangBuildAnalyzer report when building llc.
Negations are incorrectly added in numerous places and the code just happens to work.
Also fix a missed DW_CFA_def_cfa_offset negation in c693b9c321d5a40d012340619674cf790c9ac86c:
ARMAsmBackendDarwin::generateCompactUnwindEncoding
Combine the two API calls into one by introducing a structure to hold
the relevant data. This has the added benefit of moving the boiler
plate code for arguments and flags, into the constructors. This is
intended to be a non-functional change, but the complicated web of
logic involved here makes it very hard to guarantee.
Differential Revision: https://reviews.llvm.org/D79941
Try to avoid creating VGBMs by reusing the permutation mask if it contains a
zero. If the first byte was into (any byte of) a zero vector, then the first
byte of the mask can become zero and reused by putting the mask also as the
first operand. If there instead was a first-byte use of the other source
operand, then that zero index can be reused if the mask is placed as the
second operand.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D79925
It's better to reuse the first source value than to use an undef second
operand, because that will make more resulting VPERMs have identical operands
and therefore MachineCSE more successful.
Review: Ulrich Weigand
Use FP-mem instructions when folding reloads into single lane (W..) vector
instructions.
Only do this when all other operands of the instruction have already been
allocated to an FP (F0-F15) register.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D76705
This patch stores the alignment for ConstantPoolSDNode as an
Align and updates the getConstantPool interface to take a MaybeAlign.
Removing getAlignment() will be done as a follow up.
Differential Revision: https://reviews.llvm.org/D79436
When using vec_load/store_len_r with an immediate length operand
of 16 or larger, LLVM will currently emit an VLRL/VSTRL instruction
with that immediate. This creates a valid encoding (which should be
supported by the assembler), but always traps at runtime. This patch
fixes this by not creating VLRL/VSTRL in those cases.
This would result in loading the length into a register and
calling VLRLR/VSTRLR instead. However, these operations with
a length of 15 or larger are in fact simply equivalent to a
full vector load or store. And in fact the same holds true for
vec_load/store_len as well.
Therefore, add a DAGCombine rule to replace those operations with
plain vector loads or stores if the length is known at compile
time and equal or larger to 15.
getScalarizationOverhead is only ever called with vectors (and we already had a load of cast<VectorType> calls immediately inside the functions).
Followup to D78357
Reviewed By: @samparker
Differential Revision: https://reviews.llvm.org/D79341
Make the kind of cost explicit throughout the cost model which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.
RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html
Differential Revision: https://reviews.llvm.org/D79002
For compatibility with other assemblers on the platform, allow
using just plain integer register numbers in all places where a
register operand is expected.
Bug: llvm.org/PR45582
Remove redundant Group and Regs arguments from parseRegister
and eliminate one of its overloaded versions.
Remove redundant Regs argument from parseAddress.
NFC intended.
Summary:
Before this patch, `relaxInstruction` takes three arguments, the first
argument refers to the instruction before relaxation and the third
argument is the output instruction after relaxation. There are two quite
strange things:
1) The first argument's type is `const MCInst &`, the third
argument's type is `MCInst &`, but they may be aliased to the same
variable
2) The backends of ARM, AMDGPU, RISC-V, Hexagon assume that the third
argument is a fresh uninitialized `MCInst` even if `relaxInstruction`
may be called like `relaxInstruction(Relaxed, STI, Relaxed)` in a
loop.
In this patch, we drop the thrid argument, and let `relaxInstruction`
directly modify the given instruction. Also, this patch fixes the bug https://bugs.llvm.org/show_bug.cgi?id=45580, which is introduced by D77851, and
breaks the assumption of ARM, AMDGPU, RISC-V, Hexagon.
Reviewers: Razer6, MaskRay, jyknight, asb, luismarques, enderby, rtaylor, colinl, bcain
Reviewed By: Razer6, MaskRay, bcain
Subscribers: bcain, nickdesaulniers, nathanchance, wuzish, annita.zhang, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, tpr, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78364
The API for shuffles and reductions uses generic Type parameters,
instead of VectorType, and so assertions and casts are used a lot.
This patch makes those types explicit, which means that the clients
can't be lazy, but results in less ambiguity, and that can only be a
good thing.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562
Differential Revision: https://reviews.llvm.org/D78357
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: sdesmalen, efriedma, jonpa
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77265
This does not change much in code generation, but in rare cases MachineCSE
can figure out that an instruction is redundant after commuting it.
Review: Ulrich Weigand
This patch adds
- New arguments to getMinPrefetchStride() to let the target decide on a
per-loop basis if software prefetching should be done even with a stride
within the limit of the hw prefetcher.
- New TTI hook enableWritePrefetching() to let a target do write prefetching
by default (defaults to false).
- In LoopDataPrefetch:
- A search through the whole loop to gather information before emitting any
prefetches. This way the target can get information via new arguments to
getMinPrefetchStride() and emit prefetches more selectively. Collected
information includes: Does the loop have a call, how many memory
accesses, how many of them are strided, how many prefetches will cover
them. This is NFC to before as long as the target does not change its
definition of getMinPrefetchStride().
- If a previous access to the same exact address was 'read', and the
current one is 'write', make it a 'write' prefetch.
- If two accesses that are covered by the same prefetch do not dominate
each other, put the prefetch in a block that dominates both of them.
- If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.
- A SystemZ implementation of getMinPrefetchStride().
Review: Ulrich Weigand, Michael Kruse
Differential Revision: https://reviews.llvm.org/D70228
Registers used in any address (as well as in a few other contexts)
have special semantics when a "zero" register is used, which is
why the back-end defines extra register classes ADDR32, ADDR64 etc
to be used to prevent the register allocator from using %r0 there.
However, when writing assembler code "by hand", you sometimes need
to trigger that special semantics. However, currently the AsmParser
will reject %r0 in those places. In some cases it may be possible
to write that instruction differently - but in others it is currently
not possible at all.
This check in AsmParser simply seems overly strict, so this patch
just removes the check completely. This brings the behaviour of
AsmParser in line with the GNU assembler as well.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45092
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77059
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76925
Follow-up of D72172 and D72180
This patch passes `uint64_t Address` to print methods of PC-relative
operands so that subsequent target specific patches can change
`*InstPrinter::print{Operand,PCRelImm,...}` to customize the output.
Add MCInstPrinter::PrintBranchImmAsAddress which is set to true by
llvm-objdump.
```
// Current llvm-objdump -d output
aarch64: 20000: bl #0
ppc: 20000: bl .+4
x86: 20000: callq 0
// Ideal output
aarch64: 20000: bl 0x20000
ppc: 20000: bl 0x20004
x86: 20000: callq 0x20005
// GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled
aarch64: 20000: bl 20000
ppc: 20000: bl 0x20004
x86: 20000: callq 20005
```
In `lib/Target/X86/X86GenAsmWriter1.inc` (generated by `llvm-tblgen -gen-asm-writer`):
```
case 12:
// CALL64pcrel32, CALLpcrel16, CALLpcrel32, EH_SjLj_Setup, JCXZ, JECXZ, J...
- printPCRelImm(MI, 0, O);
+ printPCRelImm(MI, Address, 0, O);
return;
```
Some targets have 2 `printOperand` overloads, one without `Address` and
one with `Address`. They should annotate derived `Operand` properly with
`let OperandType = "OPERAND_PCREL"`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D76574
A spilled load of an immediate can use MVHI/MVGHI instead.
A compare of a spilled register against an immediate can use CHSI/CGHSI.
A logical compare can use CLFHSI/CLGHSI.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D76055
Replace single-lane (W... form) vector "multiply and add" and "multiply and
subtract" instructions with equivalent floating point instructions whenever
possible in SystemZShortenInst.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D76370
The type legalizer will scalarize vector conversions from integer to floating
point if the source element size is less than that of the result.
This is avoided now by inserting a zero/sign-extension of the source vector
before type legalization.
Review: Ulrich Weigand
Differential revision: https://reviews.llvm.org/D75978
For context, the proposed RISC-V bit manipulation extension has a subset
of instructions which require one of two SubtargetFeatures to be
enabled, 'zbb' or 'zbp', and there is no defined feature which both of
these can imply to use as a constraint either (see comments in D65649).
AssemblerPredicates allow multiple SubtargetFeatures to be declared in
the "AssemblerCondString" field, separated by commas, and this means
that the two features must both be enabled. There is no equivalent to
say that _either_ feature X or feature Y must be enabled, short of
creating a dummy SubtargetFeature for this purpose and having features X
and Y imply the new feature.
To solve the case where X or Y is needed without adding a new feature,
and to better match a typical TableGen style, this replaces the existing
"AssemblerCondString" with a dag "AssemblerCondDag" which represents the
same information. Two operators are defined for use with
AssemblerCondDag, "all_of", which matches the current behaviour, and
"any_of", which adds the new proposed ORing features functionality.
This was originally proposed in the RFC at
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139138.html
Changes to all current backends are mechanical to support the replaced
functionality, and are NFCI.
At this stage, it is illegal to combine features with ands and ors in a
single AssemblerCondDag. I suspect this case is sufficiently rare that
adding more complex changes to support it are unnecessary.
Differential Revision: https://reviews.llvm.org/D74338
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.
Differential Revision: https://reviews.llvm.org/D75525
Swap the compare operands if LHS is spilled while updating the CCMask:s of
the CC users. This is relatively straight forward since the live-in lists for
the CC register can be assumed to be correct during register allocation
(thanks to 659efa2).
Also fold a spilled operand of an LOCR/SELR into an LOC(G).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D67437
On SystemZ there are a set of "access registers" that can be copied in and
out of 32-bit GPRs with special instructions. These instructions can only
perform the copy using low 32-bit parts of the 64-bit GPRs. However, the
default register class for 32-bit integers is GRX32, which also contains the
high 32-bit part registers.
In order to never end up with a case of such a COPY into a high reg, this
patch adds a new simple pre-RA pass that selects such COPYs into target
instructions.
This pass also handles COPYs from CC (Condition Code register), and COPYs to
CC can now also be emitted from a high reg in copyPhysReg().
Fixes: https://bugs.llvm.org/show_bug.cgi?id=44254
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D75014
The incoming back chain slot was implicitly allocated whenever a GPR was
saved in SystemZFrameLowering::getRegSpillOffset(), but in cases where no
GPRs were saved/restored this did not take effect.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D75367
Forming subtract with overflow is beneficial on SystemZ, just like additions.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D75290
In order to build the Linux kernel, the back chain must be supported with
packed-stack. The back chain is then stored topmost in the register save
area.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D74506
A cost query for a vector instruction should return a cost even without
target vector support, and not trigger an assert.
VectorCombine does this with an input containing source code vectors.
Review: Ulrich Weigand
On some targets, like SPARC, forming overflow ops is only profitable if
the math result is used: https://godbolt.org/z/DxSmdB
This patch adds a new MathUsed parameter to allow the targets
to make the decision and defaults to only allowing it
if the math result is used. That is the conservative choice.
This patch also updates AArch64ISelLowering, X86ISelLowering,
ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow
ops if the math result is not used. On those targets using the
overflow intrinsic for the overflow check only generates better code.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D74722
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.
I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.
NFC.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74482
Summary:
Add a new method (tryParseRegister) that attempts to parse a register specification.
MASM allows the use of IFDEF <register>, as well as IFDEF <symbol>. To accommodate this, we make it possible to check whether a register specification can be parsed at the current location, without failing the entire parse if it can't.
Reviewers: thakis
Reviewed By: thakis
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73486
When more than one SelectPseudo instruction is handled a new MBB is
returned. This must not be done if that would result in leaving an undhandled
isel pseudo behind in the original MBB.
Fixes https://bugs.llvm.org/show_bug.cgi?id=44849.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D74352
Each function is with this compiled with the SystemZSubtarget initialized
from the functions attributes.
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D74086
This change implements the llvm intrinsic llvm.read_register for
the SystemZ platform which returns the value of the specified
register
(http://llvm.org/docs/LangRef.html#llvm-read-register-and-llvm-write-register-intrinsics).
This implementation returns the value of the stack register, and
can be extended to return the value of other registers. The
implementation for this intrinsic exists on various other platforms
including Power, x86, ARM, etc. but missing on SystemZ.
Reviewers: uweigand
Differential Revision: https://reviews.llvm.org/D73378
When specifying -march=arch[8|9|10], those CPU types do NOT support
the vector extension. In this case the vector ABI must be disabled.
The generated data layout should NOT contain 64-v128.
Reviewers: uweigand
Differential Revision: https://reviews.llvm.org/D74146
The "{=v0}" constraint did not result in the expected error message in the
abscence of the vector facility, because 'v0' matches as a string into the
AnyRegBitRegClass in common code.
This patch adds checks for vector support in case of "{v" and soft-float in
case of "{f" to remedy this.
Review: Ulrich Weigand.
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73885
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
In GlobalISel we may in some unfortunate circumstances generate PHIs with
operands that are on separate banks. If-conversion doesn't currently check for
that case and ends up generating a CSEL on AArch64 with incorrect register
operands.
Differential Revision: https://reviews.llvm.org/D72961
Except AMDGPU/R600RegisterInfo (a bunch of MIR tests seem to have
problems), every target overrides it with true. PostMachineScheduler
requires livein information. Not providing it can cause assertion
failures in ScheduleDAGInstrs::addSchedBarrierDeps().
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.
Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferencable which should also be fixed.
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.
A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.
This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so
One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.
Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
Reviewers: chandlerc, beanz, mgorny, rnk, hans
Reviewed By: rnk, hans
Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54439
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.
If we ever have the needs to add verbose log to disassemblers, we can
record log with a member function, instead of passing it around as an
argument.
In D71841 we inverted the sense of the SDNode-level flag to ensure all nodes
default to potentially raising FP exceptions unless otherwise specified --
i.e. if we forget to propagate the flag somewhere, the effect is now only
lost performance, not incorrect code.
However, the related flag at the MI level still defaults to nodes not raising
FP exceptions unless otherwise specified. To be fully on the (conservatively)
safe side, we should invert that flag as well.
This patch does so by replacing MIFlag::FPExcept with MIFlag::NoFPExcept.
(Note that this does also introduce an incompatible change in the MIR format.)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D72466
SystemZDAGToDAGISel::Select will attempt to split logical instruction
with a large immediate constant. This must not happen if the result
matches one of the z15 combined operations, so the code checks for
those. However, one of them was missed, causing invalid code to
be generated in the test case for PR44496.
printInst prints a branch/call instruction as `b offset` (there are many
variants on various targets) instead of `b address`.
It is a convention to use address instead of offset in most external
symbolizers/disassemblers. This difference makes `llvm-objdump -d`
output unsatisfactory.
Add `uint64_t Address` to printInst(), so that it can pass the argument to
printInstruction(). `raw_ostream &OS` is moved to the last to be
consistent with other print* methods.
The next step is to pass `Address` to printInstruction() (generated by
tablegen from the instruction set description). We can gradually migrate
targets to print addresses instead of offsets.
In any case, downstream projects which don't know `Address` can pass 0 as
the argument.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D72172
-mpacked-stack is currently not supported with -mbackchain, so this should
result in a compilation error message instead of being silently ignored.
Review: Ulrich Weigand
The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all
other such flags. This is a problem, because it implies that all code that
transforms SDNodes without copying flags can introduce a correctness bug,
not just a missed optimization.
This patch changes the default to false. This makes it necessary to move
setting the (No)FPExcept flag for constrained intrinsics from the
visitConstrainedIntrinsic routine to the generic visit routine at the
place where the other flags are set, or else the intersectFlagsWith
call would erase the NoFPExcept flag again.
In order to avoid making non-strict FP code worse, whenever
SelectionDAGISel::SelectCodeCommon matches on a set of orignal nodes
none of which can raise FP exceptions, it will preserve this property
on all results nodes generated, by setting the NoFPExcept flag on
those result nodes that would otherwise be considered as raising
an FP exception.
To check whether or not an SD node should be considered as raising
an FP exception, the following logic applies:
- For machine nodes, check the mayRaiseFPException property of
the underlying MI instruction
- For regular nodes, check isStrictFPOpcode
- For target nodes, check a newly introduced isTargetStrictFPOpcode
The latter is implemented by reserving a range of target opcodes,
similarly to how memory opcodes are identified. (Note that there a
bit of a quirk in identifying target nodes that are both memory nodes
and strict FP nodes. To simplify the logic, right now all target memory
nodes are automatically also considered strict FP nodes -- this could
be fixed by adding one more range.)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71841
The SELR(Mux) instructions can be converted to two-address form as LOCR(Mux)
instructions whenever one of the sources are the same reg as dest. By adding
this mapping in getTwoOperandOpcode(), we get:
- Two-address hints in getRegAllocationHints() for select register
instructions.
- No need anymore for special handling in SystemZShortenInst.cpp -
shortenSelect() removed.
The two-address hints are now added before the GRX32 hints, which should be
preferred.
Review: Ulrich Weigand
https://reviews.llvm.org/D68870
It was recently discovered that the handling of CC values was actually broken
since overflow was not properly handled ('nsw' flag not checked for).
Add and sub instructions now have a new target specific instruction flag
named SystemZII::CCIfNoSignedWrap. It means that the CC result can be used
instead of a compare with 0, but only if the instruction has the 'nsw' flag
set.
This patch also adds the improvements of conversion to logical instructions
and the analyzing of add with immediates, to be able to eliminate more
compares.
Review: Ulrich Weigand
https://reviews.llvm.org/D66868
The back-end currently has special DAGCombine code to detect
cases where two floating-point extend or truncate operations
can be combined into a single vector operation.
This patch extends that support to also handle strict FP operations.
Note that currently only the case where both operations have the
same input chain are supported. This already suffices to cover
the common case where the operations result from scalarizing a
non-legal vector type. More general cases can be supported in
the future.
Recommit after making the same API change in non-x86 targets. This has been build for all targets, and tested for effected ones. Why the difference? Because my disk filled up when I tried make check for all.
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
Add new intrinsics
llvm.experimental.constrained.minimum
llvm.experimental.constrained.maximum
as strict versions of llvm.minimum and llvm.maximum.
Includes SystemZ back-end support.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71624
Let the "mnop-mcount" function attribute simply be present or non-present.
Update SystemZ backend as well to use hasFnAttribute() instead.
Review: Ulrich Weigand
https://reviews.llvm.org/D71669
As of b1d8576 there is middle-end support for STRICT_[SU]INT_TO_FP,
so this patch adds SystemZ back-end support as well.
The patch is SystemZ target specific except for adding SD patterns
strict_[su]int_to_fp and any_[su]int_to_fp to TargetSelectionDAG.td
as usual.
Now that the machine verifier will check for cases of register/immediate
MachineOperands and their correspondence to the MC instruction descriptor,
this patch adds the operand types to the descriptors where they were
previously missing. All MCOI::OPERAND_UNKNOWN operand types have been handled
to get a known type, except for G_... (global isel) instructions.
Review: Ulrich Weigand
https://reviews.llvm.org/D71494
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).
In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.
Committing this change will significantly reduce our merge conflicts
for each upstream merge.
Reviewers: spatel, bogner
Reviewed By: bogner
Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70917
Any llvm function with the "packed-stack" attribute will be compiled to use
the packed stack layout which reuses unused parts of the incoming register
save area. This is needed for building the Linux kernel.
Review: Ulrich Weigand
https://reviews.llvm.org/D70821
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.
Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.
Split off from D71320
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D71381
Before z14, we did not have any FMA instruction for 128-bit
floating-point, so the @llvm.fma.f128 intrinsic needs to be
expanded to a libcall on those platforms.
This worked correctly for regular FMA, but was implemented
incorrectly for the strict version. This was not noticed
because we did not have test coverage for this case.
This patch fixes that incorrect expansion and adds the
missing test cases.
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%a = and i32 %s, %b
Can under Arm or Thumb2 become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
Differential Revision: https://reviews.llvm.org/D70966
My patch 9db13b5a7d seems to have
caused some build bots to fail due to warnings that appear only
when using -Wcovered-switch-default.
This patch is an attempt to fix this by trying to avoid both the warning
"default label in switch which covers all enumeration values"
for the inner switch statements and at the same time the warning
"this statement may fall through"
for the outer switch statement in getVectorComparison
(SystemZISelLowering.cpp).
This adds support for constrained floating-point comparison intrinsics.
Specifically, we add:
declare <ty2>
@llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
declare <ty2>
@llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).
The condition code is implemented as a metadata string. The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).
These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations. Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes. The patch includes support
for the common legalization operations for those nodes.
The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instruction to
fully implement strict semantics (scalar and vector).
Differential Revision: https://reviews.llvm.org/D69281
This patch implements the following changes:
1) SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats
each constrained intrinsic like a global barrier (e.g. a function call)
and fully serializes all pending chains. This is actually not required;
it is allowed for constrained intrinsics to be reordered w.r.t one
another or (nonvolatile) memory accesses. The MI-level scheduler already
allows for that flexibility, so it makes sense to allow it at the DAG
level as well.
This patch therefore changes the way chains for constrained intrisincs
are created, and handles them basically like load operations are handled.
This has the effect that constrained intrinsics are no longer serialized
against one another or (nonvolatile) loads. They are still serialized
against stores, but that seems hard to change with the current DAG chain
setup, and it also doesn't seem to be a big problem preventing DAG
2) The OPC_CheckFoldableChainNode check requires that each of the
intermediate nodes in a multi-node pattern match only has a single use.
This check tends to fail if those intermediate nodes are strict operations
as those have a chain output that typically indeed has another use.
However, we don't really need to consider chains here at all, since they
will all be rewritten anyway by UpdateChains later. Other parts of the
matcher therefore already ignore chains, but this hasOneUse check doesn't.
This patch replaces hasOneUse by a custom test that verifies there is no
more than one use of any non-chain output value.
In theory, this change could affect code unrelated to strict FP nodes,
but at least on SystemZ I could not find any single instance of that
happening
3) The SystemZ back-end currently does not allow matching multiply-and-
extend operations (32x32 -> 64bit or 64x64 -> 128bit FP multiply) for
strict FP operations. This was not possible in the past due to the
problems described under 1) and 2) above.
With those issues fixed, it is now possible to fully support those
instructions in strict mode as well, and this patch does so.
Differential Revision: https://reviews.llvm.org/D70913
As it can be seen from accompanying cleanup, it is not unheard of
to write `~Known.Zero` meaning "what maximal value can this KnownBits
produce". But i think `~Known.Zero` isn't *that* self-explanatory,
as compared to a method with a name.
Note that not all `~Known.Zero` places were cleaned up,
only those where this arguably improves things.
The improvement in the machine verifier for operand types (D63973) discovered
a bad operand in a test using a PPA instruction. It was an immediate 0 where
a register was expected.
This patch fixes this (NFC) by now making the PPA second register operand
NoRegister instead of a zero immediate in the MIR.
Review: Ulrich Weigand
https://reviews.llvm.org/D70501
// Due to the SystemZ ABI, the DWARF CFA (Canonical Frame Address) is not
// equal to the incoming stack pointer, but to incoming stack pointer plus
// 160. The getOffsetOfLocalArea() returned value is interpreted as "the
// offset of the local area from the CFA".
The immediate offsets into the Register save area returned by
getCalleeSavedSpillSlots() should take this offset into account, which this
patch makes sure of.
Patch and review by Ulrich Weigand.
https://reviews.llvm.org/D70427
float node
This patch add an option 'disable-strictnode-mutation' to prevent strict
node mutating to an normal node.
So we can make sure that the patch which sets strict-node as legal works
correctly.
Patch by Chen Liu(LiuChen3)
Differential Revision: https://reviews.llvm.org/D70226
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so. Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraires:
1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so. This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.
With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.
2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set. This is only true because libraries
defined lib lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.
I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:
- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON
Reviewers: beanz, smeenai, compnerd, phosek
Reviewed By: beanz
Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70179
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
This is a special calling convention to be used by the GHC compiler.
Author: Stefan Schulze Frielinghaus
Differential Revision: https://reviews.llvm.org/D69024
Demand that an immediate offset to a PC relative address fits in 32 bits, or
else load it into a register and perform a separate add.
Verify in the assembler that such immediate offsets fit the bitwidth.
Even though the final address of a Load Address Relative Long may fit in 32
bits even with a >32 bit offset (depending on where the symbol lives relative
to PC), the GNU toolchain demands the offset by itself to be in range. This
patch adapts the same behavior for llvm.
Review: Ulrich Weigand
https://reviews.llvm.org/D69749
MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64
regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out Mips ABI and pick appropriate prefix.
Tags: #llvm, #clang, #lldb
Differential Revision: https://reviews.llvm.org/D66795
The static analyzer is warning about a potential null dereference, but we should be able to use cast<> directly and if not assert will fire for us.
llvm-svn: 375430
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69216
llvm-svn: 375398
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68993
llvm-svn: 375084