Variable Shifts/Rotates using the CL register have different behaviours to the immediate instructions - split accordingly to help remove yet more repeated overrides from the schedule models.
llvm-svn: 342852
Confirmed with Craig Topper - fix a typo that was missing a Port4 uop for ROR*mCL instructions on some Intel models.
Yet another step on the scheduler model cleanup marathon......
llvm-svn: 342846
This is an alternative to https://reviews.llvm.org/D37896. We can't decompose
multiplies generically without a target hook to tell us when it's profitable.
ARM and AArch64 may be able to remove some existing code that overlaps with
this transform.
This extends D52195 and may resolve PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474
(still an open question about transforming legal vector multiplies, but we
could open another bug report for those)
llvm-svn: 342844
The SandyBridge model was missing schedule values for the RCL/RCR values - instead using the (incredibly optimistic) WriteShift (now WriteRotate) defaults.
I've added overrides with more realistic (slow) values, based on a mixture of Agner/instlatx64 numbers and what later Intel models do as well.
This is necessary to allow WriteRotate to be updated to remove other rotate overrides.
It'd probably be a good idea to investigate a WriteRotateCarry class at some point but its not high priority given the unusualness of these instructions.
llvm-svn: 342842
Despite being rotates, these more modern instructions avoid many of the quirks of the regular x86 rotate instructions and consistently have a schedule closer to shifts.
llvm-svn: 342839
NFCI for now, but it should make it easier to remove a lot of unnecessary overrides in a future commit.
Now that funnel shift intrinsics are coming online we need to get this cleaned up to make vectorization costs from scalar rotate patterns more straightforward.
llvm-svn: 342837
Our lowering that tries to avoid this sign extend can be defeated by the DAG combine folding it with a truncate.
The pattern needs to extend to an v8i32 then truncate back down to v8i16.
llvm-svn: 342830
Summary:
Specifying X[8-15,18] registers as callee-saved is used to support
CONFIG_ARM64_LSE_ATOMICS in Linux kernel. As part of this patch we:
- use custom CSR list/mask when user specifies custom CSRs
- update Machine Register Info's list of CSRs with additional custom CSRs in
LowerCall and LowerFormalArguments.
Reviewers: srhines, nickdesaulniers, efriedma, javed.absar
Reviewed By: nickdesaulniers
Subscribers: kristof.beyls, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D52216
llvm-svn: 342824
Summary: Similar to D51893 which was for memcpy
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52063
llvm-svn: 342796
We don't have a vXi8 shift left so we need to bitcast to a vXi16 vector to perform the shift. If we let lowering legalize the vXi8 shift we get an extra and that we don't need and fail to remove.
llvm-svn: 342795
Previously we used SUBREG_TO_REG+MOV32ri. But regular isel was changed recently to use the MOV32ri64 pseudo. Fast isel now does the same.
llvm-svn: 342788
Summary:
By using the existing isCodeGenOnly bit in the tablegen defs, as
suggested by tlively in https://reviews.llvm.org/D51662
Tested: llvm-lit -v `find test -name WebAssembly`
Reviewers: tlively
Subscribers: dschuff, sbc100, jgravelle-google, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52373
llvm-svn: 342772
This patch introduces a SchedWriteVariant to describe zero-idiom VXORP(S|D)Yrr
and VANDNP(S|D)Yrr.
This is a follow-up of r342555.
On Jaguar, a VXORPSYrr is 2 macro opcodes. Only one opcode is eliminated at
register-renaming stage. The other opcode has to be executed to set the upper
half of the destination YMM.
Same for VANDNP(S|D)Yrr.
Differential Revision: https://reviews.llvm.org/D52347
llvm-svn: 342728
Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.
As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate dominator of the now-lowered switch.
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll
Differential Revision: https://reviews.llvm.org/D52221
llvm-svn: 342722
Summary: This change is the first part of the AMDGPU target description
change. The aim of it is the effective splitting the vector and scalar
flows at the selection stage. Selection uses predicate functions based
on the framework implemented earlier - https://reviews.llvm.org/D35267
Differential revision: https://reviews.llvm.org/D52019
Reviewers: rampitec
llvm-svn: 342719
Currently, BPF has XADD (locked add) insn support and the
asm looks like:
lock *(u32 *)(r1 + 0) += r2
lock *(u64 *)(r1 + 0) += r2
The instruction itself does not have a return value.
At the source code level, users often use
__sync_fetch_and_add()
which eventually translates to XADD. The return value of
__sync_fetch_and_add() is supposed to be the old value
in the xadd memory location. Since BPF::XADD insn does not
support such a return value, this patch added a PreEmit
phase to check such a usage. If such an illegal usage
pattern is detected, a fatal error will be reported like
line 4: Invalid usage of the XADD return value
if compiled with -g, or
Invalid usage of the XADD return value
if compiled without -g.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 342692
Summary: Adds the necessary support to lib/ObjectYAML and fixes SIMD
calls to allow the tests to work. Also removes some dead code that
would otherwise have to have been updated.
Reviewers: aheejin, dschuff, sbc100
Subscribers: jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52105
llvm-svn: 342689
x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant.
Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code.
No functional change intended.
I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the
helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be
there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps.
Differential Revision: https://reviews.llvm.org/D52285
llvm-svn: 342669
This is a trivial refactoring that I'm committing now as it makes a patch I'm
about to post for review easier to follow. There is some overlap between
evaluateConstantImm and addExpr in RISCVAsmParser. This patch allows
evaluateConstantImm to be reused from addExpr to remove this overlap. The
benefit will be greater when a future patch adds extra code to allows
immediates to be evaluated from constant symbols (e.g. `.equ CONST, 0x1234`).
No functional change intended.
llvm-svn: 342641
Examples such as `jal a3`, `j a3` and `jal a3, a3` are accepted by gas
but rejected by LLVM MC. This patch rectifies this. I introduce
RISCVAsmParser::parseJALOffset to ensure that symbol names that coincide with
register names can safely be parsed. This is made a somewhat fiddly due to the
single-operand alias form (see the comment in parseJALOffset for more info).
Differential Revision: https://reviews.llvm.org/D52029
llvm-svn: 342629
Building a vector out of multiple loads can be converted to a load of the vector type if the loads are consecutive.
But the special condition is that the element number is 1, such as <1 x i128>. So just early exit to fix the assert.
Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52072
llvm-svn: 342611
Summary:
This change leaves holes in the opcode space where missing
instructions could logically be added later if they were found to be
useful.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52282
llvm-svn: 342610
Enable enableMultipleCopyHints() on X86.
Original Patch by @jonpa:
While enabling the mischeduler for SystemZ, it was discovered that for some reason a test needed one extra seemingly needless COPY (test/CodeGen/SystemZ/call-03.ll). The handling for that is resulted in this patch, which improves the register coalescing by providing not just one copy hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 less register moves on SPEC, as well as marginally less spilling.
Instead of improving just the SystemZ backend, the improvement has been implemented in common-code (calculateSpillWeightAndHint(). This gives a lot of test failures, but since this should be a general improvement I hope that the involved targets will help and review the test updates.
Differential Revision: https://reviews.llvm.org/D38128
llvm-svn: 342578
As the code comments suggest, these are about splitting, and they
are not necessarily limited to lowering, so that misled me.
There's nothing that's actually x86-specific in these either, so
they might be better placed in a common header so any target can
use them.
llvm-svn: 342575
The patch extends size reduction pass for MicroMIPS. Two MOVE
instructions are transformed into one MOVEP instrucition.
Patch by Milena Vujosevic Janicic.
Differential revision: https://reviews.llvm.org/D52037
llvm-svn: 342572
The patch fixes definition of MOVEP instruction. Two registers are used
instead of register pairs. This is necessary as machine verifier cannot
handle register pairs.
Patch by Milena Vujosevic Janicic.
Differential revision: https://reviews.llvm.org/D52035
llvm-svn: 342571
This patch adds an initial x86 SimplifyDemandedVectorEltsForTargetNode implementation to handle target shuffles.
Currently the patch only decodes a target shuffle, calls SimplifyDemandedVectorElts on its input operands and removes any shuffle that reduces to undef/zero/identity.
Future work will need to integrate this with combineX86ShufflesRecursively, add support for other x86 ops, etc.
NOTE: There is a minor regression that appears to be affecting further (extractelement?) combines which I haven't been able to solve yet - possibly something to do with how nodes are added to the worklist after simplification.
Differential Revision: https://reviews.llvm.org/D52140
llvm-svn: 342564
Summary:
This is required for GPUs with 16 bit instructions where f16 is a
legal register type and hence int_to_fp i1 to f16 is not lowered
by legalizing.
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52018
Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851
llvm-svn: 342558
Clang-compiled object files currently don't include the symbol sizes and
types. Some tools however need that information. For example, ctfconvert
uses that information to generate FreeBSD's CTF representation from ELF
files.
With this patch, symbol sizes and types are included in object files.
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Reported-by: Yutaro Hayakawa <yhayakawa3720@gmail.com>
llvm-svn: 342556
This patch adds the ability for processor models to describe dependency breaking
instructions.
Different processors may specify a different set of dependency-breaking
instructions.
That means, we cannot assume that all processors of the same target would use
the same rules to classify dependency breaking instructions.
The main goal of this patch is to provide the means to describe dependency
breaking instructions directly via tablegen, and have the following
TargetSubtargetInfo hooks redefined in overrides by tabegen'd
XXXGenSubtargetInfo classes (here, XXX is a Target name).
```
virtual bool isZeroIdiom(const MachineInstr *MI, APInt &Mask) const {
return false;
}
virtual bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const {
return isZeroIdiom(MI);
}
```
An instruction MI is a dependency-breaking instruction if a call to method
isDependencyBreaking(MI) on the STI (TargetSubtargetInfo object) evaluates to
true. Similarly, an instruction MI is a special case of zero-idiom dependency
breaking instruction if a call to STI.isZeroIdiom(MI) returns true.
The extra APInt is used for those targets that may want to select which machine
operands have their dependency broken (see comments in code).
Note that by default, subtargets don't know about the existence of
dependency-breaking. In the absence of external information, those method calls
would always return false.
A new tablegen class named STIPredicate has been added by this patch to let
processor models classify instructions that have properties in common. The idea
is that, a MCInstrPredicate definition can be used to "generate" an instruction
equivalence class, with the idea that instructions of a same class all have a
property in common.
STIPredicate definitions are essentially a collection of instruction equivalence
classes.
Also, different processor models can specify a different variant of the same
STIPredicate with different rules (i.e. predicates) to classify instructions.
Tablegen backends (in this particular case, the SubtargetEmitter) will be able
to process STIPredicate definitions, and automatically generate functions in
XXXGenSubtargetInfo.
This patch introduces two special kind of STIPredicate classes named
IsZeroIdiomFunction and IsDepBreakingFunction in tablegen. It also adds a
definition for those in the BtVer2 scheduling model only.
This patch supersedes the one committed at r338372 (phabricator review: D49310).
The main advantages are:
- We can describe subtarget predicates via tablegen using STIPredicates.
- We can describe zero-idioms / dep-breaking instructions directly via
tablegen in the scheduling models.
In future, the STIPredicates framework can be used for solving other problems.
Examples of future developments are:
- Teach how to identify optimizable register-register moves
- Teach how to identify slow LEA instructions (each subtarget defining its own
concept of "slow" LEA).
- Teach how to identify instructions that have undocumented false dependencies
on the output registers on some processors only.
It is also (in my opinion) an elegant way to expose knowledge to both external
tools like llvm-mca, and codegen passes.
For example, machine schedulers in LLVM could reuse that information when
internally constructing the data dependency graph for a code region.
This new design feature is also an "opt-in" feature. Processor models don't have
to use the new STIPredicates. It has all been designed to be as unintrusive as
possible.
Differential Revision: https://reviews.llvm.org/D52174
llvm-svn: 342555
This is an alternative to D37896. I don't see a way to decompose multiplies
generically without a target hook to tell us when it's profitable.
ARM and AArch64 may be able to remove some duplicate code that overlaps with
this transform.
As a first step, we're only getting the most clear wins on the vector examples
requested in PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474
As noted in the code comment, it's likely that the x86 constraints are tighter
than necessary, but it may not always be a win to replace a pmullw/pmulld.
Differential Revision: https://reviews.llvm.org/D52195
llvm-svn: 342554
This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have
updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they
will see no functional change. Previously this hook returned bool, but it now
returns AtomicExpansionKind.
This hook allows targets to select how a given cmpxchg is to be expanded.
D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic.
See my associated RFC for more info on the motivation for this change
<http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.
Differential Revision: https://reviews.llvm.org/D48130
llvm-svn: 342550
Fixes the unwind information generated for floating-point registers.
Previously, all padding registers were assumed to be four bytes wide. Now, the
width of the register is used to specify the amount of padding.
Patch by Jackson Woodruff!
Differential revision: https://reviews.llvm.org/D51494
llvm-svn: 342545