When we have a precisely known VLEN, we can replace runtime uses of VLENB with compile-time constants. This converts offsets involving both fixed and scalable components into fixed offsets. As a result we avoid the CSR read of vlenb, and can often fold the multiply as well.
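As an illustrative sketch (assuming VLEN=128, so VLENB=16; not actual codegen output), an offset of 2*vlenb:

    csrr a0, vlenb        # runtime read of VLENB
    slli a0, a0, 1        # offset = 2 * vlenb

can be folded to a compile-time constant:

    li   a0, 32           # 2 * 16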
Differential Revision: https://reviews.llvm.org/D137591
On x86 and AArch64, SIMD instructions encode all of their scheduling information in the instruction itself. For example, VADD.I16 q0, q1, q2 is a NEON instruction that operates on 16-bit integer elements stored in 128-bit Q registers, i.e. eight 16-bit lanes in parallel. This kind of information affects how long the instruction takes to execute and what dependencies it may create. On RISCV, however, the data that affects scheduling is encoded in CSRs such as vtype or vl, in addition to the instruction itself, yet MCA does not track or use the data in these registers. This patch fixes this problem by introducing Instruments into MCA.
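For example, an llvm-mca input can annotate a region with an LMUL instrument via a comment (a sketch of the intended usage; M2 is just an example LMUL value):

    # LLVM-MCA-RISCV-LMUL M2
    vadd.vv v4, v8, v12

so that the vadd.vv is mapped to the scheduling class for LMUL=2 rather than a single default class.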
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV: use the LMUL instrument to override the schedule class
* Fix unit tests to pass empty instruments
* Add an -ignore-im cl::opt to disable this change
A prior version of this patch was committed in 5e82ee5373. 2323a4ee61 reverted
that change because the unit test files caused build errors. The change with fixes
was committed in b88b8307bf but reverted once again in e8e92c8313 due to more
build errors.
This commit re-applies the prior changes and fixes the build error.
Differential Revision: https://reviews.llvm.org/D137440
When selectVOP3PMadMixModsImpl fails, it can still create a new copy instruction
via selectVOP3ModsImpl. When selectG_FMA_FMAD then gives up, the new copy
instruction remains dead but is not automatically removed, because
InstructionSelect does not check whether instructions created during selection
are dead.
Such a dead copy has no register class on its destination operand and causes a crash.
The fix is to build the copy only when operands are actually added to the selected instruction.
Differential Revision: https://reviews.llvm.org/D138044
Before D114230, indirect moves used regular MOV opcodes and were
identified by having an implicit use of M0. Since D114230 they use
dedicated opcodes instead, so remove some old code that checks for
implicit uses of M0. NFCI.
Differential Revision: https://reviews.llvm.org/D138308
get_active_lane_mask is currently lowered to a WHILELO instruction, but for
a constant range suitable for PTRUE we can issue a PTRUE instruction
instead.
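For example (illustrative only), a mask over a known constant range of 16 elements of 32 bits:

    mov     w8, #16
    whilelo p0.s, wzr, w8

can instead be emitted as:

    ptrue   p0.s, vl16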
Differential Revision: https://reviews.llvm.org/D137547
The zip/uzp (2-vector) instruction classes have incorrect
register constraints and mark the destination as also being an
input. However, the instructions are fully destructive, so I've
restructured the classes.
Differential Revision: https://reviews.llvm.org/D138288
1. In streaming mode, use the SVE ORR (mov) instruction instead of the NEON ORR
when copying a physical register in AArch64InstrInfo::copyPhysReg (see the sketch below).
2. Add test file: register-mov.ll
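A minimal sketch of the difference (register choices are illustrative):

    mov v0.16b, v1.16b    // NEON copy, invalid in streaming mode
    mov z0.d, z1.d        // SVE copy (an alias of orr z0.d, z1.d, z1.d)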
Differential Revision: https://reviews.llvm.org/D138211
The first boundary of a region wasn't updated when a sunk instruction was inserted first into the region.
Reviewed By: vangthao
Differential Revision: https://reviews.llvm.org/D138256
These instructions are always printed with the canonical mnemonic. The GNU tools
emit the canonical mnemonic for the branch pseudo-instructions as well
(e.g. "bgt" will be recognised by the assembler but never printed by
objdump).
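For example (illustrative), the assembler accepts the pseudo form, but the disassembler always prints the canonical form:

    bgt $a1, $a0, .LBB0    # accepted when assembling
    blt $a0, $a1, .LBB0    # what objdump/llvm-objdump prints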
Reviewed By: xen0n
Differential Revision: https://reviews.llvm.org/D138100
Currently, lowerVECTOR_SHUFFLEAsVSlidedown only checks whether the inputs are
EXTRACT_SUBVECTOR nodes with the same source. This commit makes the
function look through chains of EXTRACT_SUBVECTOR nodes on the inputs and
their sources until they are no longer EXTRACT_SUBVECTOR.
Differential Revision: https://reviews.llvm.org/D138025
This diff splits out (from LLVMCore) IR printing passes into IRPrinter.
This structure is similar to what we already have for IRReader and
enables us to avoid circular dependencies between LLVMCore and Analysis
(this is a preparation for https://reviews.llvm.org/D137768).
The legacy interface is left unchanged; once the legacy pass manager
is removed (in the future) we will be able to clean it up further.
The bazel build configuration has been updated as well.
Test plan:
1/ Tested the following cmake configurations: static/dynamic linking * lld/gold * clang/gcc
2/ bazel build --config=generic_clang @llvm-project//...
Differential revision: https://reviews.llvm.org/D138081
This includes instruction formats, definitions, encodings, scheduling
classes, and builtins/intrinsics.
New and improved version of 76536989ba, so much so that even clang
builds with it.
This reverts commit 766536989b.
The commit caused:
clang/include/clang/Basic/BuiltinsHexagonDep.def:1896:69: error: use of undeclared identifier 'HVXV73'
TARGET_BUILTIN(__builtin_HEXAGON_V6_vadd_sf_bf, "V32iV16iV16i", "", HVXV73)
when building `clang`.
When the Linux kernel is compiled without -mretpoline, KCFI fails
ungracefully because it doesn't handle indirect calls with a memory
target operand. Since the KCFI check will need to load the target
address into a register for validating the type hash anyway, simply
unfold memory operands in indirect calls that need a KCFI check.
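A sketch of the unfolding (the register choice and the check sequence are illustrative):

    callq *8(%rbx)          # indirect call with a memory operand

is unfolded into:

    movq  8(%rbx), %r11     # load the call target into a register
    # ... KCFI type-hash check against the target ...
    callq *%r11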
Fixes #59017
A target can report whether a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it however it wants, and the direct translation
of the current code uses 0 and 1 for the current false and true. This
makes the change an NFC.
A subsequent patch will start using an actual speed value in
the load/store vectorizer to check whether a vectorized access will
be not just fast, but no slower than before.
Differential Revision: https://reviews.llvm.org/D124217
isLiteralConstant and isLiteralConstantLike were similar to
!isInlineConstant with slight differences like handling isReg operands.
To avoid a profusion of similar functions with undocumented differences,
this patch removes all the isLiteralConstant* variants. Callers are responsible
for handling the isReg case.
Differential Revision: https://reviews.llvm.org/D125759
When a PTRUE of non-element size is encountered, the PTEST optimization
logic bails out since it cannot handle that type of PTRUE. Instead, it
should be treated as a generic predicate to allow later optimizations to trigger.
Differential Revision: https://reviews.llvm.org/D138116
This patch adds a transformation of fmul+fadd/fsub chains to fused multiply
instructions:
* fmul+fadd->fmadd
* fmul+fsub->fmsub/fnmsub
We will also try to combine these instructions when the fmul has more than one use
and thus cannot be deleted. Removing the dependency between fmul and fadd can
still be profitable, however, and we rely on the machine combiner's scheduling approximations.
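As an illustrative example (assuming fast-math flags that permit contraction):

    fmul.d  ft0, fa0, fa1
    fadd.d  fa0, ft0, fa2

can be combined into:

    fmadd.d fa0, fa0, fa1, fa2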
Differential Revision: https://reviews.llvm.org/D136764
This patch fixes some cases of V_ADD/SUB_U64_PSEUDO not getting converted to their SDWA forms.
We currently still get the patterns below in generated code:
v_and_b32_e32 v0, 0xff, v0
v_add_co_u32_e32 v0, vcc, v1, v0
v_addc_co_u32_e64 v1, s[0:1], 0, 0, vcc
and,
v_and_b32_e32 v2, 0xff, v2
v_add_co_u32_e32 v0, vcc, v0, v2
v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
The first and second instructions of both examples above should have been folded into an SDWA add with a BYTE_0 src operand.
The reason is that the pseudo instruction is broken down into the VOP3 instruction pair V_ADD_CO_U32_e64 and V_ADDC_U32_e64.
The SDWA pass attempts to lower them to their VOP2 forms before converting them into SDWA instructions, but V_ADDC_U32_e64
cannot be shrunk to its VOP2 form if it has a non-register src1 operand.
This change fixes that problem by only shrinking the V_ADD_CO_U32_e64 instruction.
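With that, the and+add pairs above can fold into a single SDWA add along these lines (operand order and modifier syntax are illustrative):

    v_add_co_u32_sdwa v0, vcc, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0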
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D136663
Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW,
which will prevent us from using a BEXT instruction.
This is similar to what we do for (i32 (and (srl X, Y), 1)).
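For example (illustrative), with Zbs the pattern

    %sh  = lshr i32 %x, %y
    %bit = and i32 %sh, 1

can select to a single bext a0, a0, a1 instead of a shift-and-mask sequence.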
Don't emit deprecated v8-style FP compares & branches when targeting v9
processors.
For now, always use %fcc0, because currently the allocator requires allocatable
registers to also be spillable, which isn't the case with v9 FCC registers.
The work to enable allocation over the entire FCC register file will be done in
a future patch.
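For example (illustrative), a v8-style compare leaves the condition-code register implicit, while the v9 form names it:

    fcmps %f1, %f2          ! v8 form, deprecated on v9
    fcmps %fcc0, %f1, %f2   ! v9 form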
Fixes bug #17834
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135515
Do not emit deprecated v8-style branches when targeting a v9 processor.
As a side effect, this also fixes the emission of useless ba's when doing
conditional branches on 64-bit integer values.
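For example (illustrative), a conditional branch on a 64-bit integer value should use the v9 form with an explicit condition-code register:

    bne .LBB0          ! v8-style, deprecated on v9
    bne %xcc, .LBB0    ! v9-style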
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D130006
When expanding a PseudoCALL, the corresponding flags (e.g. nomerge)
need to be passed to the new instruction.
This patch also adds a test for the nomerge attribute.
The `nomerge` attribute was added during `LowerCall`, but was lost
when expanding PseudoCALL. Now add it back.
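A minimal IR example of the attribute that must survive the expansion (a sketch; the names are illustrative):

    declare void @foo()
    define void @caller() {
      call void @foo() #0   ; lowered via PseudoCALL; nomerge must be preserved
      ret void
    }
    attributes #0 = { nomerge }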
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D137888
Now matches the default SchedWriteVecIMul values used for the instruction.
NOTE: The folded variant overrides are still there, as the latency differs by 1cy.
This fixes a bug in 'allocateLazySaveBuffer' that led to the
buffer pointer being stored to the wrong address.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D137734
D134260/D138107 exposed that the MachineCombiner was not copying
pcsections metadata where it should. This patch switches the MIBuild
methods to use MIMetadata that can copy the debug loc and pcsections at
the same time.
Differential Revision: https://reviews.llvm.org/D138112
This adds some more scalar instructions that are both associative and
commutative to isAssociativeAndCommutative, allowing the machine
combiner to reassociate them to reduce critical path length.
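For example (illustrative), a serial chain of adds can be reassociated into a shallower tree:

    add x8, x0, x1        // before: depth-3 chain
    add x8, x8, x2
    add x8, x8, x3

    add x8, x0, x1        // after: depth-2 tree
    add x9, x2, x3
    add x8, x8, x9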
Differential Revision: https://reviews.llvm.org/D134260