* Reordered MVT simple types to group scalable vector types
together.
* New range functions in MachineValueType.h to only iterate over
the fixed-length int/fp vector types.
* Stopped backends which don't support scalable vector types from
iterating over scalable types.
Reviewers: sdesmalen, greened
Reviewed By: greened
Differential Revision: https://reviews.llvm.org/D66339
llvm-svn: 372099
This adds some very basic folding of PREDICATE_CASTs, removing cases where they
are chained together. These would already be removed eventually, as they are
lowered to copies. This just allows it to happen earlier, which can help other
simplifications.
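As a minimal sketch of the fold (assuming the usual SelectionDAG combine
plumbing in ARMISelLowering.cpp; the helper name here is illustrative, not
necessarily the upstream one):

// Fold PREDICATE_CAST(PREDICATE_CAST(x)) -> PREDICATE_CAST(x).
static SDValue combinePredicateCast(SDNode *N,
                                    TargetLowering::DAGCombinerInfo &DCI) {
  SDValue Src = N->getOperand(0);
  // Look through an inner cast and rebuild the outer one directly from the
  // inner cast's source.
  if (Src.getOpcode() == ARMISD::PREDICATE_CAST)
    return DCI.DAG.getNode(ARMISD::PREDICATE_CAST, SDLoc(N),
                           N->getValueType(0), Src.getOperand(0));
  return SDValue();
}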
Differential Revision: https://reviews.llvm.org/D67591
llvm-svn: 372012
Lower CTTZ on MVE using VBRSR and VCLZ, which will reverse the bits and
count the leading zeros, equivalent to a count of trailing zeros (CTTZ).
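As a scalar model of the idea (a sketch, not the backend code): reversing the
bits turns trailing zeros into leading zeros, so cttz(x) == clz(bitreverse(x))
for non-zero x.

#include <cassert>
#include <cstdint>

// Reverse the bits of a 32-bit value (what the bit reverse does per lane).
static uint32_t bitReverse(uint32_t x) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    r |= ((x >> i) & 1u) << (31 - i);
  return r;
}

int main() {
  uint32_t x = 0xa0; // 5 trailing zeros
  // Counting leading zeros of the reversed value gives the trailing-zero
  // count of the original (both builtins require x != 0).
  assert(__builtin_clz(bitReverse(x)) == __builtin_ctz(x));
}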
llvm-svn: 372000
Masked loads and stores fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc, and so is
currently behind an option.
The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.
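Per lane, that behaviour looks like the following (an illustrative model; the
lane count and element type are arbitrary):

#include <cstdint>

// An MVE masked load writes 0 to masked-off lanes; a select afterwards
// swaps those zeros for the passthru values.
void maskedLoadWithPassthru(const int32_t *ptr, const bool mask[4],
                            const int32_t passthru[4], int32_t out[4]) {
  int32_t loaded[4];
  for (int i = 0; i < 4; ++i)
    loaded[i] = mask[i] ? ptr[i] : 0; // what the masked load produces
  for (int i = 0; i < 4; ++i)
    out[i] = mask[i] ? loaded[i] : passthru[i]; // the matched select
}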
Differential Revision: https://reviews.llvm.org/D67186
llvm-svn: 371932
This patch adds vecreduce_smax, vecreduce_umax, vecreduce_smin and vecreduce_umin, plus selection for vmaxv and vminv.
Differential Revision: https://reviews.llvm.org/D66413
llvm-svn: 371827
These predicate vectors can usually be loaded and stored with a single
instruction, a VSTR_P0. However this instruction will store the entire P0
predicate, 16 bits, zero-extended to 32 bits. Each lane of the
v4i1/v8i1/v16i1 represents 4/2/1 bits.
As far as I understand, when llvm says "store this v4i1", it really does need
to store 4 bits (or 8, that being the size of a byte, with the bottom 4 as the
interesting bits). For example, a bitcast from a v8i1 to an i8 is defined as a
store followed by a load, which is how the code is expanded.
So this instead lowers the v4i1/v8i1 load/store through some shuffles to get
the bits into the correct positions. This, as you might imagine, is not as
efficient as a single instruction. But I believe it is needed for correctness.
A v16i1 equally should not load/store 32 bits, only the 16 bits of data.
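As a rough model of the repositioning (a sketch; the backend does this with
shuffles rather than scalar bit twiddling): lane i of a v4i1 lives at bit 4*i
of P0, but an external store must place it at bit i of the stored byte.

#include <cstdint>

// Pack a v4i1 predicate held in P0 form (one lane per 4 bits) into the
// 4 low bits of a byte for an externally observable store.
uint8_t packV4I1(uint16_t p0) {
  uint8_t out = 0;
  for (int lane = 0; lane < 4; ++lane)
    out |= ((p0 >> (4 * lane)) & 1u) << lane;
  return out; // bits 0..3 hold the predicate; the rest are zero
}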
Stack loads/stores are still using the VSTR_P0 (as can be seen by the test not
changing). This is fine as they are self-consistent; it is only "externally
observable loads/stores" (from our point of view) that need to be corrected.
Differential revision: https://reviews.llvm.org/D67085
llvm-svn: 371419
This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together.
This is useful for various MVE instructions, such as vmla and others that take R registers.
Loop tests have been added to the vmla test file to make sure vmlas are generated in loops.
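For context, the kind of source loop this helps looks roughly like the
following (illustrative only); the splat of s is built from an insertelement
plus shufflevector, and sinking it next to the mul/add lets instruction
selection fold the whole thing into a vmla that takes s in an R register:

// Vectorised as add(acc, mul(x, splat(s))).
void scaleAccumulate(int *acc, const int *x, int s, int n) {
  for (int i = 0; i < n; ++i)
    acc[i] += x[i] * s;
}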
Differential revision: https://reviews.llvm.org/D66295
llvm-svn: 371218
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67229
llvm-svn: 371200
A number of inline assembly constraints are currently supported by LLVM, but rejected as invalid by Clang:
Target independent constraints:
s: An integer constant, but allowing only relocatable values
ARM specific constraints:
j: An immediate integer between 0 and 65535 (valid for MOVW)
x: A 32, 64, or 128-bit floating-point/SIMD register: s0-s15, d0-d7, or q0-q3
N: An immediate integer between 0 and 31 (Thumb1 only)
O: An immediate integer which is a multiple of 4 between -508 and 508. (Thumb1 only)
This patch adds support to Clang for the missing constraints, along with some checks to ensure that the constraints are used with the correct target and Thumb mode, and that immediates are within valid ranges (at least where possible). The constraints are already implemented in LLVM, but a couple of minor corrections to the checks were needed (V8M Baseline includes MOVW, so it should work with 'j'; 'N' and 'O' shouldn't be valid in Thumb2) so that Clang and LLVM are in line with each other and with the documentation.
Differential Revision: https://reviews.llvm.org/D65863
Change-Id: I18076619e319bac35fbb60f590c069145c9d9a0a
llvm-svn: 371079
Summary:
This patch renames functions that take or return alignment as log2; it will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two, but `MachineFunction` and `MachineBasicBlock` were using log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power-of-two alignments; internally these values are interpreted as log2(align). This patch updates the documentation.
- `MachineFunction` exposes `align-all-functions`, also interpreted as a power-of-two alignment; internally this value is interpreted as log2(align). This patch updates the documentation.
Reviewers: lattner, thegameg, courbet
Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65945
llvm-svn: 371045
This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can
also be used in ISel Lowering, adding codesize values to the computed costs, to
be able to compare either approximate instruction counts or codesize costs.
It also adds a HasLowerConstantMaterializationCost, which compares the
ConstantMaterializationCost of two values, returning true if the first is
smaller in instruction count/codesize, falling back to the other metric in
the case that they are equal.
This is used in constant CSEL lowering to invert the predicate if the opposite
is easier to materialise.
Differential revision: https://reviews.llvm.org/D66701
llvm-svn: 370741
Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value.
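Their semantics, modelled as scalar helpers (a sketch; cond stands for the
condition evaluated against CPSR):

#include <cstdint>

uint32_t csinc(bool cond, uint32_t t, uint32_t f) { return cond ? t : f + 1; } // increment
uint32_t csneg(bool cond, uint32_t t, uint32_t f) { return cond ? t : -f; }    // negation
uint32_t csinv(bool cond, uint32_t t, uint32_t f) { return cond ? t : ~f; }    // inverse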
This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used.
Code by Ranjeet Singh and Simon Tatham, with some modifications from me.
Differential revision: https://reviews.llvm.org/D66483
llvm-svn: 370739
We should be using MQPR, and if we don't we can get COPYs and PHIs created for
QPR. These get folded into instructions, failing verification checks.
Differential revision: https://reviews.llvm.org/D66214
llvm-svn: 370676
These were never enabled correctly and are causing other problems. Taking them
out for the moment, whilst we work on the issues.
This reverts r370329.
llvm-svn: 370607
Masked loads and stores fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc.
The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.
We also need to do something with unaligned loads/stores. Currently this uses a
method similar to the one used for big endian, using a VLDRB.8 (and potentially
a VREV in BE). This does mean that the predicate mask is converted from, for
example, a v4i1 to a v16i1. The VLDR instructions are defined as using the
first bit of the relevant mask lane, so this could potentially load different
results if the predicate is a little odd. As the input is a v4i1, however, I
believe this is OK and all the bits required should be set in the predicate,
making the VLDRB.8 load the same data.
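The mask conversion can be modelled like this (a sketch): each v4i1 lane
covers four byte lanes of the VLDRB.8, so its bit is replicated across them.

// Lane i of the v4i1 controls byte lanes 4*i .. 4*i+3 of the VLDRB.8.
void expandMask(const bool m4[4], bool m16[16]) {
  for (int i = 0; i < 16; ++i)
    m16[i] = m4[i / 4];
}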
Differential Revision: https://reviews.llvm.org/D66534
llvm-svn: 370329
This patch fixes the issue that RV64 didn't clear the upper bits
when returning a complex floating-point value with the lp64 ABI.
float _Complex
complex_add(float _Complex a, float _Complex b)
{
return a + b;
}
RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))
The patch introduces the shouldExtendTypeInLibCall target hook to suppress
AssertZext generation when lowering floating-point libcalls.
Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820
Differential Revision: https://reviews.llvm.org/D65497
llvm-svn: 370275
Summary: There are at least two ways to express the same shuffle. Various pieces of code explicitly check for both options, but other places do not when they would benefit from doing so. This patch refactors the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent.
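One such equivalence, sketched for a 4-lane shuffle: swapping the two input
vectors while offsetting the mask indices by the vector width expresses the
same shuffle, so shuffle(a, b, m) == shuffle(b, a, commuteMask(m)).

#include <array>

// Indices 0..3 select from the first operand and 4..7 from the second, so
// commuting the operands means toggling each index across that boundary.
std::array<int, 4> commuteMask(std::array<int, 4> m) {
  for (int &idx : m)
    if (idx >= 0) // a negative index models an undef lane
      idx = idx < 4 ? idx + 4 : idx - 4;
  return m;
}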
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66804
llvm-svn: 370190
The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the
changes from the above patch.
This reverts commit cd53ff6, reapplying 7c6b229.
llvm-svn: 369638
It broke the bots, see e.g. http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/
> This patch fixes shifts by a 128/256 bit shift amount. It also fixes
> codegen for shifts of 32 by delegating to LLVM's default optimisation
> instead of emitting a long shift.
>
> Tests that used to generate long shifts of 32 are updated to check for the
> more optimised codegen.
>
> Differential revision: https://reviews.llvm.org/D66519
>
> llvm-svn: 369626
llvm-svn: 369636
This patch fixes shifts by a 128/256 bit shift amount. It also fixes
codegen for shifts of 32 by delegating to LLVM's default optimisation
instead of emitting a long shift.
Tests that used to generate long shifts of 32 are updated to check for the
more optimised codegen.
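For example, a 64-bit left shift by exactly 32, with the value held as two
32-bit halves, is just a word move (an illustrative model, not the backend
code):

#include <cstdint>

// No long-shift sequence needed: the low word becomes the high word.
void shl64By32(uint32_t &hi, uint32_t &lo) {
  hi = lo;
  lo = 0;
}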
Differential revision: https://reviews.llvm.org/D66519
llvm-svn: 369626
The patch introduces a MakeLibCallOptions struct, as suggested by @efriedma on D65497.
The struct contains argument flags which will be passed to the makeLibCall function.
The patch should not have any functional changes.
Differential Revision: https://reviews.llvm.org/D65795
llvm-svn: 369622
This patch adds vecreduce_add and the relevant instruction selection for
vaddv.
Differential revision: https://reviews.llvm.org/D66085
llvm-svn: 369245
Push the LR register before calling __gnu_mcount_nc, as it expects the value of the LR register to be the
top value of the stack on ARM32.
Differential Revision: https://reviews.llvm.org/D65019
llvm-svn: 369147
We don't yet know how to generate these instructions for MVE. And in the case
of VLD3, we don't even have the instruction. For the moment, don't tell the
vectoriser that we have VLD4, only for it to end up serialising the results.
Differential Revision: https://reviews.llvm.org/D66009
llvm-svn: 369101
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
This adds pre- and post- increment and decrements for MVE loads and stores. It
uses the builtin pre and post load/store detection, unlike Neon. Loads are
selected with the code in tryT2IndexedLoad, stores are selected with tablegen
patterns. The immediates have a +/-7-bit range, multiplied by the size of the
element.
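The addressing pattern this folds looks like the following in source terms
(illustrative only); the pointer bump after each access becomes the
post-incremented form of the load:

#include <cstdint>

int32_t sumChunks(const int32_t *p, int chunks) {
  int32_t total = 0;
  for (int c = 0; c < chunks; ++c) {
    for (int i = 0; i < 4; ++i) // one 4-lane vector's worth of data
      total += p[i];
    p += 4; // folded into a post-incremented vector load
  }
  return total;
}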
Differential Revision: https://reviews.llvm.org/D63840
llvm-svn: 368305
This adds some missing patterns for big endian loads/stores, allowing unaligned
loads/stores to also be selected with an extra VREV, which produces better code
than aligning through a stack. Also moves VLDR_P0 to not be LE only, and
adjusts some of the tests to show all that working.
Differential Revision: https://reviews.llvm.org/D65583
llvm-svn: 368304
VLDRH needs to have an alignment of at least 2, including the
widening/narrowing versions. This tightens up the ISel patterns for it and
alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded
through the stack. It also fixes some incorrect shift amounts, which seemed to
be passing a multiple, not a shift amount.
Differential Revision: https://reviews.llvm.org/D65580
llvm-svn: 368256
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, jfb, jakehehrlich
Reviewed By: jfb
Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65514
llvm-svn: 367828
This is extremely specific, but saves three instructions when it's
legal. I don't think the code can be usefully generalized.
Differential Revision: https://reviews.llvm.org/D65351
llvm-svn: 367492
Thumb1 has very limited immediate modes, so turning an "and" into a
shift can save multiple instructions.
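The basic rewrite, in scalar terms (a sketch): masking the low bits can be
done with a logical shift pair, avoiding a constant that Thumb1 cannot encode
as an immediate.

#include <cassert>
#include <cstdint>

int main() {
  uint32_t x = 0x12345678;
  // x & 0x00ffffff needs the mask materialised into a register on Thumb1;
  // shifting left then right computes the same value with two instructions.
  assert((x & 0x00ffffffu) == ((x << 8) >> 8));
}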
It's possible to simplify the generated code for test2 and test3 in
cmp-and-fold.ll a little more, but I'll implement that as a followup.
Differential Revision: https://reviews.llvm.org/D65175
llvm-svn: 367491
These are some better patterns for converting between predicates and floating
points. Much like the extends, we select "1"/"-1" or "0" depending on the
predicate value. Or we perform a compare against 0 to convert to a predicate.
Differential Revision: https://reviews.llvm.org/D65103
llvm-svn: 367191
This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, using just two, called VCMP and
VCMPZ, with an extra operand for the condition code. I believe this will make
some combines simpler, allowing us to just look at these codes and not the
operands. It also helps fill in a missing VCGTUZ MVE selection without adding
extra nodes for it.
Differential Revision: https://reviews.llvm.org/D65072
llvm-svn: 366934
This prevents us from trying to convert an i1 predicate vector to a float, or
vice-versa. Better patterns are possible, which will follow in a subsequent
commit. For now we just expand them.
Differential Revision: https://reviews.llvm.org/D65066
llvm-svn: 366931
This adds a De Morgan combine for ORs of compares to turn them into ANDs,
helping prevent them from going into and out of gpr registers. It also fills in
the VCLE and VCLT nodes that MVE can select, allowing it to invert more
compares.
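The underlying identity on per-lane compare masks (a minimal sketch):

#include <cassert>
#include <cstdint>

int main() {
  // Compare results modelled as all-ones/all-zeros lane masks.
  uint32_t p = 0xffffffff, q = 0x00000000;
  // De Morgan: or(p, q) == not(and(not(p), not(q))), so two OR'd compares
  // can become two inverted compares combined with an AND.
  assert((p | q) == ~(~p & ~q));
}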
Differential Revision: https://reviews.llvm.org/D65059
llvm-svn: 366920
Much like integers, this adds MVE floating point compares and select. It
requires a lot more buildvector/shuffle code because we may need to expand the
compares without mve.fp, and requires support for and/or because of the way we
lower llvm condition codes.
Some original code by David Sherwood
Differential Revision: https://reviews.llvm.org/D65054
llvm-svn: 366909
This adds support code for building and shuffling i1 predicate registers. It
generally uses two basic principles: either converting the predicate into a
scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or
converting the register to a full vector register and back.
Some of the code here is not super efficient, but will hopefully cover most
cases of moving i1 vectors around and can be improved in subsequent patches.
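The two strategies, modelled per lane (a sketch, not the backend code):

#include <cstdint>

// Strategy 1: treat the predicate as a scalar bitmask (what a PREDICATE_CAST
// exposes) and use ordinary scalar bit operations on it.
uint16_t andPredicates(uint16_t p, uint16_t q) { return p & q; }

// Strategy 2: widen each i1 lane to a full-width 0/-1 lane, do the work
// there, then narrow back by comparing against zero.
void widenThenNarrow(const bool in[4], bool out[4]) {
  int32_t wide[4];
  for (int i = 0; i < 4; ++i)
    wide[i] = in[i] ? -1 : 0; // widen to a full vector register
  // ... full-width shuffles/operations would happen here ...
  for (int i = 0; i < 4; ++i)
    out[i] = wide[i] != 0; // narrow back to a predicate
}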
Some code by David Sherwood.
Differential Revision: https://reviews.llvm.org/D65052
llvm-svn: 366890
This adds the very basics for MVE vector predication, adding integer VCMP and
VSEL instruction support. This is done through predicate registers (MVT::v16i1,
MVT::v8i1, MVT::v4i1), but otherwise using the same mechanics as NEON to custom
lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc).
An extra VCNE was added, as this can be handled sensibly by MVE's expanded
number of VCMP condition codes. (There are also VCLE and VCLT which are added
later).
VPSEL is also added here, simply selecting on the vselect.
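Per lane, the compare/select pair behaves like this (an illustrative model):

#include <cstdint>

// VCMP produces one predicate bit per lane; VPSEL selects between two
// vectors under that predicate.
void cmpSelect(const int32_t a[4], const int32_t b[4], const int32_t t[4],
               const int32_t f[4], int32_t out[4]) {
  for (int i = 0; i < 4; ++i) {
    bool pred = a[i] > b[i];   // VCMP.s32 gt
    out[i] = pred ? t[i] : f[i]; // VPSEL
  }
}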
Original code by David Sherwood.
Differential Revision: https://reviews.llvm.org/D65051
llvm-svn: 366885