This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system.
llvm-svn: 287206
We save an inter-register file move this way. If there's any CPU where
the FP logic is slower, we could transform this back to int-logic in
MachineCombiner.
This helps, but doesn't solve, PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137
The 'andn' test shows that we're missing a pattern match to
recognize the xor with -1 constant as a 'not' op.
llvm-svn: 287171
Summary:
A lot of the pseudo instructions are required because LLVM assumes that
all integers of the same size as the pointer size are legal. This means
that it will not currently expand 16-bit instructions to their 8-bit
variants because it thinks 16-bit types are legal for the operations.
This also adds all of the CodeGen tests that required the pass to run.
Reviewers: arsenm, kparzysz
Subscribers: wdng, mgorny, modocache, llvm-commits
Differential Revision: https://reviews.llvm.org/D26577
llvm-svn: 287162
We only ever create TargetConstantPool, TargetJumpTable, TargetExternalSymbol,
TargetGlobalAddress, TargetGlobalTLSAddress, MCSymbol and TargetBlockAddress
nodes as operands of X86ISD::Wrapper nodes, so we can remove one check and
invert the other.
Also update the documentation comment for X86ISD::Wrapper.
Differential Revision: https://reviews.llvm.org/D26731
llvm-svn: 287160
One half of the shifts obviously needed conditional selection based on whether
the shift amount is more than 32-bits, but leaving the other half as the
natural shift isn't acceptable either: it's undefined behaviour to shift a
32-bit value by more than 31.
llvm-svn: 287149
Summary:
Extend replaceZeroVectorStore to handle more vector type stores,
floating point zero vectors and set alignment more accurately on split
stores.
This is a follow-up change to r286875.
This change fixes PR31038.
Reviewers: MatzeB
Subscribers: mcrosier, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D26682
llvm-svn: 287142
Summary:
1. Don't try to copy values to and from the same register class.
2. Replace copies with of registers with immediate values with v_mov/s_mov
instructions.
The main purpose of this change is to make MachineSink do a better job of
determining when it is beneficial to split a critical edge, since the pass
assumes that copies will become move instructions.
This prevents a regression in uniform-cfg.ll if we enable critical edge
splitting for AMDGPU.
Reviewers: arsenm
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D23408
llvm-svn: 287131
We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions.
Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of
compilers, but logically equivalent int, float, and double variants of bitwise-logic
instructions are reality in x86, and the float variant may be a shorter instruction
depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all
the time.
This is a preliminary step towards solving PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137
Differential Revision:
https://reviews.llvm.org/D26712
llvm-svn: 287122
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen.
LLVM counterpart to D26686
Differential Revision: https://reviews.llvm.org/D26736
llvm-svn: 287108
MipsFastISel uses a a class to represent addresses with a signed member
to represent the offset. MipsFastISel::emitStore, emitLoad and computeAddress
all treated the offset as being positive. In cases where the offset was
actually negative and a frame pointer was used, this would cause the constant
synthesis routine to crash as it would generate an unexpected instruction
sequence when frame indexes are replaced.
Reviewers: vkalintiris
Differential Revision: https://reviews.llvm.org/D26192
llvm-svn: 287099
This patch adds the single operand form of the not alias to microMIPS and
MIPS along with additional tests.
This partially resolves PR/30381.
Thanks to Sean Bruno for reporting the issue!
llvm-svn: 287097
Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file.
Reviewers: zvi, delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26660
llvm-svn: 287083
Doing this before register allocation reduces register pressure as we do
not even have to allocate a register for those dead definitions.
Differential Revision: https://reviews.llvm.org/D26111
llvm-svn: 287076
For the default, small and medium code model, use the existing
difference from the jump table towards the label. For all other code
models, setup the picbase and use the difference between the picbase and
the block address.
Overall, this results in smaller data tables at the expensive of one or
two more arithmetic operation at the jump site. Given that we only create
jump tables with a lot more than two entries, it is a net win in size.
For larger code models the assumption remains that individual functions
are no larger than 2GB.
Differential Revision: https://reviews.llvm.org/D26336
llvm-svn: 287059
wbinvl.* are vector instruction that do not sue vector registers.
v2: check only M?BUF instructions
Differential Revision: https://reviews.llvm.org/D26633
llvm-svn: 287056
Lower a = b * C where C = (2^n + 1) * 2^m to
add w0, w0, w0, lsl n
lsl w0, w0, m
Differential Revision: https://reviews.llvm.org/D229245
llvm-svn: 287019
The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.
Differential Revision: https://reviews.llvm.org/D26585
llvm-svn: 287007
This patch helps avoids poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so help improve the truncation logic.
This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases.
Fix for PR13248
Differential Revision: https://reviews.llvm.org/D26583
llvm-svn: 286979
Move some code inside the proper 'if' block to make sure it is only run once,
when the subtarget is first created. Things can still break if we use different
ARM target machines or if we have functions with different 'target-cpu' or
'target-features', we should fix that too in the future.
llvm-svn: 286974
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.
llvm-svn: 286967
Summary: This is needed to be able to use this flags in InstrMappings.
Reviewers: tstellarAMD, vpykhtin
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D26666
llvm-svn: 286960
Summary:
Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class.
We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction.
Fixes pr30981.
Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet
Subscribers: qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D26620
llvm-svn: 286958
For 64bit ABIs it is common practice to use relative Jump Tables with
potentially different relocation bases. As the logic for the jump table
itself doesn't depend on the relocation base, make it easier for targets
to use the generic logic. Start by dropping the now redundant MIPS logic.
Differential Revision: https://reviews.llvm.org/D26578
llvm-svn: 286951
This patch adds the Sched Machine Model for Cortex-R52.
Details of the pipeline and descriptions are in comments
in file ARMScheduleR52.td included in this patch.
Reviewers: rengolin, jmolloy
Differential Revision: https://reviews.llvm.org/D26500
llvm-svn: 286949
Summary:
Add basic functionality to support call lowering for X86.
Currently only supports functions which return void and take zero arguments.
Inspired by commit 286573.
Reviewers: ab, qcolombet, t.p.northover
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26593
llvm-svn: 286935
Implement the Newton series for square root, its reciprocal and reciprocal
natively using the specialized instructions in AArch64 to perform each
series iteration.
Differential revision: https://reviews.llvm.org/D26518
llvm-svn: 286907
For example we were producing
push {r8, r10, r11, r4, r5, r7, lr}
This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.
Fixed usage of std::sort so that we (hopefully) use instantiations that
actually exist in GCC 4.8.
llvm-svn: 286881
Summary:
Replace a splat of zeros to a vector store by scalar stores of WZR/XZR.
The load store optimizer pass will merge them to store pair stores.
This should be better than a movi to create the vector zero followed by
a vector store if the zero constant is not re-used, since one
instructions and one register live range will be removed.
For example, the final generated code should be:
stp xzr, xzr, [x0]
instead of:
movi v0.2d, #0
str q0, [x0]
Reviewers: t.p.northover, mcrosier, MatzeB, jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26561
llvm-svn: 286875
For example we were producing
push {r8, r10, r11, r4, r5, r7, lr}
This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.
llvm-svn: 286866
add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to
Single-Precision' instruction.
Differential review: https://reviews.llvm.org/D26536
llvm-svn: 286862
Summary:
Extend image intrinsics to support data types of V1F32 and V2F32.
TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active,
even though such case should be very rare.
Reviewers:
tstellarAMD
Differential Revision:
http://reviews.llvm.org/D26472
llvm-svn: 286860
The Stack slot coloring pass removes a store that is followed by a load
that deal with the same stack slot. The function isLoadFromStackSlot
is supposed to consider the loads that have no side-effects. This
patch fixed the issue by removing the unsafe loads from this function
Eg:
%vreg0<def> = L2_loadruh_io <fi#15>, 0
S2_storeri_io <fi#15>, 0, %vreg0
In this case, we load an unsigned extended half word and store this in to
the same stack slot. The Stack slot coloring pass considers safe to remove
the store. This patch marked all the non-vector byte and half word loads as
unsafe.
llvm-svn: 286843
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.
This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)
llvm-svn: 286832
-Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions.
-Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions.
-Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax.
-Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing.
This should fix at least some of PR28850.
llvm-svn: 286787
nThis avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.
This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.
The cache also needs to be flushed before wave termination
if a scalar store is used.
llvm-svn: 286766
These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2.
llvm-svn: 286754
After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics.
llvm-svn: 286725
Summary:
This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we aleady support the sse2 and avx2 intrinsics. We need to the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new instrinsics in the frontend.
This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.
Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.
Reviewers: RKSimon, zvi, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26333
llvm-svn: 286711
Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases.
Reviewers: delena, RKSimon
Subscribers: Farhana, llvm-commits
Differential Revision: https://reviews.llvm.org/D26297
llvm-svn: 286709
Summary:
This pass was assuming that when a PHI instruction defined a register
used by another PHI instruction that the defining insstruction would
be legalized before the using instruction.
This assumption was causing the pass to not legalize some PHI nodes
within divergent flow-control.
This fixes a bug that was uncovered by r285762.
Reviewers: nhaehnle, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D26303
llvm-svn: 286676
This patch corresponds to review:
https://reviews.llvm.org/D26480
Adds all the intrinsics used for various permute builtins that will
be added to altivec.h.
llvm-svn: 286638
Summary:
Fix off-by-one indexing error in loop checking that inserted value was a
splat vector.
Add code to check that INSERT_VECTOR_ELT nodes constructing the splat
vector have the expected constant index values.
Reviewers: t.p.northover, jmolloy, mcrosier
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D26409
llvm-svn: 286616
This patch corresponds to review:
https://reviews.llvm.org/D26307
Adds all the intrinsics used for various conversion builtins that will
be added to altivec.h. These are type conversions between various types of
vectors.
llvm-svn: 286596
This adds support for the compare logical and trap (memory)
instructions that were added as part of the miscellaneous
instruction extensions feature with zEC12.
llvm-svn: 286587
This adds support for the LZRF/LZRG/LLZRGF instructions that were
added on z13, and uses them for code generation were appropriate.
SystemZDAGToDAGISel::tryRISBGZero is updated again to prefer LLZRGF
over RISBG where both would be possible.
llvm-svn: 286586
This adds support for the 31-to-64-bit zero extension instructions
LLGT and LLGTR and uses them for code generation where appropriate.
Since this operation can also be performed via RISBG, we have to
update SystemZDAGToDAGISel::tryRISBGZero so that we prefer LLGT
over RISBG in case both are possible. The patch includes some
simplification to the tryRISBGZero code; this is not intended
to cause any (further) functional change in codegen.
llvm-svn: 286585
The NamedRegionTimer initializer without a group name puts the Timer
into the "Misc" group and is (nearly) unused. Remove it.
The only user of this constructor appears to be the HexagonGenInsert pass,
which creates a counter without group to count the complete execution
time of that pass, however since every pass gets a counter by the
PassManager anyway this should be unnecessary. Also removed the
pointless TimerGroup there.
Differential Revision: https://reviews.llvm.org/D25582
llvm-svn: 286524
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself. However, the original code did not properly consider this condition
if returned by a target. This patch addresses the issues to allow a target
to compute the series on its own.
Differential revision: https://reviews.llvm.org/D22975
llvm-svn: 286523
Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata.
However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html).
This patch lets AMDGPU backend emits runtime metadata as a note element in .note section.
Differential Revision: https://reviews.llvm.org/D25781
llvm-svn: 286502
The version of this instruction with the .w suffix already correctly accepts
this, but the alias without the .w did not.
Differential Revision: https://reviews.llvm.org/D26499
llvm-svn: 286446
Suspected to be the cause of a sanitizer-windows bot failure:
Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420
llvm-svn: 286385
A relocatable immediate is either an immediate operand or an operand that
can be relocated by the linker to an immediate, such as a regular symbol
in non-PIC code.
Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands
of type "imm32_su". Remove a number of now-redundant patterns.
Differential Revision: https://reviews.llvm.org/D25812
llvm-svn: 286384
For pairs of 32-bit registers: isub_lo, isub_hi.
For pairs of vector registers: vsub_lo, vsub_hi.
Add generic subreg indices: ps_sub_lo, ps_sub_hi, and a function
HexagonRegisterInfo::getHexagonSubRegIndex(RegClass, GenericSubreg)
that returns the appropriate subreg index for RegClass.
llvm-svn: 286377
The name/comment of the third argument to the ScheduleDAGMI constructor
is RemoveKillFlags and not IsPostRA. Only the comments are changed.
Review: A Trick
llvm-svn: 286350
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.
If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.
It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.
Differential Revision: https://reviews.llvm.org/D26331
llvm-svn: 286345
Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves.
Reviewers: delena, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26330
llvm-svn: 286344
Summary:
This is needed to make the v64i8 and v32i16 types legal for the 512-bit VBMI instructions. Fixes PR30912.
Reviewers: delena, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26322
llvm-svn: 286339
This completes assembler / disassembler support for all BFP
instructions provided by the floating-point extensions facility.
The instructions added here are not currently used for codegen.
llvm-svn: 286285
Add several instructions that operate on the program mask
or the addressing mode. These are not really needed for
code generation under Linux, but are provided for completeness
for the assembler/disassembler.
llvm-svn: 286284
Add the 16 access registers as LLVM registers. This allows removing
a lot of special cases in the assembler and disassembler where we
were handling access registers; this can all just use the generic
register code now.
Also add a bunch of instructions to operate on access registers,
for assembler/disassembler use only. No change in code generation
intended.
llvm-svn: 286283
Since IMPLIFIT_DEF instructions are omitted in the output, when the output
of an IMPLICIT_DEF instruction is stackified, the resulting register lacks
an explicit push, leading to a push/pop mismatch. Fix this by converting
such IMPLICIT_DEFs into CONST_I32 0 instructions so that they have explicit
pushes.
llvm-svn: 286274
Define a couple of additional semantic classes and use them
throughout the .td files to make them more consistent and
more easily readable.
No functional change.
llvm-svn: 286268
This changes the InstRR (and related) patterns to no longer
automatically add an "r" at the end of the mnemonic. This
makes the .td files more obviously understandable, and also
allows using the patterns for those few instructions that
do not follow the *r scheme.
Also add some more sub-formats of the RRF format class, to
match operand names and sequence from the PoP better.
No functional change.
llvm-svn: 286267
Now that we've added instruction format subclasses like
InstRIb, it makes sense to rename the old InstRI to InstRIa.
Similar for InstRX, InstRXY, InstRS, InstRSY, and InstSS.
No functional change.
llvm-svn: 286266
Rework patterns for branches, call & return instructions,
compare-and-branch, compare-and-trap, and conditional move
instructions.
In particular, simplify creation of patterns for the extended
opcodes of instructions that take a CC mask.
Also, use semantical instruction classes for all the instructions
instead of open-coding them in SystemZInstrInfo.td.
Adds a couple of the basic branch instructions (that are unused
for codegen) for the assembler/disassembler.
llvm-svn: 286263
Summary: In addition, the branch instructions will have proper BB destinations, not offsets, like before.
Reviewers: asl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23718
llvm-svn: 286252
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.
This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful.
Differential Revision: https://reviews.llvm.org/D25910
llvm-svn: 286233
Under -enable-unsafe-fp-math, SELECT_CC lowering in AArch64
transforms floating point comparisons of the form "a == 0.0 ? 0.0 : x" to
"a == 0.0 ? a : x". But it incorrectly assumes that 'x' and 'a' have
the same type which can lead to a wrong CSEL node that crashes later
due to nonsensical copies.
Differential Revision: https://reviews.llvm.org/D26394
llvm-svn: 286231
Self-referencing PHI nodes need their destination operands to be constrained
because nothing else is likely to do so. For now we just pick a register class
naively.
Patch mostly by Ahmed again.
llvm-svn: 286183
Codegen prepare sinks comparisons close to a user is we have only one register
for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions.
Changed BE to report we have many condition registers. That way IR LICM pass
would hoist an invariant comparison out of a loop and codegen prepare will not
sink it.
With that done a condition is calculated in one block and used in another.
Current behavior is to store workitem's condition in a VGPR using v_cndmask
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue a forward propagation of a v_cmp 64 bit result
to an user is implemented. Additional side effect of this is that we may
consume less VGPRs in a cost of more SGPRs in case if holding of multiple
conditions is needed, and that is a clear win in most cases.
llvm-svn: 286171
Summary:
Some vector loads and stores generated from AArch64 intrinsics alias each other
unnecessarily, preventing better scheduling. We just need to transfer memory
operands during lowering.
Reviewers: mcrosier, t.p.northover, jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26313
llvm-svn: 286168
Because we shift the stack pointer by an unknown amount, we need an
additional pointer. In the case where we have variable-size objects
as well, we can't reuse the frame pointer, thus three pointers.
Patch by Jacob Gravelle
Differential Revision: https://reviews.llvm.org/D26263
llvm-svn: 286160
If the branch was on a read-undef of vcc, passes that used
analyzeBranch to invert the branch condition wouldn't preserve
the undef flag resulting in a verifier error.
Fixes verifier failures in a future commit.
Also fix verifier error when inserting copy for vccz
corruption bug.
llvm-svn: 286133
* Use a generic vector unit to model the issue unit more accurately.
* Update some vector instructions that actually use the vector unit for more
than one cycle.
Review: Ulrich Weigand
llvm-svn: 286112
IssueWidth updated to reflect the capacity of the issue unit correctly.
Correct number of FX and LS units modelled (2, was 1).
Review: Ulrich Weigand
llvm-svn: 286109
When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding.
If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset.
In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4.
llvm-svn: 286107
Cmake has not recognized that Hexagon.td has a new dependency in
HexagonPatterns.td. All changes to that file were not visible to
the build bots.
llvm-svn: 286084
This handles the last case of the builtin function calls that we would
generate code which differed from Microsoft's ABI. Rather than
generating a call to `__pow{d,s}i2` we now promote the parameter to a
float or double and invoke `powf` or `pow` instead.
Addresses PR30825!
llvm-svn: 286082
The clr/set/toggle-bit instructions (with the bit index given as an
immediate operand) had both, custom selection code that generated them,
and selection patterns at the same time. The selection patterns were
not used, because the custom selection code was executed first.
This patch removes the custom code in favor of the selection patterns.
The custom code handled 64-bit registers as well with an immediate bit
index, and so new patterns were added to implement that.
It was also the same case for the instruction "Rd += asr(Rs, Rt)",
except that the custom code did not offer any additional functionality,
and was simply removed.
llvm-svn: 286080
We are repeatedly calling computeZeroableShuffleElements in many shuffle lowering calls for the same shuffle mask/inputs.
This is a first step towards reusing the zeroable result, initially just for lowerVectorShuffleAsShift calls.
llvm-svn: 286037
behind the test that the MachineModuleInfo analysis was
actually available and can be used.
While the MachO bits may well be reasonable to assume in the darwin
assembly printer, the analysis isn't constructively guaranteed anywhere
I could find so it seems safest to avoid crashing here.
This issue was found with PVS-Studio. Pretty sure the Clang Static
Anaylzer flags similar issues but we've probably never pointed it at
this code effectively.
llvm-svn: 285972
Summary: ARMv6m supports dmb etc fench instructions but not ldrex/strex etc. So for some atomic load/store, LLVM should inline instructions instead of lowering to __sync_ calls.
Reviewers: rengolin, efriedma, t.p.northover, jmolloy
Subscribers: efriedma, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D26120
llvm-svn: 285969
This file is unused as of r285939, but we need to keep it around
for bots that don't do full rebuilds. We should be able to delete this
again in a few days.
llvm-svn: 285948
Quite sad we still aren't really using aggressive dead code warnings
from Clang that we could potentially use to catch this and so many other
things.
llvm-svn: 285936
hange explores the fact that LDS reads may be reordered even if access
the same location.
Prior the change, algorithm immediately stops as soon as any memory
access encountered between loads that are expected to be merged
together. Although, Read-After-Read conflict cannot affect execution
correctness.
Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%.
Also improvement expected on any massive sequences of reads from LDS.
Differential Revision: https://reviews.llvm.org/D25944
llvm-svn: 285919
This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk.
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
llvm-svn: 285893
This fixes selection of KANDN instructions and allows us to remove an extra set of patterns for KNOT and KXNOR.
Reviewers: delena, igorb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26134
llvm-svn: 285878
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.
llvm-svn: 285876
Summary:
The post-RA scheduler occasionally uses additional implicit operands when
the vector implicit operand as a whole is killed, but some subregisters
are still live because they are directly referenced later. Unfortunately,
this seems incredibly subtle to reproduce.
Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test
and others.
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25656
llvm-svn: 285835
Summary:
Remove this pass from addMachineSSAOptimization() and register it unconditionally in through addPreRegAlloc(). This pass is required for generating correct PIC calls.
Reviewers: sdardis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26036
llvm-svn: 285814
GPRC and GPRC_NOR0 (or the 64bit equivalent) and not just the latter.
GPRC_NOR0 contains ZERO as alternative meaning of r0 and is therefore
not a true subclass of GPRC.
llvm-svn: 285813
While bootstrapping Clang with recent `gcc 6.2.0` I found a bug related to misleading indentation.
I believe, a pair of `{}` was forgotten, especially given the above similar piece of code:
```
if (!RDef || !HII->isPredicable(*RDef)) {
Done = coalesceRegisters(RD, RegisterRef(S1));
if (Done) {
UpdRegs.insert(RD.Reg);
UpdRegs.insert(S1.getReg());
}
}
```
Reviewers: kparzysz
Differential Revision: https://reviews.llvm.org/D26204
llvm-svn: 285794
This is enough to compile and link but doesn't yet do anything particularly
useful. Once an ASM parser and printer are added in the next two patches, the
whole thing can be usefully tested.
Differential Revision: https://reviews.llvm.org/D23562
llvm-svn: 285770
For now, only add instruction definitions for basic ALU operations. Our
initial target is a working MC layer rather than codegen, so appropriate
SelectionDAG patterns will come later.
Differential Revision: https://reviews.llvm.org/D23561
llvm-svn: 285769
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. This was incorrect in the case
when the register class wasn't known from the static instruction
definition, but still needed to be an SGPR. The main example of this
is inlineasm has an SGPR constraint.
Also start verifying the register classes of inlineasm operands.
llvm-svn: 285762
Instructions with a 32-bit base encoding with an optional
32-bit literal encoded after them report their size as 4
for the disassembler. Consider these when computing the
MachineInstr size. This fixes problems caused by size estimate
consistency in BranchRelaxation.
llvm-svn: 285743
For example, rename s6Ext to s6_0Ext. The names for shifted integers
include the underscore and this will make the naming consistent. It
also exposed a few duplicates that were removed.
llvm-svn: 285728
This will prevent following regression when enabling i16 support (D18049):
test/CodeGen/AMDGPU/ctlz.ll
test/CodeGen/AMDGPU/ctlz_zero_undef.ll
Differential Revision: https://reviews.llvm.org/D25802
llvm-svn: 285716
This contains just enough for lib/Target/RISCV to compile. Notably a basic
RISCVTargetMachine and RISCVTargetInfo. At this point you can attempt llc
-march=riscv32 myinput.ll and will find it fails due to the lack of
MCAsmInfo.
See http://lists.llvm.org/pipermail/llvm-dev/2016-August/103748.html for
further discussion
Differential Revision: https://reviews.llvm.org/D23560
llvm-svn: 285712
As it stands, the OperandMatchResultTy is only included in the generated
header if there is custom operand parsing. However, almost all backends
make use of MatchOperand_Success and friends from OperandMatchResultTy for
e.g. parseRegister. This is a pain when starting an AsmParser for a new
backend that doesn't yet have custom operand parsing. Move the enum to
MCTargetAsmParser.h.
This patch is a prerequisite for D23563
Differential Revision: https://reviews.llvm.org/D23496
llvm-svn: 285705
I wanted to implement this as a target independent expansion, however when
targets say they want to expand FP_TO_FP16 what they actually want is
the unsafe math expansion when possible and expansion to a libcall in all
other cases.
The only way to make this work as a target independent would be to add logic
to target's TargetLowering construction to mark theses nodes as Expand when
LegalizeDAG can use the unsafe expansion and mark them as LibCall when it
cannot. I think this would be possible, but I think it would be too fragile
and complex as it would require targets to keep their expansion logic up
to date with the code in LegalizeDAG.
Reviewers: bogner, ab, t.p.northover, arsenm
Subscribers: wdng, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D25999
llvm-svn: 285704
[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment]
The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.
It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.
TBB example:
Before: lsls r0, r0, #2 After: add r0, pc
adr r1, .LJTI0_0 ldrb r0, [r0, #6]
ldr r0, [r0, r1] lsls r0, r0, #1
mov pc, r0 add pc, r0
=> No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.
The only case that can increase dynamic instruction count is the TBH case:
Before: lsls r0, r4, #2 After: lsls r4, r4, #1
adr r1, .LJTI0_0 add r4, pc
ldr r0, [r0, r1] ldrh r4, [r4, #6]
mov pc, r0 lsls r4, r4, #1
add pc, r4
=> 1 more instruction in prologue. Jump table shrunk by a factor of 2.
So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)
llvm-svn: 285690
Generate the slowest possible codepath for noopt CodeGen. Even trying to be
clever with the negated jump can cause out-of-range jumps. Use a wide branch
instead. Although the code is modelled simplistically, the later optimizations
would recombine the branching into `cbz` if possible. This re-enables the
previous optimization as well as hopefully gives us working code in all cases.
Addresses PR30356!
llvm-svn: 285649
Summary:
This has been replaced by the NVPTXInferAddressSpaces pass. We've had
the new one as the default with the old one accessible via a flag for
some months now, and we've had no problems.
Reviewers: tra
Subscribers: llvm-commits, jholewinski, jingyue, mgorny
Differential Revision: https://reviews.llvm.org/D26165
llvm-svn: 285642
Introducing "k" and "Yk" constraints for extended inline assembly, enabling use of AVX512 masked vectorized instructions.
Commit on behalf of mharoush
Extending inline assembly support, compatible with GCC as folowing:
"k" constraint hints the compiler to select any of AVX512 k0-k7 registers.
"Yk" constraint is a subset of "k" excluding k0 which is not allowd to be used as a mask.
Reviewer: 1. rnk
Differential Revision: https://reviews.llvm.org/D25062
llvm-svn: 285591
Fixes Bug 30808.
Note that passing subtarget information to predicates seems too complicated, so gfx8-specific def smrd_offset_20 introduced.
Old gfx6/7-specific def renamed to smrd_offset_8 for clarity.
Lit tests updated.
Differential Revision: https://reviews.llvm.org/D26085
llvm-svn: 285590
This patch implements two changes:
- Move processor feature definition into a new file SystemZFeatures.td,
and provide explicit lists of supported and unsupported features for
each level of the z/Architecture. This allows specifying unsupported
features in the scheduler definition files for each processor.
- Add optional aliases for the -mcpu processor names according to the
level of the z/Architecture, for compatibility with other compilers
on the platform. The supported aliases are:
-mcpu=arch8 equals -mcpu=z10
-mcpu=arch9 equals -mcpu=z196
-mcpu=arch10 equals -mcpu=zEC12
-mcpu=arch11 equals -mcpu=z13
llvm-svn: 285577
The LEFR/LFER pseudos are aliases for vector instructions and should
therefore be guared by FeatureVector. If they aren't, the TableGen
scheduler definition checking might complain that there is no data
for those pseudos for pre-z13 machines.
No functional change intended.
llvm-svn: 285576
Currently, when using an instruction that is not supported on the
currently selected architecture, the LLVM assembler is likely to
diagnose an "invalid operand" instead of a "missing feature".
This is because many operands require a custom parser in order to
be processed correctly, and if an instruction is not available
according to the current feature set, the generated parser code
will also not detect the associated custom operand parsers.
Fixed by temporarily enabling all features while parsing operands.
The missing features will then be correctly detected when actually
parsing the instruction itself.
llvm-svn: 285575
LLVM currently treats the first operand of MVCK as if it were a
regular base+index+displacement address. However, it is in fact
a base+displacement combined with a length register field.
While the two might look syntactically similar, there are two
semantic differences:
- %r0 is a valid length register, even though it cannot be used
as an index register.
- In an expression with just a single register like 0(%rX), the
register is treated as base with normal addresses, while it is
treated as the length register (with an empty base) for MVCK.
Fixed by adding a new operand parser class BDRAddr and reworking
the assembler parser to distinguish between address + length
register operands and regular addresses.
llvm-svn: 285574
There is a bug describing poor cost model for floating point operations:
Bug 29083 - [X86][SSE] Improve costs for floating point operations. This
patch is the second one in series of patches dealing with cost model.
Differential Revision: https://reviews.llvm.org/D25722
llvm-svn: 285564
This removes a couple tablegen classes that become unused after this change. Another class gained an additional parameter to allow PMADDUBSW to specify a different result type from its input type.
llvm-svn: 285515
- Fix doxygen file comment
- reduce indentation in loop
- Factor out some common subexpressions
- Move independent helper function out of class
- Fix Changed flag (this is not strictly NFC but a bugfix, but the flag
seems ignored anyway)
llvm-svn: 285488
Summary:
Flat instruction can return out of order, so we need always need to wait
for all the outstanding flat operations.
Reviewers: tony-tye, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl
Differential Revision: https://reviews.llvm.org/D25998
llvm-svn: 285479
Also add glc bit to the scalar loads since they exist on VI
and change the caching behavior.
This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI which do not have it.
llvm-svn: 285463
While trying to add the glc bit to SMEM instructions on VI
with the new refactoring I ran into some kind of shadowing
problem for the glc operand when using the pseudoinstruction
as a multiclass parameter.
Everywhere that currently uses it defines the operand to have the same
name as its type, i.e. glc:$glc which works. For some reason now it
conflicts, and its up evaluating to the wrong thing. For the
real encoding classes,
let Inst{16} = !if(ps.has_glc, glc, ?); was not being evaluated
and still visible in the Inst initializer in the expanded td file.
In other cases I got a a different error about an illegal operand
where this was using { 0 } initializer from the bits<1> glc initializer
instead of evaluating it as false in the if.
For consistency all of the operand types should probably
be captialized to avoid conflicting with the variable names
unless somebody has a better idea of how to fix this.
llvm-svn: 285462
Summary:
In isel, transform
Num % Den
into
Num - (Num / Den) * Den
if the result of Num / Den is already available.
Reviewers: tra
Subscribers: hfinkel, llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D26090
llvm-svn: 285461
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.
llvm-svn: 285435
Do not use LiveIntervals to recalculate kills, because that cannot be
done accurately without implicit uses on predicated instructions.
llvm-svn: 285409
The Windows ARM target expects the compiler to emit a division-by-zero check.
The check would use the form of:
cmp r?, #0
cbz .Ltrap
b .Lbody
.Lbody:
...
.Ltrap:
udf #249 @ __brkdiv0
This works great most of the time. However, if the body of the function is
greater than 127 bytes, the branch target limitation of cbz becomes an issue.
This occurs in the unoptimized code generation cases sometimes (like in
compiler-rt).
Since this is a matter of correctness, possibly pay a small penalty instead. We
now form this slightly differently:
cbnz .Lbody
udf #249 @ __brkdiv0
.Lbody:
...
The positive case is through the branch instead of being the next instruction.
However, because of the basic block layout, the negated branch is going to be
a short distance always (2 bytes away, after the inserted __brkdiv0).
The new t__brkdiv0 instruction is required to explicitly mark the instruction as
a terminator as the generic UDF instruction is not a terminator.
Addresses PR30532!
llvm-svn: 285312
r282428 added the MipsOptimizePICCall as an opt-in pass that can be
skipped when using the -opt-bisect-limit option. However, this pass is
needed because it generates code that conforms to the o32 ABI
specification by using the $t9 register for PIC calls with JALR
instructions.
This bug was exposed by the fact that skipFunction() also checks for
the "optnone" attribute. This caused functions with that attribute to
break the requirements of the o32 ABI.
llvm-svn: 285305
With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq).
Updated cost table accordingly.
Differential Revision: https://reviews.llvm.org/D26011
llvm-svn: 285304
UMAAL is a DSP instruction and it is not available on thumbv7m
(Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong
CHECK prefix in longMAC.ll test.
Patch by Vadzim Dambrouski.
Differential Revision: https://reviews.llvm.org/D25890
llvm-svn: 285278
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.
Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect].
The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25829
llvm-svn: 285273
This patch corresponds to review:
https://reviews.llvm.org/D25896
It just eliminates the redundant ZExt after a count trailing zeros instruction.
llvm-svn: 285267
Since Exynos-M2 improved the FP square root unit a bit over the one in
Exynos-M1, it does not benefit from using the Newton series for such
operations.
llvm-svn: 285246
It would be a very nice invariant to rely on, but unfortunately it doesn't
necessarily hold (and the causes of mis-sorted reglists appear to be quite
varied) so to be robust the frame lowering code can't assume that the first
register in the list is also the first one that actually gets pushed.
Should fix an issue where we were turning something like:
push {r8, r4, r7, lr}
sub sp, #24
into nonsense like:
push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr}
llvm-svn: 285232
Update the README.txt with newer information, add a link to the Emscripten
page explaining the current easiest way to use the LLVM wasm backend, and
mention that other ways of using the LLVM wasm backend are in development.
llvm-svn: 285215
Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend.
Refactor processor definition to use ISA version features.
Fixed ISA version for stoney.
Based on Laurent Morichetti's patch.
Differential Revision: https://reviews.llvm.org/D25919
llvm-svn: 285210
Summary:
In the case where of 'select i1 , f32, f32' or select i1, f64, f64 prefer lowering to masked-moves over branches.
Fixes pr30561
Reviewers: igorb, aymanmus, delena
Differential Revision: https://reviews.llvm.org/D25310
llvm-svn: 285196
Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions.
Reviewers: igorb, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25933
llvm-svn: 285173
On SparcV8, it was previously the case that a variable-sized alloca
might overlap by 4-bytes the last fixed stack variable, effectively
because 92 (the number of bytes reserved for the register spill area) !=
96 (the offset added to SP for where to start a DYNAMIC_STACKALLOC).
It's not as simple as changing 96 to 92, because variables that should
be 8-byte aligned would then be misaligned.
For now, simply increase the allocation size by 8 bytes for each dynamic
allocation -- wastes space, but at least doesn't overlap. As the large
comment says, doing this more efficiently will require larger changes in
llvm.
Also adds some test cases showing that we continue to not support
dynamic stack allocation and over-alignment in the same function.
llvm-svn: 285131
It is not safe to use LOAD ON CONDITION to implement access to a memory
location marked "volatile", since the architecture leaves it unspecified
whether or not an access happens if the condition is false.
The current code already appears to care about that:
def LOC : CondUnaryRSY<"loc", 0xEBF2, nonvolatile_load, GR32, 4>;
Unfortunately, that "nonvolatile_load" operator is simply ignored
by the CondUnaryRSY class, and there was no test to catch it.
llvm-svn: 285077
We already have (V)PMOVZX* combining support, this is the beginning of handling (V)PMOVSX* similarly - other combines in combineVSZext can be generalized in future patches.
This unearthed an interesting bug in that we were generating illegal build vectors on 32-bit targets - it was proving difficult to create a test for it from PMOVZX, but it fired immediately with PMOVSX. I've created a more general form of the existing getConstVector to handle these cases - ideally this should be handled in non-target-specific code but I couldn't find an equivalent.
Differential Revision: https://reviews.llvm.org/D25874
llvm-svn: 285072
Summary: The one tricky thing about this is that the sign/zero_extend_inreg uses v64i8 as an input type which isn't legal without BWI support. Though the vpmovsxbq and vpmovzxbq instructions themselves don't require BWI. To support this we need to add custom lowering for ZERO_EXTEND_VECTOR_INREG with v64i8 input. This can mostly reuse the existing sign extend code with a couple checks for sign extend vs zero extend added.
Reviewers: delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25594
llvm-svn: 285053
This is a function to go backwards in a block to find the first
instruction in a bundle, so iterator is a more natural choice for
parameter/return rather than a reference to a MachineInstruction.
llvm-svn: 285051
This changes locals from being declared by the emitLocal hook in
WebAssemblyTargetStreamer, rather than with an instruction. After exploring
the infastructure in LLVM more, this seems to make more sense since
declaring locals doesn't use an encoded opcode.
This also adds more 0xd opcodes, type encodings, and miscellaneous
binary encoding bits.
llvm-svn: 285040
Passing a MachineFunction as argument is more natural and avoids an
unnecessary round-trip through the logic determining the correct
Subtarget because MachineFunction already has a reference anyway.
llvm-svn: 285039
(Const)?MIOperands is equivalent to the C++ style
MachineInstr::mop_iterator. Use the latter for consistency except for a
few callers of MIOperands::analyzePhysReg().
llvm-svn: 285029