Mark kernels that use certain features that require user
SGPRs to support with kernel attributes. We need to know
before instruction selection begins because it impacts
the kernel calling convention lowering.
For now this only detects the workitem intrinsics.
llvm-svn: 252323
For some reason VS_32 ends up factoring into the pressure heuristics
even though we should never see a virtual register with this class.
When SGPRs are reserved for register spilling, this for some reason
triggers reg-crit scheduling.
Setting isAllocatable = 0 may help with this since that seems to remove
it from the default implementation's generated table.
llvm-svn: 252321
Summary:
In this implementation, LiveIntervalAnalysis invents a few register
masks on basic block boundaries that preserve no registers. The nice
thing about this is that it prevents the prologue inserter from thinking
it needs to spill all XMM CSRs, because it doesn't see any explicit
physreg defs in the MI.
Reviewers: MatzeB, qcolombet, JosephTremoulet, majnemer
Subscribers: MatzeB, llvm-commits
Differential Revision: http://reviews.llvm.org/D14407
llvm-svn: 252318
The benefit from converting narrow loads into a wider load (r251438) could be
micro-architecturally dependent, as it assumes that a single load with two bitfield
extracts is cheaper than two narrow loads. Currently, this conversion is
enabled only in cortex-a57 on which performance benefits were verified.
llvm-svn: 252316
We now create the .eh_frame section early, just like every other special
section.
This means that the special flags are visible in code that explicitly
asks for ".eh_frame".
llvm-svn: 252313
Summary:
Without these patterns we would generate a complete LL/SC sequence.
This would be problematic for memory regions marked as WRITE-only or
READ-only, as the instructions LL/SC would read/write to the protected
memory regions correspondingly.
Reviewers: dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D14397
llvm-svn: 252293
Windows EH funclets need to always return to a single parent funclet. However, it is possible for earlier optimizations to combine funclets (probably based on one funclet having an unreachable terminator) in such a way that this condition is violated.
These changes add code to the WinEHPrepare pass to detect situations where a funclet has multiple parents and clone such funclets, fixing up the unwind and catch return edges so that each copy of the funclet returns to the correct parent funclet.
Differential Revision: http://reviews.llvm.org/D13274?id=39098
llvm-svn: 252249
Previously, subprograms contained a metadata reference to the function they
described. Because most clients need to get or set a subprogram for a given
function rather than the other way around, this created unneeded inefficiency.
For example, many passes needed to call the function llvm::makeSubprogramMap()
to build a mapping from functions to subprograms, and the IR linker needed to
fix up function references in a way that caused quadratic complexity in the IR
linking phase of LTO.
This change reverses the direction of the edge by storing the subprogram as
function-level metadata and removing DISubprogram's function field.
Since this is an IR change, a bitcode upgrade has been provided.
Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is
attached to the PR.
Differential Revision: http://reviews.llvm.org/D14265
llvm-svn: 252219
We already had a test for this for 32-bit SEH catchpads, but those don't
actually create funclets. We had a bug that only appeared in funclet
prologues, where we would establish EBP and ESI as our FP and BP, and
then downstream prologue code would overwrite them.
While I was at it, I fixed Win64+funclets+stackrealign. This issue
doesn't come up as often there due to the ABI requring 16 byte stack
alignment, but now we can rest easy that AVX and WinEH will work well
together =P.
llvm-svn: 252210
This fixes the issue of wrong CFA calculation in the following case:
0x08048400 <+0>: push %ebx
0x08048401 <+1>: sub $0x8,%esp
0x08048404 <+4>: **call 0x8048409 <test+9>**
0x08048409 <+9>: **pop %eax**
0x0804840a <+10>: add $0x1bf7,%eax
0x08048410 <+16>: mov %eax,%ebx
0x08048412 <+18>: call 0x80483f0 <bar>
0x08048417 <+23>: add $0x8,%esp
0x0804841a <+26>: pop %ebx
0x0804841b <+27>: ret
The highlighted instructions are a product of movpc instruction. The call
instruction changes the stack pointer, and pop instruction restores its
value. However, the rule for computing CFA is not updated and is wrong on
the pop instruction. So, e.g. backtrace in gdb does not work when on the pop
instruction. This adds cfi instructions for both call and pop instructions.
cfi_adjust_cfa_offset** instruction is used with the appropriate offset for
setting the rules to calculate CFA correctly.
Patch by Violeta Vukobrat.
Differential Revision: http://reviews.llvm.org/D14021
llvm-svn: 252176
The operand layout is slightly different for the atomic
opcodes from the usual MUBUF loads and stores.
This should only fix it on SI/CI. VI is still broken
because it still emits the addr64 replacement.
llvm-svn: 252140
Summary:
The CLR's personality routine passes the pointer to the establisher frame
in RCX, not RDX.
Reviewers: pgavlin, majnemer, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14343
llvm-svn: 252135
Win64 has some strict requirements for the epilogue. As a result, we disable
shrink-wrapping for Win64 unless the block that gets the epilogue is already an
exit block.
Fixes PR24193.
llvm-svn: 252088
This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction.
The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening.
This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done.
It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted.
Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll
Differential Revision: http://reviews.llvm.org/D13988
llvm-svn: 252074
Patch by Slava Klochkov
The key difference between FMA* and FMA*_Int opcodes is that FMA*_Int opcodes are handled more conservatively. It is illegal to commute the 1st operand of FMA*_Int instructions as the upper bits of scalar FMA intrinsic result must be taken from the 1st operand, but such commute transformation would change those upper bits and invalidate the intrinsic's result.
Reviewers: Quentin Colombet, Elena Demikhovsky
Differential Revision: http://reviews.llvm.org/D13710
llvm-svn: 252060
If we have a CMOV, OR and AND combination such as:
if (x & CN)
y |= CM;
And:
* CN is a single bit;
* All bits covered by CM are known zero in y;
Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).
llvm-svn: 252057
The x86 "sitofp i64 to double" dag combine, in 32-bit mode, lowers sitofp
directly to X86ISD::FILD (or FILD_FLAG). This should not be done in soft-float mode.
llvm-svn: 252042
There is no point in having invoke safepoints handled differently than the
call safepoints. All relevant decisions could be made by looking at whether
or not gc.result and gc.relocate lay in a same basic block. This change will
allow to lower call safepoints with relocates and results in a different
basic blocks. See test case for example.
Differential Revision: http://reviews.llvm.org/D14158
llvm-svn: 252028
Summary:
Add support for wasm's select operator, and lower LLVM's select DAG node
to it.
Reviewers: sunfish
Subscribers: dschuff, llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D14295
llvm-svn: 252002
XOP has the VPCMOV instruction that performs the common vector bit select operation OR( AND( SRC1, SRC3 ), AND( SRC2, ~SRC3 ) )
This patch adds tablegen pattern matching for this instruction.
Differential Revision: http://reviews.llvm.org/D8841
llvm-svn: 251975
When push instructions are being used to pass function arguments on
the stack, and either EH or debugging are enabled, we need to generate
.cfi_adjust_cfa_offset directives appropriately. For (synch) EH, it is
enough for the CFA offset to be correct at every call site, while
for debugging we want to be correct after every push.
Darwin does not support this well, so don't use pushes whenever it
would be required.
Differential Revision: http://reviews.llvm.org/D13767
llvm-svn: 251904
This was causing a variety of test failures when v2i64
is added as a legal type.
SIFixSGPRCopies should correctly handle the case of vector inputs
to a scalar reg_sequence, so this isn't necessary anymore. This
was hiding some deficiencies in how reg_sequence is handled later,
but this shouldn't be a problem anymore since the register class
copy of a reg_sequence is now done before the reg_sequence.
llvm-svn: 251860
I've found myself pointlessly debugging problems from running
graphics tests with an HSA triple a few times, so stop this from
happening again.
llvm-svn: 251858
In the current BB placement algorithm, a loop chain always contains all loop blocks. This has a drawback that cold blocks in the loop may be inserted on a hot function path, hence increasing branch cost and also reducing icache locality.
Consider a simple example shown below:
A
|
B⇆C
|
D
When B->C is quite cold, the best BB-layout should be A,B,D,C. But the current implementation produces A,C,B,D.
This patch filters those cold blocks off from the loop chain by comparing the ratio:
LoopBBFreq / LoopFreq
to 20%: if it is less than 20%, we don't include this BB to the loop chain. Here LoopFreq is the frequency of the loop when we reduce the loop into a single node. In general we have more cold blocks when the loop has few iterations. And vice versa.
Differential revision: http://reviews.llvm.org/D11662
llvm-svn: 251833
1) PR25154. This is basically a repeat of PR18102, which was fixed in
r200201, and broken again by r234430. The latter changed which of the
store nodes was merged into from the first to the last. Thus, we now
also need to prefer merging a later store at a given address into the
target node, instead of an earlier one.
2) While investigating that, I also realized I'd introduced a bug in
r236850. There, I removed a check for alignment -- not realizing that
nothing except the alignment check was ensuring that none of the stores
were overlapping! This is a really bogus way to ensure there's no
aliased stores.
A better solution to both of these issues is likely to always use the
code added in the 'if (UseAA)' branches which rearrange the chain based
on a more principled analysis. I'll look into whether that can be used
always, but in the interest of getting things back to working, I think a
minimal change makes sense.
llvm-svn: 251816
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. It turns out that the new code path taken due to
legalizing a scalar_to_vector of i64 -> v2i64 exposes a missing check in a
micro optimization to change a load followed by a scalar_to_vector into a
load and splat instruction on PPC.
llvm-svn: 251798
Optimized <8 x i32> to <8 x i16>
<4 x i64> to < 4 x i32>
<16 x i16> to <16 x i8>
All these oprtrations use now AVX512F set (KNL). Before this change it was implemented with AVX2 set.
Differential Revision: http://reviews.llvm.org/D14108
llvm-svn: 251764
Summary:
This reverts commit 79c37e1a4ff1e634da8f95322f080601b4c815fc.
This test passes locally but fails on the community buildbot. So we will let it
XFAIL for now.
Patched by Mandeep Singh Grang (mgrang@codeaurora.org)
Reviewers: kparzysz, weimingz
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D14189
llvm-svn: 251664
This patch generalizes the zeroing of vector elements with the BLEND instructions. Currently a zero vector will only blend if the shuffled elements are correctly inline, this patch recognises when a vector input is zero (or zeroable) and modifies a local copy of the shuffle mask to support a blend. As a zeroable vector input may not be all zeroes, the zeroable vector is regenerated if necessary.
Differential Revision: http://reviews.llvm.org/D14050
llvm-svn: 251659
Summary: Refer PR23377. This test was XFAIL'ed for Hexagon as well as ARM. But it has now started passing for ARM.
Reviewers: hans, rengolin, aemerson, kparzysz
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14155
llvm-svn: 251652
This was discovered to be necessary while running memchr-01.ll with
-verify-machinstrs, because it is not allowed to have a phys reg live
accross block boundaries while on SSA form, if the register is
allocatable (expect in entry block and landing pads).
In this test case, stringRRE pseudos are expanded after isel by adding
a loop block which produces a live out CC register. To make the test
pass, it was also necessary to not say that StringRRELoop pseudo uses
R0L, this is only true for the StringRRE opcode.
-verify-machineinstrs added to memchr-01.ll test.
New test case int-cmp-51.ll to test that MachineCSE can eliminate
an identical compare (which it couldn't do before).
Reviewed by Ulrich Weigand
llvm-svn: 251634
Summary:
This commit resolves wrong opcodes for ll and sc instructions for r6 architecutres, which were generated in method MipsTargetLowering::emitAtomicBinary.
Author: Jelena.Losic
Reviewers: dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D13593
llvm-svn: 251629
Summary:
The microMIPS register class GPRMM16 does not contain the $zero register.
However, MipsSEDAGToDAGISel::replaceUsesWithZeroReg() would replace uses
of the $dst register:
[d]addiu, $dst, $zero, 0
with the $zero register, without checking for membership in the register
class of the target machine operand.
Reviewers: dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D13984
llvm-svn: 251622
Since the verifier will give false reports if it incorrectly thinks MI is
loading or storing using an FI, it is necessary to scan memoperands and
find out how the FI is used in the instruction. This should be relatively
rare.
Needed to make CodeGen/SystemZ/spill-01.ll pass, which now runs with this flag.
Reviewed by Quentin Colombet.
llvm-svn: 251620
Summary:
Conversion opcode name format should be f64.convert_u/i64 not f64_convert_u
Author: s3ththompson
Reviewers: jfb
Subscribers: sunfish, jfb, llvm-commits, dschuff
Differential Revision: http://reviews.llvm.org/D14160
llvm-svn: 251613
We cannot form ctr-based loops around function calls, including calls to
__tls_get_addr used for PIC TLS variables. References to such TLS variables,
however, might be buried within constant expressions, and so we need to search
the entire constant expression to be sure that no references to such TLS
variables exist.
Fixes PR25256, reported by Eric Schweitz. This is a slightly-modified version
of the patch suggested by Eric in the bug report, and a test case I created.
llvm-svn: 251582
As a follow-up to r251566, do the same for the other optionally-supported
register classes (mostly for vector registers). Don't return an unavailable
register class (which would cause an assert later), but fail cleanly when
provided an unsupported inline asm constraint.
llvm-svn: 251575
At the LLVM level this ABI is essentially a minimal modification of AAPCS to
support 16-byte alignment for vector types and the stack.
llvm-svn: 251570
cntlz is the old POWER mnemonic. cntlzw is the PowerPC mnemonic.
This change fixes an issue when -no-integrated-as: The opcode cntlz is
unrecognized by gas
Alias the POWER mnemonic cntlz[.] to the PowerPC mnemonic cntlzw[.]
This is done for because the POWER cntlz mnemonic has be used by LLVM for
a very long time. We need to make sure that assembly programs
that are using the cntlz[.] do not break with this change.
Change PowerPC tests to reflect the insn change from cntlz to cntlzw.
Add assembly test to verify cntlz[.] is encoded correctly.
Patch by Tom Rix!
llvm-svn: 251489
Summary:
Don't call `computeKnownBitsFromRangeMetadata` for extended loads --
this can cause a mismatch between the width of the !range metadata and
the width of the APInt's accumulating `KnownZero` (and `KnownOne` in the
future). This isn't a problem now, but will be after a future change.
Note: this can be made more aggressive in the future.
Reviewers: nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14107
llvm-svn: 251486
This is a usage of the IR-level fast-math-flags now that they are propagated to SDNodes.
This was originally part of D8900.
Removing the global 'enable-unsafe-fp-math' checks will require auto-upgrade and
possibly other changes.
Differential Revision: http://reviews.llvm.org/D9708
llvm-svn: 251450
This recommits r250719, which caused a failure in SPEC2000.gcc
because of the incorrect insert point for the new wider load.
Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example :
ldrh w0, [x2]
ldrh w1, [x2, #2]
becomes
ldr w0, [x2]
ubfx w1, w0, #16, #16
and w0, w0, #ffff
llvm-svn: 251438
When optimization is disabled, edge weights that are stored in MBB won't be used so that we don't have to store them. Currently, this is done by adding successors with default weight 0, and if all successors have default weights, the weight list will be empty. But that the weight list is empty doesn't mean disabled optimization (as is stated several times in MachineBasicBlock.cpp): it may also mean all successors just have default weights.
We should discourage using default weights when adding successors, because it is very easy for users to forget update the correct edge weights instead of using default ones (one exception is that the MBB only has one successor). In order to detect such usages, it is better to differentiate using default weights from the case when optimizations is disabled.
In this patch, a new interface addSuccessorWithoutWeight(MBB*) is created for when optimization is disabled. In this case, MBB will try to maintain an empty weight list, but it cannot guarantee this as for many uses of addSuccessor() whether optimization is disabled or not is not checked. But it can guarantee that if optimization is enabled, then the weight list always has the same size of the successor list.
Differential revision: http://reviews.llvm.org/D13963
llvm-svn: 251429
convert float to half with mask/maskz for the reg to reg version and mask for the reg to mem version (there is no maskz version for reg to mem).
Differential Revision: http://reviews.llvm.org/D14113
llvm-svn: 251409
Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.
Reviewers: rengolin, t.p.northover
Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14082
llvm-svn: 251401
GNU tools require elfiamcu to take up the entire OS field, so, e.g.
i?86-*-linux-elfiamcu is not considered a legal triple.
Make us compatible.
Differential Revision: http://reviews.llvm.org/D14081
llvm-svn: 251390
When taking the remainder of a value divided by a constant, visitREM()
attempts to convert the REM to a longer but faster sequence of instructions.
This conversion calls combine() on a speculative DIV instruction. Commit
rL250825 may cause this combine() to return a DIVREM, corrupting nearby nodes.
Flow eventually hits unreachable().
This patch adds a test case and a check to prevent visitREM() from trying
to convert the REM instruction in cases where a DIVREM is possible.
See http://reviews.llvm.org/D14035
llvm-svn: 251373
Both VLDRS and VLDRD fault if the memory is not 4 byte aligned, which wasn't
really being checked before, leading to faults at runtime.
llvm-svn: 251352
In PIC mode we were previously computing global variable addresses (or GOT
entry addresses) by adding the PC, the PC-relative GOT displacement and
the GOT-relative symbol/GOT entry displacement. Because the latter two
displacements are fixed, we ended up performing one more addition than
necessary.
This change causes us to compute addresses using a single PC-relative
displacement, resulting in a shorter code sequence. This reduces code size
by about 4% in a recent build of Chromium for Android.
As a result of this change we no longer need to compute the GOT base address
in the ARM backend, which allows us to remove the Global Base Reg pass and
SDAG lowering for the GOT.
We also now no longer use the GOT when addressing a symbol which is known
to be defined in the same linkage unit. Specifically, the symbol must have
either hidden visibility or a strong definition in the current module in
order to not use the the GOT.
This is a change from the previous behaviour where we would use the GOT to
address externally visible symbols defined in the same module. I think the
only cases where this could matter are cases involving symbol interposition,
but we don't really support that well anyway.
Differential Revision: http://reviews.llvm.org/D13650
llvm-svn: 251322
Instead of XFAIL-ing the tests with the wrong usage of the "interrupt"
attribute, we should check that we emit the correct error messages to
the user.
llvm-svn: 251295
Summary:
This patch adds support for using the "interrupt" attribute on Mips
for interrupt handling functions. At this time only mips32r2+ with the
o32 ABI with the static relocation model is supported. Unsupported
configurations will be rejected
Patch by Simon Dardis (+ clang-format & some trivial changes to follow the
LLVM coding standards by me).
Reviewers: mpf, dsanders
Subscribers: dsanders, vkalintiris, llvm-commits
Differential Revision: http://reviews.llvm.org/D10768
llvm-svn: 251286
When the target does not support these intrinsics they should be converted to a chain of scalar load or store operations.
If the mask is not constant, the scalarizer will build a chain of conditional basic blocks.
I added isLegalMaskedGather() isLegalMaskedScatter() APIs.
Differential Revision: http://reviews.llvm.org/D13722
llvm-svn: 251237
When using the MCU psABI, compiler-generated library calls should pass
some parameters in-register. However, since inreg marking for x86 is currently
done by the front end, it will not be applied to backend-generated calls.
This is a workaround for PR3997, which describes a similar issue for -mregparm.
Differential Revision: http://reviews.llvm.org/D13977
llvm-svn: 251223
We don't need a mask of a rotation result to be a constant splat - any constant scalar/vector can be usefully folded.
Followup to D13851.
llvm-svn: 251197
This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions.
This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future.
Differential Revision: http://reviews.llvm.org/D13851
llvm-svn: 251188
Summary:
The logic here isn't straightforward because our support for
TargetOptions::GuaranteedTailCallOpt.
Also fix a bug where we were allowing tail calls to cdecl functions from
fastcall and vectorcall functions. We were special casing thiscall and
stdcall callers rather than checking for any convention that requires
clearing stack arguments before returning.
Reviewers: hans
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14024
llvm-svn: 251137
Summary:
When ARMFrameLowering::emitPopInst generates a "pop" instruction to restore the callee saved registers, it checks if the LR register is among them. If so, the function may decide to remove the basic block's terminator and replace it with a "pop" to the PC register instead of LR.
This leads to a problem when the block's terminator is preceded by a "llvm.debugtrap" call. The MI iterator points to the trap in such a case, which is also a terminator. If the function decides to restore LR to PC, it erroneously removes the trap.
Reviewers: asl, rengolin
Subscribers: aemerson, jfb, rengolin, dschuff, llvm-commits
Differential Revision: http://reviews.llvm.org/D13672
llvm-svn: 251123
Summary:
This ensures that BranchFolding (and similar) won't remove these blocks.
Also allow AsmPrinter::EmitBasicBlockStart to process MBBs which are
address-taken but do not have BBs that are address-taken, since otherwise
its call to getAddrLabelSymbolTableToEmit would fail an assertion on such
blocks. I audited the other callers of getAddrLabelSymbolTableToEmit
(and getAddrLabelSymbol); they all have BBs known to be address-taken
except for the call through getAddrLabelSymbol from
WinException::create32bitRef; that call is actually now unreachable, so
I've removed it and updated the signature of create32bitRef.
This fixes PR25168.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: pgavlin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13774
llvm-svn: 251113
When we fold "mul ((add x, c1), c1)" -> "add ((mul x, c2), c1*c2)", we bail if (add x, c1) has multiple
users which would result in an extra add instruction.
In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add.
I also added the capability of doing the existing optimization with non-splatted vectors (splatted also works).
Differential Revision: http://reviews.llvm.org/D13740
llvm-svn: 251028
Summary:
Hyphens were missing from the triple, causing it to be parsed
incorrectly. This patch updates the triple and makes necessary
changes to the expected output.
Patch is from Vinicius Tinti.
Reviewers: ab, tinti
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D13792
llvm-svn: 251020
These were the cause of a verifier error when building 7zip with
-verify-machineinstrs. Running 'make check' with the verifier
triggered the same error on the test here so i've updated the test
to run the verifier on one of its runs instead of adding a new one.
While looking at this code, there was a stale comment that these
instructions were only used for disassembly. This probably used to
be the case, but they are now used in the 'ARM load / store optimization pass' too.
This reapplies r242300 which was reverted in r242428 due to bot failures.
Ultimately those failures were spurious and completely unrelated to this commit. I reverted this
at the time because it was thought to be at fault.
llvm-svn: 250969
There may be other use operands that also need their kill flags cleared.
This happens in a few tests when SIFoldOperands is moved after
PeepholeOptimizer.
PeepholeOptimizer rewrites cases that look like:
%vreg0 = ...
%vreg1 = COPY %vreg0
use %vreg1<kill>
%vreg2 = COPY %vreg0
use %vreg2<kill>
to use the earlier source to
%vreg0 = ...
use %vreg0
use %vreg0
Currently SIFoldOperands sees the copied registers, so there is
only one use. So far I haven't managed to come up with a test
that currently has multiple uses of a foldable VGPR -> VGPR copy.
llvm-svn: 250960
Summary:
Previously, we were inserting an InlineAsm statement for each line of the
inline assembly. This works for GAS but it triggers prologue/epilogue
emission when IAS is in use. This caused:
.set noreorder
.cpload $25
to be emitted as:
.set push
.set reorder
.set noreorder
.set pop
.set push
.set reorder
.cpload $25
.set pop
which led to assembler errors and caused the test to fail.
The whitespace-after-comma changes included in this patch are necessary to
match the output when IAS is in use.
Reviewers: vkalintiris
Subscribers: rkotler, llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D13653
llvm-svn: 250895
When we have to convert the masked.load, masked.store to scalar code, we generate a chain of conditional basic blocks.
I added optimization for constant mask vector.
Differential Revision: http://reviews.llvm.org/D13855
llvm-svn: 250893
Summary:
The forwards compatibility strategy employed by MIPS is to consider registers
to be infinitely sign-extended. Then on ISA's with a wider register, the result
of existing instructions are sign-extended to register width and zero-extended
counterparts are added. copy_u.w on MSA32 and copy_u.w on MSA64 violate this
strategy and we have therefore corrected the MSA specs to fix this.
We still keep track of sign/zero-extension during legalization but we now
match copy_s.[wd] where required.
No change required to clang since __builtin_msa_copy_u_[wd] will map to
copy_s.[wd] where appropriate for the target.
Reviewers: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13472
llvm-svn: 250887
A mem-to-mem instruction (that both loads and stores), which store to an
FI, cannot pass the verifier since it thinks it is loading from the FI.
For the mem-to-mem instruction, do a looser check in visitMachineOperand()
and only check liveness at the reg-slot while analyzing a frame index operand.
Needed to make CodeGen/SystemZ/xor-01.ll pass with -verify-machineinstrs,
which now runs with this flag.
Reviewed by Evan Cheng and Quentin Colombet.
llvm-svn: 250885
Do not tail duplicate blocks where the successor has a phi node,
and the corresponding value in that phi node uses a subregister.
http://reviews.llvm.org/D13922
llvm-svn: 250877
C/C++ code can declare an extern function, which will show up as an import in WebAssembly's output. It's expected that the linker will resolve these, and mark unresolved imports as call_import (I have a patch which does this in wasmate).
llvm-svn: 250875
Summary:
TargetLoweringBase::Expand is defined as "Try to expand this to other ops,
otherwise use a libcall." For ISD::UDIV and ISD::SDIV, the choice between
the two possibilities was defined in a rather convoluted way:
- if DIVREM is legal, expand to DIVREM
- if DIVREM has a custom lowering, expand to DIVREM
- if DIVREM libcall is defined and a remainder from the same division is
computed elsewhere, expand to a DIVREM libcall
- else, expand to a DIV libcall
This had the undesirable effect that if both DIV and DIVREM are implemented
as libcalls, then ISD::UDIV and ISD::SDIV are expanded to the heavier DIVREM
libcall, even when the remainder isn't used.
The new code adds a new LegalizeAction, TargetLoweringBase::LibCall, so that
backends can directly control whether they prefer an expansion or a conversion
to a libcall. This makes the generic lowering code even more generic,
allowing its reuse in a wider range of target-specific configurations.
The useful effect is that ARM backend will now generate a call
to __aeabi_{i,u}div rather than __aeabi_{i,u}divmod in cases where
it doesn't need the remainder. There's no functional change outside
the ARM backend.
Reviewers: t.p.northover, rengolin
Subscribers: t.p.northover, llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D13862
llvm-svn: 250826
The mask value type for maskload/maskstore GCC builtins is never a vector of
packed floats/doubles.
This patch fixes the following issues:
1. The mask argument for builtin_ia32_maskloadpd and builtin_ia32_maskstorepd
should be of type llvm_v2i64_ty and not llvm_v2f64_ty.
2. The mask argument for builtin_ia32_maskloadpd256 and
builtin_ia32_maskstorepd256 should be of type llvm_v4i64_ty and not
llvm_v4f64_ty.
3. The mask argument for builtin_ia32_maskloadps and builtin_ia32_maskstoreps
should be of type llvm_v4i32_ty and not llvm_v4f32_ty.
4. The mask argument for builtin_ia32_maskloadps256 and
builtin_ia32_maskstoreps256 should be of type llvm_v8i32_ty and not
llvm_v8f32_ty.
Differential Revision: http://reviews.llvm.org/D13776
llvm-svn: 250817
This wasn't doing anything useful. They weren't explicitly used
anywhere, and the RegScavenger ignores reserved registers.
This for some reason caused a random scheduling change in the test.
Getting the check lines to pass is too frustrating, and there's probably
not too much value in checking the vector case's operands N times.
llvm-svn: 250794
Currently, in MachineBlockPlacement pass the loop is rotated to let the best exit to be the last BB in the loop chain, to maximize the fall-through from the loop to outside. With profile data, we can determine the cost in terms of missed fall through opportunities when rotating a loop chain and select the best rotation. Basically, there are three kinds of cost to consider for each rotation:
1. The possibly missed fall through edge (if it exists) from BB out of the loop to the loop header.
2. The possibly missed fall through edges (if they exist) from the loop exits to BB out of the loop.
3. The missed fall through edge (if it exists) from the last BB to the first BB in the loop chain.
Therefore, the cost for a given rotation is the sum of costs listed above. We select the best rotation with the smallest cost. This is only for PGO mode when we have more precise edge frequencies.
Differential revision: http://reviews.llvm.org/D10717
llvm-svn: 250754
Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example :
ldrh w0, [x2]
ldrh w1, [x2, #2]
becomes
ldr w0, [x2]
ubfx w1, w0, #16, #16
and w0, w0, #ffff
llvm-svn: 250719
Emit the CFI instructions after all code transformation have been done.
This will avoid any interference between CFI instructions and packetization.
llvm-svn: 250714
The mapping of these two intrinsics in ARMInstrInfo.td had a small
omission which lead to their operands not being validated/transformed
before being lowered into usat and ssat instructions. This can cause
incorrect instructions to be emitted.
I've also added tests for the remaining two saturating arithmatic
intrinsics @llvm.arm.qadd and @llvm.arm.qsub as they are missing
codegen tests.
llvm-svn: 250697
Add FastISel support for SSE4A scalar float / double non-temporal stores
Follow up to D13698
Differential Revision: http://reviews.llvm.org/D13773
llvm-svn: 250610