Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
missing -emulated-tls flag would generate wrong TLS code for targets
that supports only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.
Differential Revision: https://reviews.llvm.org/D42999
llvm-svn: 326341
Summary:
Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves
In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.
Patch by Martin Svanfeldt
Reviewers: fhahn, pbarrio, rogfer01
Reviewed By: pbarrio, rogfer01
Subscribers: chrib, yroux, eugenis, efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42574
llvm-svn: 326333
We were always setting the block alignment to 2 bytes in Thumb mode
and 4-bytes in ARM mode (r325754, and r325012), but this could cause
reducing the block alignment when it already had been aligned (e.g.
in Thumb mode when the block is a CPE that was already 4-byte aligned).
Patch by Momchil Velikov, I've only added a test.
Differential Revision: https://reviews.llvm.org/D43777
llvm-svn: 326232
Re-enable commit r323991 now that r325931 has been committed to make
MachineOperand::isRenamable() check more conservative w.r.t. code
changes and opt-in on a per-target basis.
llvm-svn: 326208
In r322867, we introduced IsStandalone when printing MIR in -debug
output. The default behaviour for that was:
1) If any of MBB, MI, or MO are -debug-printed separately, don't omit any
redundant information.
2) When -debug-printing a MF entirely, don't print any redundant
information.
3) When printing MIR, don't print any redundant information.
I'd like to change 2) to:
2) When -debug-printing a MF entirely, don't omit any redundant information.
Differential Revision: https://reviews.llvm.org/D43337
llvm-svn: 326094
This is a follow up of r325012, that allowed half types in constant pools.
Proper alignment was enforced when a big basic block was split up, but not when
a CPE was placed before/after a block; the successor block had the wrong
alignment.
Differential Revision: https://reviews.llvm.org/D43580
llvm-svn: 325754
Follow up of Clang commit r325351; this adds the LLVM tests, which
were also missing.
Differential Revision: https://reviews.llvm.org/D43395
llvm-svn: 325443
This reverts commit r323991.
This commit breaks target that don't model all the register constraints
in TableGen. So far the workaround was to set the
hasExtraXXXRegAllocReq, but it proves that it doesn't cover all the
cases.
For instance, when mutating an instruction (like in the lowering of
COPYs) the isRenamable flag is not properly updated. The same problem
will happen when attaching machine operand from one instruction to
another.
Geoff Berry is working on a fix in https://reviews.llvm.org/D43042.
llvm-svn: 325421
Enable multiple COPY hints to eliminate more COPYs during register allocation.
Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.
Review: Eli Friedman
llvm-svn: 325327
Summary:
Currently when expanding a SETCC node into a SELECT_CC, LLVM uses
an incorrect type for determining BooleanContent of the result. This
patch fixes the issue.
Fixes PR36079.
Reviewers: rogfer01, javed.absar, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43282
llvm-svn: 325325
This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form
a != b ? x : 0
a == b ? 0 : x
and that currently (e.g. in Thumb1) are emitted as branches.
Differential Revision: https://reviews.llvm.org/D34515
llvm-svn: 325323
Summary:
In LLVM, 't' selects a floating-point/SIMD register and only supports
32-bit values. This is appropriately documented in the LLVM Language
Reference Manual. However, this behaviour diverges from that of GCC, where
't' selects the s0-s31 registers and its qX and dX variants depending on
additional operand modifiers (q/P).
For example, the following C code:
#include <arm_neon.h>
float32x4_t a, b, x;
asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b))
results in the following assembly if compiled with GCC:
vadd.f32 s0, s0, s1
whereas LLVM will show "error: couldn't allocate output register for
constraint 't'", since a, b, x are 128-bit variables, not 32-bit.
This patch extends the use of 't' to mean that of GCC, thus allowing
selection of the lower Q vector regs and their D/S variants. For example,
the earlier code will now compile as:
vadd.f32 q0, q0, q1
This behaviour still differs from that of GCC but I think it is actually
more correct, since LLVM picks up the right register type based on the
datatype of x, while GCC would need an extra operand modifier to achieve
the same result, as follows:
asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b))
Since this is only an extension of functionality, existing code should not
be affected by this change. Note that operand modifiers q/P are already
supported by LLVM, so this patch should suffice to support inline
assembly with constraint 't' originally built for GCC.
Reviewers: grosbach, rengolin
Reviewed By: rengolin
Subscribers: rogfer01, efriedma, olista01, aemerson, javed.absar, eraman, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42962
llvm-svn: 325244
Change ARMConstantIslandPass to:
- accept f16 literals as litpool entries,
- if the litpool needs to be inserted in the middle of a big block, then we
need to 4-byte align the next instruction in ARM mode.
Differential Revision: https://reviews.llvm.org/D42784
llvm-svn: 325012
If merging them, the dllexport attribute needs to be brought along
to the new GlobalAlias.
Differential Revision: https://reviews.llvm.org/D43192
llvm-svn: 324937
Add a common -trap-unreachable option, similar to the target
specific hexagon equivalent, which has been replaced. This
turns unreachable instructions into traps, which is useful for
debugging.
Differential Revision: https://reviews.llvm.org/D42965
llvm-svn: 324880
* Use uleb128 for code offsets in the LSDA call site table.
* Omit the TTBase offset if the type table is empty.
This change can reduce the size of the DWARF/Itanium LSDA by about half.
Patch by Ryan Prichard!
llvm-svn: 324750
Rely on the assembler to finalize the layout of the DWARF/Itanium
exception-handling LSDA. Rather than calculate the exact size of each
thing in the LSDA, use assembler directives:
To emit the offset to the TTBase label:
.uleb128 .Lttbase0-.Lttbaseref0
.Lttbaseref0:
To emit the size of the call site table:
.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
... call site table entries ...
.Lcst_end0:
To align the type info table:
... action table ...
.balign 4
.long _ZTIi
.long _ZTIl
.Lttbase0:
Using assembler directives simplifies the compiler and allows switching
the encoding of offsets in the call site table from udata4 to uleb128 for
a large code size savings. (This commit does not change the encoding.)
The combination of the uleb128 followed by a balign creates an unfortunate
dependency cycle that the assembler must sometimes resolve either by
padding an LEB or by inserting zero padding before the type table. See
PR35809 or GNU as bug 4029.
Patch by Ryan Prichard!
llvm-svn: 324749
Instead of:
%bb.1: derived from LLVM BB %for.body
print:
bb.1.for.body:
Also use MIR syntax for MBB attributes like "align", "landing-pad", etc.
llvm-svn: 324563
This is a follow up of r324321, adding a match pattern for mov with a FP16
immediate (also fixing operand vfp_f16imm that wasn't even compiling).
Differential Revision: https://reviews.llvm.org/D42973
llvm-svn: 324456
Following up on the discussion from
http://lists.llvm.org/pipermail/llvm-dev/2017-April/112305.html, undef
values are now placed in the .bss as well as null values. This prevents
undef global values taking up potentially huge amounts of space in the
.data section.
The following two lines now both generate equivalent .bss data:
@vals1 = internal unnamed_addr global [20000000 x i32] zeroinitializer, align 4
@vals2 = internal unnamed_addr global [20000000 x i32] undef, align 4 ; previously unaccounted for
This is primarily motivated by the corresponding issue in the Rust
compiler (https://github.com/rust-lang/rust/issues/41315).
Differential Revision: https://reviews.llvm.org/D41705
Patch by varkor!
llvm-svn: 324424
See D42509 for the original version of this.
Basically, there are two significant changes to behavior here:
- addLiveOuts always adds all pristine registers (even if a block has
no successors).
- addLiveOuts and addLiveOutsNoPristines always add all callee-saved
registers for return blocks (including conditional return blocks).
I cleaned up the functions a bit to make it clear these properties hold.
Differential Revision: https://reviews.llvm.org/D42655
llvm-svn: 324422
This is a follow up of r324321, adding f16 <-> f32 and f16 <-> f64 conversion
match patterns.
Differential Revision: https://reviews.llvm.org/D42954
llvm-svn: 324360
This adds most of the FP16 codegen support, but these areas need further work:
- FP16 literals and immediates are not properly supported yet (e.g. literal
pool needs work),
- Instructions that are generated from intrinsics (e.g. vabs) haven't been
added.
This will be addressed in follow-up patches.
Differential Revision: https://reviews.llvm.org/D42849
llvm-svn: 324321
Example situation:
```
BB0:
%0 = ...
use %0
; ...
condjump BB1
jmp BB2
BB1:
%0 = ... ; rematerialized def from above (from earlier split step)
jmp BB2
BB2:
; ...
use %0
```
%0 will have a live interval with 3 value numbers (for the BB0, BB1 and
BB2 parts). Now SplitKit tries and succeeds in rematerializing the value
number in BB2 (This only works because it is a secondary split so
SplitKit is can trace this back to a single original def).
We need to recompute all live ranges affected by a value number that we
rematerialize. The case that we missed before is that when the value
that is rematerialized is at a join (Phi VNI) then we also have to
recompute liveness for the predecessor VNIs.
rdar://35699130
Differential Revision: https://reviews.llvm.org/D42667
llvm-svn: 324039
Summary:
This change extends MachineCopyPropagation to do COPY source forwarding
and adds an additional run of the pass to the default pass pipeline just
after register allocation.
This version of this patch uses the newly added
MachineOperand::isRenamable bit to avoid forwarding registers is such a
way as to violate constraints that aren't captured in the
Machine IR (e.g. ABI or ISA constraints).
This change is a continuation of the work started in D30751.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar
Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits
Differential Revision: https://reviews.llvm.org/D41835
llvm-svn: 323991
Commit r323512 introduced an optimisation in LowerReturn for half-precision
return values. A missing check caused a crash when the return value is "undef"
(i.e. a node that has no operands).
Differential Revision: https://reviews.llvm.org/D42743
llvm-svn: 323968
Discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html
In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical register with named vregs.
llvm-svn: 323922
Summary:
Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves
In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.
Patch by Marten Svanfeldt.
Reviewers: fhahn, pbarrio
Reviewed By: pbarrio
Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42574
llvm-svn: 323869
Half-precision arguments and return values are passed as if it were an int or
float for ARM. This results in truncates and bitcasts to/from i16 and f16
values, which are legalized very early to stack stores/loads. When FullFP16 is
enabled, we want to avoid codegen for these bitcasts as it is unnecessary and
inefficient.
Differential Revision: https://reviews.llvm.org/D42580
llvm-svn: 323861
Legal if we have hardware support for floating point, libcalls
otherwise.
Also add the necessary support for libcalls in the legalizer helper.
llvm-svn: 323726
Summary:
Apparently, we missed on constraining register classes of VReg-operands of all the instructions
built from a destination pattern but the root (top-level) one. The issue exposed itself
while selecting G_FPTOSI for armv7: the corresponding pattern generates VTOSIZS wrapped
into COPY_TO_REGCLASS, so top-level COPY_TO_REGCLASS gets properly constrained,
while nested VTOSIZS (or rather its destination virtual register to be exact) does not.
Fixing this by issuing GIR_ConstrainSelectedInstOperands for every nested GIR_BuildMI.
https://bugs.llvm.org/show_bug.cgi?id=35965
rdar://problem/36886530
Patch by Roman Tereshin
Reviewers: dsanders, qcolombet, rovka, bogner, aditya_nandakumar, volkan
Reviewed By: dsanders, qcolombet, rovka
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42565
llvm-svn: 323692
load instruction
The function `Thumb1InstrInfo::loadRegFromStackSlot` accepts only the `tGPR`
register class. The function serves to emit a `tLDRspi` instruction and
certainly any subset of the `tGPR` register class is a valid destination of the
load.
Differential revision: https://reviews.llvm.org/D42535
llvm-svn: 323514
This is the groundwork for Armv8.2-A FP16 code generation .
Clang passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318.
We will implement half-precision argument passing/returning lowering in the ARM
backend soon, but for now this means that this:
_Float16 sub(_Float16 a, _Float16 b) {
return a + b;
}
gets lowered to this:
define float @sub(float %a.coerce, float %b.coerce) {
entry:
%0 = bitcast float %a.coerce to i32
%tmp.0.extract.trunc = trunc i32 %0 to i16
%1 = bitcast i16 %tmp.0.extract.trunc to half
<SNIP>
%add = fadd half %1, %3
<SNIP>
}
When FullFP16 is *not* supported, we don't make f16 a legal type, and we get
legalization for "free", i.e. nothing changes and everything works as before.
And also f16 argument passing/returning is handled.
When FullFP16 is supported, we do make f16 a legal type, and have 2 places that
we need to patch up: f16 argument passing and returning, which involves minor
tweaks to avoid unnecessary code generation for some bitcasts.
As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing
that we can codegen this instruction from IR, but more importantly, also to
some conversion instructions. These conversions were causing issue before in
the FP16 and FullFP16 cases.
I've also added match rules to the VLDRH and VSTRH desriptions, so that we can
actually compile the entire half-precision sub code example above. This showed
that these loads and stores had the wrong addressing mode specified: AddrMode5
instead of AddrMode5FP16, which turned out not be implemented at all, so that
has also been added.
This is the minimal patch that shows all the different moving parts. In patch
2/3 I will add some efficient lowering of bitcasts, and in 2/3 I will add the
remaining Armv8.2-A FP16 instruction descriptions.
Thanks to Sam Parker and Oliver Stannard for their help and reviews!
Differential Revision: https://reviews.llvm.org/D38315
llvm-svn: 323512
Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1.
Reviewers: samparker
Reviewed By: samparker
Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42401
llvm-svn: 323354
This matches what MSVC does for alloca() function calls on ARM.
Even if MSVC doesn't support VLAs at the language level, it does
support the alloca function.
On the clang level, both the _alloca() (when emulating MSVC, which is
what the alloca() function expands to) and __builtin_alloca() builtin
functions, and VLAs, map to the same LLVM IR "alloca" function - so
within LLVM they're not distinguishable from each other.
Differential Revision: https://reviews.llvm.org/D42292
llvm-svn: 323308
Merging such globals loses the dllexport attribute. Add a test
to check that normal globals still are merged.
Differential Revision: https://reviews.llvm.org/D42127
llvm-svn: 323307
https://reviews.llvm.org/D42402
A lot of these copies are useless (copies b/w VRegs having the same
regclass) and should be cleaned up.
llvm-svn: 323291
1. ReachingDefsAnalysis - Allows to identify for each instruction what is the “closest” reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
3. BreakFalseDeps - Breaks false dependencies.
4. LoopTraversal - Creatws a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks.
This also included the following changes to ExcecutionDepsFix original logic:
1. BreakFalseDeps and ReachingDefsAnalysis logic no longer restricted by a register class.
2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.
Additional changes in affected files:
1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps also was added to the passes they activate.
2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.
Additional refactoring changes will follow.
This commit is (almost) NFC.
The only functional change is that now BreakFalseDeps will break dependency for all register classes.
Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet.
In a future commit several instructions (and tests) will be added.
This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.
Additional relevant reviews:
https://reviews.llvm.org/D40331https://reviews.llvm.org/D40332https://reviews.llvm.org/D40333https://reviews.llvm.org/D40334
Differential Revision: https://reviews.llvm.org/D40330
Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
llvm-svn: 323087
Fix a performance regression caused by r322737.
While trying to make it easier to replace compares with existing adds and
subtracts, I accidentally stopped it from doing so in some cases. This should
fix that. I'm also fixing another potential bug in that commit.
Differential Revision: https://reviews.llvm.org/D42263
llvm-svn: 322972
Previously, these parts weren't ever checked. The label patterns
need to be extended to match successfully on macho.
Differential Revision: https://reviews.llvm.org/D42126
llvm-svn: 322900
r322086 removed the trailing information describing reg classes for each
register.
This patch adds printing reg classes next to every register when
individual operands/instructions/basic blocks are printed. In the case
of dumping MIR or printing a full function, by default don't print it.
Differential Revision: https://reviews.llvm.org/D42239
llvm-svn: 322867
Every known PE COFF target emits /EXPORT: linker flags into a .drective
section. The AsmPrinter should handle this.
While we're at it, use global_values() and emit each export flag with
its own .ascii directive. This should make the .s file output more
readable.
llvm-svn: 322788
The code wasn't zero-extending correctly, so the comparison could
spuriously fail.
Adds some AArch64 tests to cover this case.
Inspired by D41791.
Differential Revision: https://reviews.llvm.org/D41798
llvm-svn: 322767
It appears that we haven't been prioritizing rules that contain nested
instructions properly. InstructionOperandMatcher didn't override
isHigherPriorityThan so it never compared the instructions/operands/predicates
inside nested instructions.
Fixes PR35926. Thanks to Diana Picus for the bug report.
llvm-svn: 322754
This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG.
Differential revision: https://reviews.llvm.org/D40922
llvm-svn: 322738
The ARM backend contains code that tries to optimize compares by replacing them with an existing instruction that sets the flags the same way. This allows it to replace a "cmp" with a "adds", generalizing the code that replaces "cmp" with "sub". It also heuristically disables sinking of instructions that could potentially be used to replace compares (currently only if they're next to each other).
Differential revision: https://reviews.llvm.org/D38378
llvm-svn: 322737
Mark G_FPEXT and G_FPTRUNC as legal or libcall, depending on hardware
support, but only for conversions between float and double.
Also add the necessary boilerplate so that the LegalizerHelper can
introduce the required libcalls. This also works only for float and
double, but isn't too difficult to extend when the need arises.
llvm-svn: 322651
Change symbol values in the stack_size section from being 8 bytes, to being a target dependent size.
Differential Revision: https://reviews.llvm.org/D42108
llvm-svn: 322619
For hard float with VFP4, it is legal. Otherwise, we use libcalls.
This needs a bit of support in the LegalizerHelper for soft float
because we didn't handle G_FMA libcalls yet. The support is trivial, as
the only difference between G_FMA and other libcalls that we already
handle is that it has 3 input operands rather than just 2.
llvm-svn: 322366
This patch teaches the Arm back-end to generate the SMMULR, SMMLAR and SMMLSR
instructions from equivalent IR patterns.
Differential Revision: https://reviews.llvm.org/D41775
llvm-svn: 322361
The PeepholeOptimizer would fail for vregs without a definition. If this
was caused by an undef operand abort to keep the code simple (so we
don't need to add logic everywhere to replicate the undef flag).
Differential Revision: https://reviews.llvm.org/D40763
llvm-svn: 322319
When replacing a PHI the PeepholeOptimizer currently takes the register
class of the register at the first operand. This however is not correct
if this argument has a subregister index.
As there is currently no API to query the register class resulting from
applying a subregister index to all registers in a class, we can only
abort in these cases and not perform the transformation.
This changes findNextSource() to require the end of all copy chains to
not use a subregister if there is any PHI in the chain. I had to rewrite
the overly complicated inner loop there to have a good place to insert
the new check.
This fixes https://llvm.org/PR33071 (aka rdar://32262041)
Differential Revision: https://reviews.llvm.org/D40758
llvm-svn: 322313
For hard float, it is legal.
For soft float, we need to lower to 0 - x first, and then we can use the
libcall for G_FSUB. This is undoing some of the canonicalization
performed by the IRTranslator (which introduces G_FNEG when it sees a
0 - x). Ideally, that canonicalization would be performed by a
pre-legalizer pass that would allow targets to opt out of this behaviour
rather than dance around it in the legalizer.
llvm-svn: 322168
Planning to add support for named vregs. This puts is in a conundrum since
physregs are named as well. To rectify this we need to use a sigil other than
'%' for physregs in MIR. We've settled on using '$' for physregs but first we
must repurpose it from external symbols using it, which is what this commit is
all about. We think '&' will have familiar semantics for C/C++ users.
llvm-svn: 322146
In -debug output we print "pred:" whenever a MachineOperand is a
predicate operand in the instruction descriptor, and "opt:" whenever a
MachineOperand is an optional def in the instruction descriptor.
Differential Revision: https://reviews.llvm.org/D41870
llvm-svn: 322096
Since register classes and banks are already printed with the register
definition, don't print it at the end of every instruction anymore.
This follows MIR in this regard and is another step to the unification
of the two formats.
llvm-svn: 322086
Summary:
This commit updates the BufferByteStreamer, used by DebugLocStream
to buffer bytes/comments to put in the debug_loc section, to
make sure that the Buffer and Comments vectors are synced.
Previously, when an SLEB128 or ULEB128 was emitted together with
a comment, the vectors could be out-of-sync if the LEB encoding
added several entries to the Buffer vectors, while we only added
a single entry to the Comments vector.
The goal with this is to get the comments in the debug_loc
section in the .s file correctly aligned.
Example (using ARM as target):
Instead of
.byte 144 @ sub-register DW_OP_regx
.byte 128 @ 256
.byte 2 @ DW_OP_piece
.byte 147 @ 8
.byte 8 @ sub-register DW_OP_regx
.byte 144 @ 257
.byte 129 @ DW_OP_piece
.byte 2 @ 8
.byte 147 @
.byte 8 @
we now get
.byte 144 @ sub-register DW_OP_regx
.byte 128 @ 256
.byte 2 @
.byte 147 @ DW_OP_piece
.byte 8 @ 8
.byte 144 @ sub-register DW_OP_regx
.byte 129 @ 257
.byte 2 @
.byte 147 @ DW_OP_piece
.byte 8 @ 8
Reviewers: JDevlieghere, rnk, aprantl
Reviewed By: aprantl
Subscribers: davide, Ka-Ka, uabelho, aemerson, javed.absar, kristof.beyls, llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D41763
llvm-svn: 321907
Select G_PHI to PHI and manually constrain the result register. This is
very similar to how COPY is handled, so extract and reuse some of that
code.
llvm-svn: 321797
We used to handle G_CONSTANT with pointer type by forcing the type of
the result register to s32 and then letting TableGen handle it.
Unfortunately, setting the type only works for generic virtual
registers, that haven't yet been constrained to a register class (e.g.
those used only by a COPY later on). If the result register has already
been constrained as a use of a previously selected instruction, then
setting the type will assert.
It would be nice to be able to teach TableGen to select pointer
constants the same as integer constants, but since it's such an edge
case (at the moment the only pointer constant that we're generally
interested in is 0, and that is mostly used for comparisons and selects,
which are also not supported by TableGen) it's probably not worth the
effort right now. Instead, handle pointer constants with some trivial
handwritten code.
llvm-svn: 321793
Pointer constants are pretty rare, since we usually represent them as
integer constants and then cast to pointer. One notable exception is the
null pointer constant, which is represented directly as a G_CONSTANT 0
with pointer type. Mark it as legal and make sure it is selected like
any other integer constant.
llvm-svn: 321354
If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.
Differential Revision: https://reviews.llvm.org/D41350
llvm-svn: 321259
If a block has N predecessors, then the current algorithm will try to
sink common code to this block N times (whenever we visit a
predecessor). Every attempt to sink the common code includes going
through all predecessors, so the complexity of the algorithm becomes
O(N^2).
With this patch we try to sink common code only when we visit the block
itself. With this, the complexity goes down to O(N).
As a side effect, the moment the code is sunk is slightly different than
before (the order of simplifications has been changed), that's why I had
to adjust two tests (note that neither of the tests is supposed to test
SimplifyCFG):
* test/CodeGen/AArch64/arm64-jumptable.ll - changes in this test mimic
the changes that previous implementation of SimplifyCFG would do.
* test/CodeGen/ARM/avoid-cpsr-rmw.ll - in this test I disabled common
code sinking by a command line flag.
llvm-svn: 321236
The AArch64 backend contains code to optimize {s,u}{add,sub}.with.overflow during SelectionDAG. This commit ports that code to the ARM backend.
Differential revision: https://reviews.llvm.org/D35635
llvm-svn: 321224
We get an assertion in RegBankSelect for code along the lines of
my_32_bit_int = my_64_bit_int, which tends to translate into a 64-bit
load, followed by a G_TRUNC, followed by a 32-bit store. This appears in
a couple of places in the test-suite.
At the moment, the legalizer doesn't distinguish between integer and
floating point scalars, so a 64-bit load will be marked as legal for
targets with VFP, and so will the rest of the sequence, leading to a
slightly bizarre G_TRUNC reaching RegBankSelect.
Since the current support for 64-bit integers is rather immature, this
patch works around the issue by explicitly handling this case in
RegBankSelect and InstructionSelect. In the future, we may want to
revisit this decision and make sure 64-bit integer loads are narrowed
before reaching RegBankSelect.
llvm-svn: 321165
Summary:
Implement lower of unsigned saturation on an interval [0, k] where k + 1 is a power of two using USAT instruction in a similar way to how [~k, k] is lowered using SSAT on ARM models that supports it.
Patch by Marten Svanfeldt
Reviewers: t.p.northover, pbarrio, eastig, SjoerdMeijer, javed.absar, fhahn
Reviewed By: fhahn
Subscribers: fhahn, aemerson, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D41348
llvm-svn: 321164
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.
Differential Revision: https://reviews.llvm.org/D41177
llvm-svn: 320962
Work towards the unification of MIR and debug output by printing
`@foo` instead of `<ga:@foo>`.
Also print target flags in the MIR format since most of them are used on
global address operands.
Only debug syntax is affected.
llvm-svn: 320682
Recommitting rL319773, which was reverted due to a recursive issue
causing timeouts. This happened because I failed to check whether
the discovered loads could be narrowed further. In the case of a tree
with one or more narrow loads, that could not be further narrowed, as
well as a node that would need masking, an AND could be introduced
which could then be visited and recombined again with the same load.
This could again create the masking load, with would be combined
again... We now check that the load can be narrowed so that this
process stops.
Original commit message:
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.
Differential Revision: https://reviews.llvm.org/D41177
llvm-svn: 320679
Work towards the unification of MIR and debug output by printing
`%const.0 + 8` instead of `<cp#0+8>` and `%const.0 - 8` instead of
`<cp#0-8>`.
Only debug syntax is affected.
Differential Revision: https://reviews.llvm.org/D41116
llvm-svn: 320564
This is due to PR26161 needing to be resolved before we can fix
big endian bugs like PR35359. The work to split aggregates into smaller LLTs
instead of using one large scalar will take some time, so in the mean time
we'll fall back to SDAG.
Some ARM BE tests xfailed for now as a result.
Differential Revision: https://reviews.llvm.org/D40789
llvm-svn: 320388
This is a preparatory step for D34515.
This change:
- makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
- lowering is done by first converting the boolean value into the carry flag
using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
operations does the actual addition.
- for subtraction, given that ISD::SUBCARRY second result is actually a
borrow, we need to invert the value of the second operand and result before
and after using ARMISD::SUBE. We need to invert the carry result of
ARMISD::SUBE to preserve the semantics.
- given that the generic combiner may lower ISD::ADDCARRY and
ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
as well otherwise i64 operations now would require branches. This implies
updating the corresponding test for unsigned.
- add new combiner to remove the redundant conversions from/to carry flags
to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
- fixes PR34045
- fixes PR34564
- fixes PR35103
Differential Revision: https://reviews.llvm.org/D35192
llvm-svn: 320355
Test (some of) the patterns for selecting PKHBT and PKHTB. The others
are just very similar to the ones we're testing and there would be
little value in covering them as well.
llvm-svn: 320352
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.
Differential Revision: https://reviews.llvm.org/D39604
llvm-svn: 319773
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.
The MIR printer prints the IR name of a MBB only for block definitions.
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix
Differential Revision: https://reviews.llvm.org/D40422
llvm-svn: 319665
Summary:
The compiler fails with the following error message:
fatal error: error in backend: ran out of registers during
register allocation
Tail call optimization for Armv8-M.base fails to meet all the required
constraints when handling calls to function pointers where the
arguments take up r0-r3. This is because the pointer to the
function to be called can only be stored in r0-r3, but these are
all occupied by arguments. This patch makes sure that tail call
optimization does not try to handle this type of calls.
Reviewers: chill, MatzeB, olista01, rengolin, efriedma
Reviewed By: olista01, efriedma
Subscribers: efriedma, aemerson, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D40706
llvm-svn: 319664
This matches how it is done on X86.
This allows using emulated tls on windows; in MinGW environments,
native tls isn't supported at the moment.
Differential Revision: https://reviews.llvm.org/D40769
llvm-svn: 319643
output
As part of the unification of the debug format and the MIR format,
always use `printReg` to print all kinds of registers.
Updated the tests using '_' instead of '%noreg' until we decide which
one we want to be the default one.
Differential Revision: https://reviews.llvm.org/D40421
llvm-svn: 319445
As part of the unification of the debug format and the MIR format, avoid
printing "vreg" for virtual registers (which is one of the current MIR
possibilities).
Basically:
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g"
* grep -nr '%vreg' . and fix if needed
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g"
* grep -nr 'vreg[0-9]\+' . and fix if needed
Differential Revision: https://reviews.llvm.org/D40420
llvm-svn: 319427
Partially reverting enabling of post-legalization store merge
(r319036) for just ARM backend as it is causing incorrect code
in some Thumb2 cases.
llvm-svn: 319331
When lowering a G_BRCOND, we generate a TSTri of the condition against
1, which sets the flags, and then a Bcc which branches based on the
value of the flags.
Unfortunately, we were using the wrong condition code to check whether
we need to branch (EQ instead of NE), which caused all our branches to
do the opposite of what they were intended to do. This patch fixes the
issue by using the correct condition code.
llvm-svn: 319313
This will allow compilation of assembly files targeting armv7e-m without having
to specify the Tag_CPU_arch attribute as a workaround.
Differential revision: https://reviews.llvm.org/D40370
Patch by Ian Tessier!
llvm-svn: 319303
Summary:
From the bug report:
> The problem is that it fails when trying to compare -65536 (or 4294901760) to 0xFFFF,0000. This is because the
> constant in the instruction is sign extended to 64 bits (0xFFFF,FFFF,FFFF,0000) and then compared to the non
> extended 64 bit version expected by TableGen.
>
> In contrast, the DAGISelEmitter generates special code for AND immediates (OPC_CheckAndImm), which does not
> sign extend.
This patch doesn't introduce the special case for AND (and OR) immediates since the majority of it is related to handling known bits that have no effect on the result and GlobalISel doesn't detect known-bits at this time. Instead this patch just ensures that the immediate is extended consistently on both sides of the check.
Thanks to Diana Picus for the detailed bug report.
Reviewers: rovka
Reviewed By: rovka
Subscribers: kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D40532
llvm-svn: 319252
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.
* Only debug printing is affected. It now follows MIR.
Differential Revision: https://reviews.llvm.org/D40417
llvm-svn: 319187
Unoptimized IR can have linear sequences of stores to an array, where the
initial GEP for the first store is formed from the pointer to the array, and the
GEP for each store after the first is formed from the previous GEP with some
offset in an inductive fashion.
The (large) resulting DAG when analyzed by DAGCombine undergoes an excessive
number of combines as each store node is examined every time its' offset node
is combined with any child of the offset. One of the transformations is
findBetterNeighborChains which assists MergeConsecutiveStores. The former
relies on repeated chain walking to do its' work, however MergeConsecutiveStores
is disabled at O0 which makes the transformation redundant.
Any optimization level other than O0 would invoke InstCombine which would
resolve the chain of GEPs into flat base + offset GEP for each store which
does not exhibit the repeated examination of each store to the array.
Disabling this optimization fixes an excessive compile time issue (30~ minutes
for the test case provided) at O0.
Reviewers: niravd, craig.topper, t.p.northover
Differential Revision: https://reviews.llvm.org/D40193
llvm-svn: 319142
Summary:
Now that store-merge is only generates type-safe stores, do a second
pass just before instruction selection to allow lowered intrinsics to
be merged as well.
Reviewers: jyknight, hfinkel, RKSimon, efriedma, rnk, jmolloy
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33675
llvm-svn: 319036
The commit https://reviews.llvm.org/rL318143 computes incorrectly to offset to
restore LR from.
The number of tPOP operands is 2 (condition) + 2 (implicit def and use of SP) +
count of the popped registers. We need to load LR from just past the last
register, hence the correct offset should be either getNumOperands() - 4 and
getNumExplicitOperands() - 2 (multiplied by 4).
Differential revision: https://reviews.llvm.org/D40305
llvm-svn: 319014
TableGen already generates code for selecting a G_FDIV, so we only need
to add a test.
For the legalizer and reg bank select, we do the same thing as for the
other floating point binary operations: either mark as legal if we have
a FP unit or lower to a libcall, and map to the floating point
registers.
llvm-svn: 318915
TableGen already generates code for selecting a G_FMUL, so we only need
to add a test for that part.
For the legalizer and reg bank select, we do the same thing as the other
floating point binary operators: either mark as legal if we have a FP
unit or lower to a libcall, and map to the floating point registers.
llvm-svn: 318910
Add instruction selector test for RSBri, which is derived from
AsI1_rbin_irs, and make sure it doesn't get mistaken for SUBri, which is
derived from the very similar AsI1_bin_irs pattern.
llvm-svn: 318643
Remove some of the instruction selector tests for binary operators (and,
or, xor). These are all derived from the same kind of TableGen pattern,
AsI1_bin_irs, so there's no point in testing all of them.
llvm-svn: 318642
Enabling and using dwarf exceptions seems like an easier path
to take, than to make the COFF/ARM backend output EHABI directives.
Previously, no EH model was enabled at all on this target.
There's no point in setting UseIntegratedAssembler to false since
GNU binutils doesn't support Windows on ARM, and since we don't
need to support external assembler, we don't need to use register
numbers in cfi directives.
Differential Revision: https://reviews.llvm.org/D39532
llvm-svn: 318510
't' constraint normally only accepts f32 operands, but for VCVT the
operands can be i32. LLVM is overly restrictive and rejects asm like:
float foo() {
float result;
__asm__ __volatile__(
"vcvt.f32.s32 %[result], %[arg1]\n"
: [result]"=t"(result)
: [arg1]"t"(0x01020304) );
return result;
}
Relax the value type for 't' constraint to either f32 or i32.
Differential Revision: https://reviews.llvm.org/D40137
llvm-svn: 318472
Change the calculation for the desired ValueType for non-sign
extending loads, as in those cases we don't care about the
higher bits. This creates a smaller ExtVT and allows for such
combinations as:
(srl (zextload i16, [addr]), 8) -> (zextload i8, [addr + 1])
Differential Revision: https://reviews.llvm.org/D40034
llvm-svn: 318390
artifacts along with DCE
Legalization Artifacts are all those insts that are there to make the
type system happy. Currently, the target needs to say all combinations
of extends and truncs are legal and there's no way of verifying that
post legalization, we only have *truly* legal instructions. This patch
changes roughly the legalization algorithm to process all illegal insts
at one go, and then process all truncs/extends that were added to
satisfy the type constraints separately trying to combine trivial cases
until they converge. This has the added benefit that, the target
legalizerinfo can only say which truncs and extends are okay and the
artifact combiner would combine away other exts and truncs.
Updated legalization algorithm to roughly the following pseudo code.
WorkList Insts, Artifacts;
collect_all_insts_and_artifacts(Insts, Artifacts);
do {
for (Inst in Insts)
legalizeInstrStep(Inst, Insts, Artifacts);
for (Artifact in Artifacts)
tryCombineArtifact(Artifact, Insts, Artifacts);
} while(!Insts.empty());
Also, wrote a simple wrapper equivalent to SetVector, except for
erasing, it avoids moving all elements over by one and instead just
nulls them out.
llvm-svn: 318210
Because the block-splitting code is multi-purpose, we have to meddle with the
branches when using it to fixup a conditional branch destination. We got the
code right, but forgot to update the CFG so the verifier complained when
expensive checks were on.
Probably harmless since constant-islands comes so late, but best to fix it
anyway.
llvm-svn: 318148
Get rid of the handwritten instruction selector code for handling
G_CONSTANT. This code wasn't checking all the preconditions correctly
anyway, so it's better to leave it to TableGen, which can handle at
least some cases correctly (e.g. MOVi, MOVi16, folding into binary
operations). Also add tests to cover those cases.
llvm-svn: 318146
When we emit a tail call for Armv8-M, but then discover that the caller needs to
save/restore `LR`, we convert the tail call to an ordinary one, since restoring
`LR` takes extra instructions, which may negate the benefits of the tail
call. If the callee, however, takes stack arguments, this conversion is
incorrect, since nothing has been done to pass the stack arguments.
Thus the patch reverts https://reviews.llvm.org/rL294000
Also, we improve the instruction sequence for popping `LR` in the case when we
couldn't immediately find a scratch low register, but we can use as a temporary
one of the callee-saved low registers and restore `LR` before popping other
callee-saves.
Differential Revision: https://reviews.llvm.org/D39599
llvm-svn: 318143
Summary:
This fixes PR35221.
Use pseudo-instructions to let MachineCSE hoist global address computation.
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39871
llvm-svn: 318081
Make one of the legalizer tests a bit more robust by making sure all
values we're interested in are used (either in a store or a return) and
by using loads instead of constants for obtaining values on fewer than
32 bits. This should make the test less fragile to changes in the
legalize combiner, since those loads are legal (as opposed to the
constants, which were being widened and thus produced opportunities for
the legalize combiner).
llvm-svn: 318047
When generating table jump code for switch statements, place the jump
table label as the first operand in the various addition instructions
in order to enable addressing mode selectors to better match index
computation and possibly fold them into the addressing mode of the
table entry load instruction.
Differential revision: https://reviews.llvm.org/D39752
llvm-svn: 318033
This changes the interface of how targets describe how to legalize, see
the below description.
1. Interface for targets to describe how to legalize.
In GlobalISel, the API in the LegalizerInfo class is the main interface
for targets to specify which types are legal for which operations, and
what to do to turn illegal type/operation combinations into legal ones.
For each operation the type sizes that can be legalized without having
to change the size of the type are specified with a call to setAction.
This isn't different to how GlobalISel worked before. For example, for a
target that supports 32 and 64 bit adds natively:
for (auto Ty : {s32, s64})
setAction({G_ADD, 0, s32}, Legal);
or for a target that needs a library call for a 32 bit division:
setAction({G_SDIV, s32}, Libcall);
The main conceptual change to the LegalizerInfo API, is in specifying
how to legalize the type sizes for which a change of size is needed. For
example, in the above example, how to specify how all types from i1 to
i8388607 (apart from s32 and s64 which are legal) need to be legalized
and expressed in terms of operations on the available legal sizes
(again, i32 and i64 in this case). Before, the implementation only
allowed specifying power-of-2-sized types (e.g. setAction({G_ADD, 0,
s128}, NarrowScalar). A worse limitation was that if you'd wanted to
specify how to legalize all the sized types as allowed by the LLVM-IR
LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times
and probably would need a lot of memory to store all of these
specifications.
Instead, the legalization actions that need to change the size of the
type are specified now using a "SizeChangeStrategy". For example:
setLegalizeScalarToDifferentSizeStrategy(
G_ADD, 0, widenToLargerAndNarrowToLargest);
This example indicates that for type sizes for which there is a larger
size that can be legalized towards, do it by Widening the size.
For example, G_ADD on s17 will be legalized by first doing WidenScalar
to make it s32, after which it's legal.
The "NarrowToLargest" indicates what to do if there is no larger size
that can be legalized towards. E.g. G_ADD on s92 will be legalized by
doing NarrowScalar to s64.
Another example, taken from the ARM backend is:
for (unsigned Op : {G_SDIV, G_UDIV}) {
setLegalizeScalarToDifferentSizeStrategy(Op, 0,
widenToLargerTypesUnsupportedOtherwise);
if (ST.hasDivideInARMMode())
setAction({Op, s32}, Legal);
else
setAction({Op, s32}, Libcall);
}
For this example, G_SDIV on s8, on a target without a divide
instruction, would be legalized by first doing action (WidenScalar,
s32), followed by (Libcall, s32).
The same principle is also followed for when the number of vector lanes
on vector data types need to be changed, e.g.:
setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal);
setLegalizeVectorElementToDifferentSizeStrategy(
G_ADD, 0, widenToLargerTypesUnsupportedOtherwise);
As currently implemented here, vector types are legalized by first
making the vector element size legal, followed by then making the number
of lanes legal. The strategy to follow in the first step is set by a
call to setLegalizeVectorElementToDifferentSizeStrategy, see example
above. The strategy followed in the second step
"moreToWiderTypesAndLessToWidest" (see code for its definition),
indicating that vectors are widened to more elements so they map to
natively supported vector widths, or when there isn't a legal wider
vector, split the vector to map it to the widest vector supported.
Therefore, for the above specification, some example legalizations are:
* getAction({G_ADD, LLT::vector(3, 3)})
returns {WidenScalar, LLT::vector(3, 8)}
* getAction({G_ADD, LLT::vector(3, 8)})
then returns {MoreElements, LLT::vector(8, 8)}
* getAction({G_ADD, LLT::vector(20, 8)})
returns {FewerElements, LLT::vector(16, 8)}
2. Key implementation aspects.
How to legalize a specific (operation, type index, size) tuple is
represented by mapping intervals of integers representing a range of
size types to an action to take, e.g.:
setScalarAction({G_ADD, LLT:scalar(1)},
{{1, WidenScalar}, // bit sizes [ 1, 31[
{32, Legal}, // bit sizes [32, 33[
{33, WidenScalar}, // bit sizes [33, 64[
{64, Legal}, // bit sizes [64, 65[
{65, NarrowScalar} // bit sizes [65, +inf[
});
Please note that most of the code to do the actual lowering of
non-power-of-2 sized types is currently missing, this is just trying to
make it possible for targets to specify what is legal, and how non-legal
types should be legalized. Probably quite a bit of further work is
needed in the actual legalizing and the other passes in GlobalISel to
support non-power-of-2 sized types.
I hope the documentation in LegalizerInfo.h and the examples provided in the
various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explains well
enough how this is meant to be used.
This drops the need for LLT::{half,double}...Size().
Differential Revision: https://reviews.llvm.org/D30529
llvm-svn: 317560
The GlobalISel TableGen backend didn't check for predicates on the
source children. This caused it to generate code for ARM patterns such
as SMLABB or similar, but without properly checking for the sext_16_node
part of the operands. This in turn meant that we would select SMLABB
instead of MLA for simple sequences such as s32 + s32 * s32, which is
wrong (we want a MLA on the full operands, not just their bottom 16
bits).
This patch forces TableGen to skip patterns with predicates on the src
children, so it doesn't generate code for SMLABB and other similar ARM
instructions at all anymore. AArch64 and X86 are not affected.
Differential Revision: https://reviews.llvm.org/D39554
llvm-svn: 317313
The generic dag combiner will fold:
(shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
(shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)
This can create constants which are too large to use as an immediate.
Many ALU operations are also able of performing the shl, so we can
unfold the transformation to prevent a mov imm instruction from being
generated.
Other patterns, such as b + ((a << 1) | 510), can also be simplified
in the same manner.
Differential Revision: https://reviews.llvm.org/D38084
llvm-svn: 317197
Remove the G_FADD testcases from arm-legalizer.mir, they are covered by
arm-legalizer-fp.mir (I probably forgot to delete them when I created
that test).
llvm-svn: 316573
We were generating BLX for all the calls, which was incorrect in most
cases. Update ARMCallLowering to generate BL for direct calls, and BLX,
BX_CALL or BMOVPCRX_CALL for indirect calls.
llvm-svn: 316570
Separate the test cases that deal with calls from the rest of the IR
Translator tests.
We split into 2 different files, one for testing parameter and result
lowering, and one for testing the various different kinds of calls that
can occur (BL, BLX, BX_CALL etc).
llvm-svn: 316569
Swap the compare operands if the lhs is a shift and the rhs isn't,
as in arm and T2 the shift can be performed by the compare for its
second operand.
Differential Revision: https://reviews.llvm.org/D39004
llvm-svn: 316562
This updates the MIRPrinter to include the regclass when printing
virtual register defs, which is already valid syntax for the
parser. That is, given 64 bit %0 and %1 in a "gpr" regbank,
%1(s64) = COPY %0(s64)
would now be written as
%1:gpr(s64) = COPY %0(s64)
While this change alone introduces a bit of redundancy with the
registers block, it allows us to update the tests to be more concise
and understandable and brings us closer to being able to remove the
registers block completely.
Note: We generally only print the class in defs, but there is one
exception. If there are uses without any defs whatsoever, we'll print
the class on all uses. I'm not completely convinced this comes up in
meaningful machine IR, but for now the MIRParser and MachineVerifier
both accept that kind of stuff, so we don't want to have a situation
where we can print something we can't parse.
llvm-svn: 316479
This is in preparation for a verifier check that makes sure
copies are of the same size (when generic virtual registers are involved).
llvm-svn: 316388
This patch implements dynamic stack (re-)alignment for 16-bit Thumb. When
targeting processors, which support only the 16-bit Thumb instruction set
the compiler ignores the alignment attributes of automatic variables and may
silently generate incorrect code.
Differential revision: https://reviews.llvm.org/D38143
llvm-svn: 316289
This converts a large and somewhat arbitrary set of tests to use
update_mir_test_checks. I ran the script on all of the tests I expect
to need to modify for an upcoming mir syntax change and kept the ones
that obviously didn't change the tests in ways that might make it
harder to understand.
llvm-svn: 316137
We end up creating COPY's that are either truncating/extending and this
should be illegal.
https://reviews.llvm.org/D37640
Patch for X86 and ARM by igorb, rovka
llvm-svn: 315240
Unfortunately TableGen doesn't handle this yet:
Unable to deduce gMIR opcode to handle Src (which is a leaf).
Just add some temporary hand-written code to generate the proper MOVsr.
llvm-svn: 315071
Issues addressed since original review:
- Avoid bug in regalloc greedy/machine verifier when forwarding to use
in an instruction that re-defines the same virtual register.
- Fixed bug when forwarding to use in EarlyClobber instruction slot.
- Fixed incorrect forwarding to register definitions that showed up in
explicit_uses() iterator (e.g. in INLINEASM).
- Moved removal of dead instructions found by
LiveIntervals::shrinkToUses() outside of loop iterating over
instructions to avoid instructions being deleted while pointed to by
iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
llvm-svn: 314729
LR is an untypical callee saved register in that it is restored into a
different register (PC) and thus does not live-out of the return block.
This case requires the `Restored` flag in CalleeSavedInfo to be cleared.
This fixes a number of cases where this wasn't handled correctly yet.
llvm-svn: 314471
In setupEntryBlockAndCallSites in CodeGen/SjLjEHPrepare.cpp,
we fetch and store the actual frame pointer, but on return via
the longjmp intrinsic, it always was restored into the r7 variable.
On windows, the frame pointer should be restored into r11 instead of r7.
On Darwin (where sjlj exception handling is used by default), the frame
pointer is always r7, both in arm and thumb mode, and likewise, on
windows, the frame pointer always is r11.
On linux however, if sjlj exception handling is enabled (which it isn't
by default), libcxxabi and the user code can be built in differing modes
using different registers as frame pointer. Therefore, when restoring
registers on a platform where we don't always use the same register
depending on code mode, restore both r7 and r11.
Differential Revision: https://reviews.llvm.org/D38253
llvm-svn: 314451
Without this, we could end up trying to get the Nth (0-indexed) element
from a subvector of size N.
Differential Revision: https://reviews.llvm.org/D37880
llvm-svn: 314380
It leads to some improvements, but also a regression for the simple
case, so it's not clearly a good idea.
test/CodeGen/ARM/vcvt.ll now has test coverage to show the difference.
Ultimately, the right solution is probably to custom-lower fp-to-int
conversions, to something like ARMISD::VCVT_F32_S32 plus a bitcast.
It's hard to do the right thing when the implicit bitcast isn't visible
to DAG transforms.
llvm-svn: 314169
This teach simplifyDemandedBits to handle constant splat vector shifts.
This required changing some uses of getZExtValue to getLimitedValue since we can't rely on legalization using getShiftAmountTy for the shift amount.
I believe there may have been a bug in the ((X << C1) >>u ShAmt) handling where we didn't check if the inner shift was too large. I've fixed that here.
I had to add new patterns to ARM because the zext/sext the patterns were trying to look for got turned into an any_extend with this patch. Happy to split that out too, but not sure how to test without this change.
Differential Revision: https://reviews.llvm.org/D37665
llvm-svn: 314139
For the following function:
double fn1(double d0, double d1, double d2) {
double a = -d0 - d1 * d2;
return a;
}
on ARM, LLVM generates code along the lines of
vneg.f64 d0, d0
vmls.f64 d0, d1, d2
i.e., a negate and a multiply-subtract.
The attached patch adds instruction selection patterns to allow it to generate the single instruction
vnmla.f64 d0, d1, d2
(multiply-add with negation) instead, like GCC does.
Committed on behalf of @gergo- (Gergö Barany)
Differential Revision: https://reviews.llvm.org/D35911
llvm-svn: 313972