Recall that MSVC always gives enums the type 'int', nothing else. MSVC
2015 does not appear to have this problem anymore.
Clang-cl -Wmicrosoft-enum-value flags this, FWIW, so now I have a true
positive for my warning. :)
llvm-svn: 278762
This currently breaks the greendragon clang-stage1-configure-RA/ and
brotli. It is probably just uncovering a pre-existing problem. Reverting
temporarily to get the buildbots green again. A reduced testcase will
follow shortly.
This reverts commit r278659.
llvm-svn: 278711
This adds two new utility functions findLoopControlBlock and findLoopPreheader
to MachineLoop and MachineLoopInfo. These functions are refactored and taken
from the Hexagon target as they are target independent; thus this is intendend to
be a non-functional change.
Differential Revision: https://reviews.llvm.org/D22959
llvm-svn: 278661
Summary:
The assembler currently does not check the branch target for CBZ/CBNZ
instructions, which only permit branching forwards with a positive offset. This
adds validation for the branch target to ensure negative PC-relative offsets are
not encoded into the instruction, whether specified as a literal or as an
assembler symbol.
Reviewers: rengolin, t.p.northover
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D23312
llvm-svn: 278659
1. Use shuffle to insert element i1 into vector. The previous implementation was incorrect ( dest_bit OR src_bit , it doesn't clear the bit if src_bit=0 )
2. Improve shuffle i1 vector, use CVT2MASK if supported instead TRUNCATE.
Differential Revision: http://reviews.llvm.org/D23347
llvm-svn: 278623
LowerTargetConstantPool is not properly setting the TargetFlag to indicate
desired relocation. Coding error, the offset parameter was omitted, so the
TargetFlag was used as the offset, and the TargetFlag defaulted to zero.
This only affects -fpic compilation, and only those items created in a
Constant Pool, for example a vector of constants. Halide ran into this issue.
llvm-svn: 278614
On a Windows build of Chromium, r278532 (up to r278539)
X86FrameLowering::emitEpilogue because it wasn't wary enough of the
return of MachineBasicBlock::getFirstTerminator. Guard all the uses
here.
Note that r278532 *looks* like an NFC commit (just an API change), but
it removes a couple of layers of abstraction and is probably causing
optimization differences in MSVC.
llvm-svn: 278572
This bring LLVM-generated PTX closer to what nvcc generates and avoids
triggering issues in ptxas.
For instance, ptxas does not accept .s16 (or .u16) registers as operands
for .fp16 instructions.
Differential Revision: https://reviews.llvm.org/D23460
llvm-svn: 278568
Trunk would try to create something like "stp x9, x8, [x0], #512", which isn't actually a valid instruction.
Differential revision: https://reviews.llvm.org/D23368
llvm-svn: 278559
Summary: It triggers exponential behavior when the DAG has many branches.
Reviewers: hfinkel, kbarton
Subscribers: iteratee, nemanjai, echristo
Differential Revision: https://reviews.llvm.org/D23428
llvm-svn: 278548
Currently X86ISelLowering has a similar transformation for sexts:
sext(add_nsw(x, C)) --> add(sext(x), C_sext)
In this change I extend this code to handle zexts as well.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D23359
llvm-svn: 278520
This re-factoring could cause the following slight changes in generated
code, though none were observed during testing:
- MachineScheduler could decide not to cluster some loads/stores if
there are other load/stores with non-pairable opcodes that have the
same base register and offset as a pairable set of load/stores. One
case of different MachineScheduler pairing did show up in my testing,
but it wasn't due to this issue, but due
BaseMemOpClusterMutation::clusterNeighboringMemOps() being unstable
w.r.t. the order it considers memory operations. See PR28942.
- The ImplicitNullChecks optimization could be done for more load/store
opcodes. This optimization isn't done for C/C++ code, so it didn't
show up in my testing.
Reviewers: mcrosier, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D23365
llvm-svn: 278515
...and the two followup commits:
Revert "[Sparc][Leon] Missed resetting option flags from check-in 278489."
Revert "[Sparc][Leon] Errata fixes for various errata in different
versions of the Leon variants of the Sparc 32 bit processor."
This reverts commit r274856, r278489, and r278492.
llvm-svn: 278511
The PALIGNR target shuffle decode was not taking into account that DecodePALIGNRMask (rather oddly) expects the operands to be in reverse order, nor was it detecting unary patterns, causing combines to combine with the incorrect input.
The cgbuiltin, auto upgrade and instruction comments code correctly swap the operands so are not affected.
llvm-svn: 278494
The nature of the errata are listed in the comments preceding the errata fix passes. Relevant unit tests are implemented for each of these.
These changes update older versions of these errata fixes with improvements to code and unit tests.
Differential Revision: https://reviews.llvm.org/D21960
llvm-svn: 278489
Remove all ilist_iterator to pointer casts. There were two reasons for
casts:
- Checking for an uninitialized (i.e., null) iterator. I added
MachineInstrBundleIterator::isValid() to check for that case.
- Comparing an iterator against the underlying pointer value while
avoiding converting the pointer value to an iterator. This is
occasionally necessary in MachineInstrBundleIterator, since there is
an assertion in the constructors that the underlying MachineInstr is
not bundled (but we don't care about that if we're just checking for
pointer equality).
To support the latter case, I rewrote the == and != operators for
ilist_iterator and MachineInstrBundleIterator.
- The implicit constructors now use enable_if to exclude
const-iterator => non-const-iterator conversions from overload
resolution (previously it was a compiler error on instantiation, now
it's SFINAE).
- The == and != operators are now global (friends), and are not
templated.
- MachineInstrBundleIterator has overloads to compare against both
const_pointer and const_reference. This avoids the implicit
conversions to MachineInstrBundleIterator that assert, instead just
checking the address (and I added unit tests to confirm this).
Notably, the only remaining uses of ilist_iterator::getNodePtrUnchecked
are in ilist.h, and no code outside of ilist*.h directly relies on this
UB end-iterator-to-pointer conversion anymore. It's still needed for
ilist_*sentinel_traits, but I'll clean that up soon.
llvm-svn: 278478
This helped to improved memory-folding and register coalescing optimizations.
Also, this patch fixed the tracker #17229.
Reviewer: Craig Topper.
Differential Revision: https://reviews.llvm.org/D23108
llvm-svn: 278431
From the point of view of register assignment, byval parameters are
ignored: a byval parameter is not going to be assigned to a register,
and it will not affect the assignments of subsequent parameters.
When matching registers with parameters in the bit tracker, make sure
to skip byval parameters before advancing the registers.
llvm-svn: 278375
Check for end() before skipping through debug values. This avoids
dereferencing end() when the instruction is the final one in the basic
block. (It still assumes that a debug value will not be the final
instruction in the basic block. No tests seemed to violate that.)
Many Hexagon tests trigger this, but they happen to magically pass right
now. I found this because WIP patches for PR26753 convert them into
crashes.
llvm-svn: 278355
The previous implementation (not custom) doesn't enforce zeroing off upper bits. The assumption is that i1 PRODUCER (truncate and extractelement) must zero all upper bits, so i1 CONSUMER instructions ( test, zext, save, etc) can be done without additional zeroing.
Make extractelement i1 lowering custom for all vector i1.
Differential Revision: http://reviews.llvm.org/D23246
llvm-svn: 278328
This patch helps avoid false dependencies on undef registers by updating the machine instructions' undef operand to use a register that the instruction is truly dependent on, or use a register with clearance higher than Pref.
Pseudo example:
loop:
xmm0 = ...
xmm1 = vcvtsi2sdl eax, xmm0<undef>
... = inst xmm0
jmp loop
In this example, selecting xmm0 as the undef register creates false dependency between loop iterations.
This false dependency cannot be solved by inserting an xor before vcvtsi2sdl because xmm0 is alive at the point of the vcvtsi2sdl instruction.
Selecting a different register instead of xmm0, especially a register that is not used in the loop, will eliminate this problem.
Differential Revision: https://reviews.llvm.org/D22466
llvm-svn: 278321
Summary:
This patch define and implement amdgcn image intrinsics with sampler.
1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty,
and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name
to have three suffixes to overload each of these three types;
2. D128 as well as two other flags are implied in the three types, for example,
if you use v8i32 as resource type, then r128 is 0!
3. don't expose TFE flag, and other flags are exposed in the instruction order:
unrm, glc, slc, lwe and da.
Differential Revision: http://reviews.llvm.org/D22838
Reviewed by:
arsenm and tstellarAMD
llvm-svn: 278291
Insert before the skip branch if one is created.
This is a somewhat more natural placement relative
to the skip branches, and makes it possible to implement
analyzeBranch for skip blocks.
The test changes are mostly due to a quirk where
the block label is not emitted if there is a terminator
that is not also a branch.
llvm-svn: 278273
Floating point instructions use general purpose registers, so the few
instructions that can put floating point immediates into registers are,
in fact, integer instruction. Use them explicitly instead of having
pseudo-instructions specifically for dealing with floating point values.
Simplify the constant loading instructions (from sdata) to have only two:
one for 32-bit values and one for 64-bit values: CONST32 and CONST64.
llvm-svn: 278244
In debug mode extra macros are enabled for several C++ algorithms. Some of them
may cause unfortunate build failures.
This commit adds a redundant operator() to work around one of those troublesome
macros which was hit accidentally by change r278012.
llvm-svn: 278241
isUndefOrEqual and isUndefOrInRange treated all -ve shuffle mask values as UNDEF, now it has to be SM_SentinelUndef (-1)
We already have asserts to check that lowered SHUFFLE_VECTOR indices are in the range -1 <= index < 2*masksize (or masksize for unary shuffles)
llvm-svn: 278218
Created a Thumb2 predicated pattern matcher that uses Thumb2 and
HasT2ExtractPack and used it to redefine the patterns for sxta{b|h}
and uxta{b|h}. Also used the similar patterns to fill in isel pattern
gaps for the corresponding instructions in the ARM backend.
The patch is mainly changes to tests since most of this functionality
appears not to have been tested.
Differential Revision: https://reviews.llvm.org/D23273
llvm-svn: 278207
This patch adds -emscripten-cxx-exceptions-whitelist option to
WebAssemblyLowerEmscriptenExceptions pass. This options is the list of
function names in which Emscripten-style exception handling is enabled.
This is to support emscripten's EXCEPTION_CATCHING_WHITELIST which
exists because of the performance impact of emscripten's non-zero-cost
EH method.
Patch by Heejin Ahn
Differential Revision: https://reviews.llvm.org/D23292
llvm-svn: 278171
A UD2 might make its way into the program via a call to @llvm.trap.
Obviously, calls are not terminators. However, we modeled the X86
instruction, UD2, as a terminator. Later on, this confuses the epilogue
insertion machinery which results in the epilogue getting inserted
before the UD2. For some platforms, like x64, the result is a
violation of the ABI.
Instead, model UD2/UD2B as a side effecting instruction which may
observe memory.
llvm-svn: 278144
This makes a trivial change in the emission of the per-function XRay
tables, and makes sure that the xray_instr_map section does show up in
the object file.
llvm-svn: 278113
We only had partial memory folding support for the intrinsic definitions, and (as noted on PR27481) was causing FR32/FR64/VR128 mismatch errors with the machine verifier.
This patch adds missing memory folding support for both intrinsics and the ffloor/fnearbyint/fceil/frint/ftrunc patterns and in doing so fixes the failing machine verifier stack folding tests from PR27481.
Differential Revision: https://reviews.llvm.org/D23276
llvm-svn: 278106
Previously SSE1 had a pattern that looked for integer types without bitcasts, but the type wasn't legal with only SSE1 and SSE2 add an identical pattern for the integer instructions.
llvm-svn: 278089
* Delete extra '_' prefixes from JS library function names. fixImports()
function in JS glue code deals with this for wasm.
* Change command-line option names in order to be consistent with
asm.js.
* Add missing lowering code for llvm.eh.typeid.for intrinsics
* Delete commas in mangled function names
* Fix a function argument attributes bug. Because we add the pointer to
the original callee as the first argument of invoke wrapper, all
argument attribute indices have to be incremented by one.
Patch by Heejin Ahn
Differential Revision: https://reviews.llvm.org/D23258
llvm-svn: 278081
This reverts commit r278048. Something changed between the last time I
built this--it takes awhile on my ridiculously slow and ancient
computer--and now that broke this.
llvm-svn: 278053
Summary:
Based on two patches by Michael Mueller.
This is a target attribute that causes a function marked with it to be
emitted as "hotpatchable". This particular mechanism was originally
devised by Microsoft for patching their binaries (which they are
constantly updating to stay ahead of crackers, script kiddies, and other
ne'er-do-wells on the Internet), but is now commonly abused by Windows
programs to hook API functions.
This mechanism is target-specific. For x86, a two-byte no-op instruction
is emitted at the function's entry point; the entry point must be
immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding.
This padding is where the patch code is written. The two byte no-op is
then overwritten with a short jump into this code. The no-op is usually
a `movl %edi, %edi` instruction; this is used as a magic value
indicating that this is a hotpatchable function.
Reviewers: majnemer, sanjoy, rnk
Subscribers: dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D19908
llvm-svn: 278048
Moves of a value to a segment register from a 16-bit register is
equivalent to one from it's corresponding 32-bit register. Match gas's
behavior and rewrite instructions to the shorter of equivalent forms.
Reviewers: rnk, ab
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23166
llvm-svn: 278031
This patch adds support for some new relocation models to the ARM
backend:
* Read-only position independence (ROPI): Code and read-only data is accessed
PC-relative. The offsets between all code and RO data sections are known at
static link time. This does not affect read-write data.
* Read-write position independence (RWPI): Read-write data is accessed relative
to the static base register (r9). The offsets between all writeable data
sections are known at static link time. This does not affect read-only data.
These two modes are independent (they specify how different objects
should be addressed), so they can be used individually or together. They
are otherwise the same as the "static" relocation model, and are not
compatible with SysV-style PIC using a global offset table.
These modes are normally used by bare-metal systems or systems with
small real-time operating systems. They are designed to avoid the need
for a dynamic linker, the only initialisation required is setting r9 to
an appropriate value for RWPI code.
I have only added support to SelectionDAG, not FastISel, because
FastISel is currently disabled for bare-metal targets where these modes
would be used.
Differential Revision: https://reviews.llvm.org/D23195
llvm-svn: 278015
Summary:
Add support for the .insn directive.
.insn is an s390 specific directive that allows encoding of an instruction
instead of using a mnemonic. The motivating case is some code in node.js that
requires support for the .insn directive.
Reviewers: koriakin, uweigand
Subscribers: koriakin, llvm-commits
Differential Revision: https://reviews.llvm.org/D21809
llvm-svn: 278012
Summary:
The DAG combine transformation that was generating the
aarch64_neon_vcvtfp2fxs node was assuming that all
inputs where legal and wasn't accounting that the input
could be a v4f64 if we're trying to do the transformation
before legalization. We now bail out in this case.
All illegal types besides v4f64 were already rejected.
Fixes https://llvm.org/bugs/show_bug.cgi?id=28877.
Reviewers: jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23261
llvm-svn: 278002
Summary:
They are now lexed as a single token on targets where
MCAsmInfo::HasMipsExpressions is true and then parsed in a similar way to
the '~' operator as part of MCExpr::parseExpression.
As a result:
* expressions and immediates no longer have different parsing rules. The
difference is now solely down to whether evaluateAsAbsolute() succeeds.
* %hi(%neg(%gp_rel(x))) are no longer parsed as a single operator and
decomposed into the three MipsMCExpr nodes. They are parsed directly as
three MipsMCExpr nodes.
* parseMemOperand no longer needs to eat all the surrounding parenthesis
to get at the outermost operator to make this work
* %hi(%neg(%gp_rel(x))) and %lo(%neg(%gp_rel(x))) are no longer the only
3-in-1 relocs that parse for N64. They're still the only combinations that
are permitted in relocatable expressions though. Fixing that should be a
later patch.
* We no longer need to list all the tokens that can occur as the first token of
an expression or immediate.
test/MC/Mips/expr1.s:
This change also prevents the incorrect lowering of %lo(2*4)+foo to
%lo(8+foo) which is not an equivalent expression (the difference is
whether foo is truncated to 16-bit or not) and the test has been
updated to account for the macro expansion the correct expression requires.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D23110
llvm-svn: 277988
Optimized lowering of BITCAST node. The BITCAST node can be replaced with COPY_TO_REG instead of KMOV.
It allows to suppress two opposite BITCAST operations and avoid redundant "movs".
Differential Revision: https://reviews.llvm.org/D23247
llvm-svn: 277958
Assuming SSE2 is available then we can safely commute between these, removing some unnecessary register moves and improving memory folding opportunities.
VEX encoded versions don't benefit so I haven't added support to them.
llvm-svn: 277930
Summary:
This is the setting of the Vulkan closed source driver.
It decreases the max wave count from 10 to 8.
26010 shaders in 14650 tests
Totals:
VGPRS: 829593 -> 808440 (-2.55 %)
Spilled SGPRs: 81878 -> 42226 (-48.43 %)
Spilled VGPRs: 367 -> 358 (-2.45 %)
Scratch VGPRs: 1764 -> 1748 (-0.91 %) dwords per thread
Code Size: 36677864 -> 35923932 (-2.06 %) bytes
There is a massive decrease in SGPR spilling in general and -7.4% spilled
VGPRs for DiRT Showdown (= SGPRs spilled to scratch?)
Reviewers: arsenm, tstellarAMD, nhaehnle
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23034
llvm-svn: 277867
Summary: Thumb2 supports encoding immediates with specific patterns into mov.w by splatting the low 8 bits into other bytes.
I'm resubmitting this patch. The test case in the original commit
r277610 does not specify triple, so builds with differnt default triple
will have different output.
This patch fixed trile as thumb-darwin-apple.
Reviewers: john.brawn, jmolloy, bruno
Subscribers: jmolloy, aemerson, rengolin, samparker, llvm-commits
Differential Revision: https://reviews.llvm.org/D23090
llvm-svn: 277865
There were two locations where fast-isel would generate a LFD instruction
with a target register class VSFRC instead of F8RC when VSX was enabled.
This can ccause invalid registers to be used in certain cases, like:
lfd 36, ...
instead of using a VSX load instruction. The wrong register number gets
silently truncated, causing invalid code to be generated.
The first place is PPCFastISel::PPCEmitLoad, which had multiple problems:
1.) The IsVSSRC and IsVSFRC flags are not initialized correctly, since they
are computed from resultReg, which is still zero at this point in many cases.
Fixed by changing the helper routines to operate on a register class instead
of a register and passing in UseRC.
2.) Even with this fixed, Is64VSXLoad is still wrong due to a typo:
bool Is32VSXLoad = IsVSSRC && Opc == PPC::LFS;
bool Is64VSXLoad = IsVSSRC && Opc == PPC::LFD;
The second line needs to use isVSFRC (like PPCEmitStore does).
3.) Once both the above are fixed, we're now generating a VSX instruction --
but an incorrect one, since generation of an indexed instruction with null
index is wrong. Fixed by copying the code handling the same issue in
PPCEmitStore.
The second place is PPCFastISel::PPCMaterializeFP, where we would emit an
LFD to load a constant from the literal pool, and use the wrong result
register class. Fixed by hardcoding a F8RC class even on systems
supporting VSX.
Fixes: https://llvm.org/bugs/show_bug.cgi?id=28630
Differential Revision: https://reviews.llvm.org/D22632
llvm-svn: 277823
Summary:
Add instruction formats E, RSI, SSd, SSE, and SSF.
Added BRXH, BRXLE, PR, MVCK, STRAG, and ECTG instructions to test out
those formats.
Reviewers: uweigand
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23179
llvm-svn: 277822
This patch fixes passing long double type arguments to function in
soft float mode. If there is less than 4 argument registers free
(long double type is mapped in 4 gpr registers in soft float mode)
long double type argument must be passed through stack.
Differential Revision: https://reviews.llvm.org/D20114.
llvm-svn: 277804
Shifts with a uniform but non-constant count were considered very expensive to
vectorize, because the splat of the uniform count and the shift would tend to
appear in different blocks. That made the splat invisible to ISel, and we'd
scalarize the shift at codegen time.
Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we
are able to select the appropriate vector shifts. This updates the cost model to
to take this into account by making shifts by a uniform cheap again.
Differential Revision: https://reviews.llvm.org/D23049
llvm-svn: 277782
These are the operations that are trivially identical. Division is omitted for
now because you need to use the correct sign/zero extension.
llvm-svn: 277775
Adding missing tests for OCL type names for half, float, double, char, short, long, and unknown.
Patch by Aaron En Ye Shi.
Differential Revision: https://reviews.llvm.org/D22964
llvm-svn: 277759
Previously, FastISel for WebAssembly wasn't checking the return value of
`getRegForValue` in certain cases, which would generate instructions
referencing NoReg. This patch fixes this behavior.
Patch by Dominic Chen
Differential Revision: https://reviews.llvm.org/D23100
llvm-svn: 277742
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.
Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
for a maximum size allowed and relies on the next condition checking for %4 for correctness.
This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list).
Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.
Reviewers: arsenm, jlebar, tstellarAMD
Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23068
llvm-svn: 277735
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.
The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.
Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.
Differential Revision: https://reviews.llvm.org/D21379
llvm-svn: 277725
Enable tail calls by default for (micro)MIPS(64).
microMIPS is slightly more tricky than doing it for MIPS(R6) or microMIPSR6.
microMIPS has two instruction encodings: 16bit and 32bit along with some
restrictions on the size of the instruction that can fill the delay slot.
For safe tail calls for microMIPS, the delay slot filler attempts to find
a correct size instruction for the delay slot of TAILCALL pseudos.
Reviewers: dsanders, vkalintris
Subscribers: jfb, dsanders, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D21138
llvm-svn: 277708
This should ensure that we can atomically write two bytes (on top of the
retq and the one past it) and have those two bytes not straddle cache
lines.
We also move the label past the alignment instruction so that we can refer
to the actual first instruction, as opposed to potential padding before the
aligned instruction.
Update the tests to allow us to reflect the new order of assembly.
Reviewers: rSerge, echristo, majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23101
llvm-svn: 277701
This patch fixes pr25548.
Current implementation of PPCBoolRetToInt doesn't handle CallInst correctly, so it failed to do the intended optimization when there is a CallInst with parameters. This patch fixed that.
llvm-svn: 277655
We currently only support combining target shuffles that consist of a single source input (plus elements known to be undef/zero).
This patch generalizes the recursive combining of the target shuffle to collect all the inputs, merging any duplicates along the way, into a full set of src ops and its shuffle mask.
We uncover a number of cases where we have failed to combine a unary shuffle because the input has been duplicated and separated during lowering.
This will allow us to combine to 2-input shuffles in a future patch.
Differential Revision: https://reviews.llvm.org/D22859
llvm-svn: 277631
Summary: Thumb2 supports encoding immediates with specific patterns into mov.w by splatting the low 8 bits into other bytes.
Reviewers: john.brawn, jmolloy
Subscribers: jmolloy, aemerson, rengolin, samparker, llvm-commits
Differential Revision: https://reviews.llvm.org/D23090
llvm-svn: 277610
When the same base address is used to load two different data types, LSR
would assume a memory type of "void". This type is not sized and has no
alignment information. Checking for it causes a crash.
llvm-svn: 277601
Summary:
We also add a test to show what currently happens when we create a
section per function and emit an xray_instr_map. This illustrates the
relationship (or lack thereof) between the per-function section and the
xray_instr_map section.
We also change the code generation slightly so that we don't always
create group sections, but rather only do so if a function where the
table is associated with is in a group.
Also in this change:
- Remove the "merge" flag on the xray_instr_map section.
- Test that we're generating the right table for comdat and non-comdat functions.
Reviewers: echristo, majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23104
llvm-svn: 277580
In this particular example we wouldn't want the smmls anyway (the value is
actually unused), but in general smmls does not provide the required flags
register so if that SUBE result is used we can't replace it.
llvm-svn: 277541
We were relying on the misleadingly-names $status result to actually be the
status. Actually it's just a scratch register that may or may not be valid (and
is the inverse of the real ststus anyway). Success can be determined by
comparing the value loaded against the one we wanted to see for "cmpxchg
strong" loops like this.
Should fix PR28819.
llvm-svn: 277513
Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.
For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.
This is a candidate for the 3.9 release branch.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D22675
llvm-svn: 277504
Summary:
There are cases where uniform branch conditions are computed in VGPRs, and
we didn't correctly mark those as WQM.
The stray change in basic-branch.ll is because invoking the LiveIntervals
analysis leads to the detection of a dead register that would otherwise not
be seen at -O0.
This is a candidate for the 3.9 branch, as it fixes a possible hang.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22673
llvm-svn: 277500
Identify patterns where the address is aligned to an 8-byte boundary,
but both the base address and the constant offset are both proper
multiples of 4. In such cases, extract Base+4 into a separate instruc-
tion, and use S2_storerd_io, instead of using S4_storerd_rr.
llvm-svn: 277497
Recommitting after fixing overaggressive fastpath return in parsing.
Fix intel syntax special case identifier operands that refer to a constant
(e.g. .set <ID> n) to be interpreted as immediate not memory in parsing.
Associated commit to fix clang test commited shortly.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22585
llvm-svn: 277489
We currently use and test these, and select most of them. Mark them
as legal even though we don't go through the full ir->asm flow yet.
This doesn't currently have standalone tests, but the verifier will
soon learn to check that the regbankselect/select tests are legal.
llvm-svn: 277471
Added (sra (shl x, 16), 16) to the sext_16_node PatLeaf for ARM to
simplify some pattern matching. This has allowed several patterns
for smul* and smla* to be removed as well as making it easier to add
the matching for the corresponding instructions for Thumb2 targets.
Also added two Pat classes that are predicated on Thumb2 with the
hasDSP flag and UseMulOps flags. Updated the smul codegen test with
the wider range of patterns plus the ThumbV6 and ThumbV6T2 targets.
Differential Revision: https://reviews.llvm.org/D22908
llvm-svn: 277450
These changes update the schedule model for the P5600 and includes the
rest of the MSA and MIPS32R5 instruction sets.
Reviewers: dsanders, vkalintris
Differential Revision: https://reviews.llvm.org/D21835
llvm-svn: 277441
Summary:
Commit 276701 requires that targets have the DSP extensions to use
certain saturating instructions. This requires some corrections.
For ARM ISA the instructions in question are available in all v6*
architectures.
For Thumb2, the instructions in question are available from v6T2.
SSAT and USAT are part of the base architecture while SSAT16 and
USAT16 require the DSP extensions.
Reviewers: rengolin
Subscribers: aemerson, rengolin, samparker, llvm-commits
Differential Revision: https://reviews.llvm.org/D23010
llvm-svn: 277439
Summary: This patch implements CFI for WebAssembly. It modifies the
LowerTypeTest pass to pre-assign table indexes to functions that are
called indirectly, and lowers type checks to test against the
appropriate table indexes. It also modifies the WebAssembly backend to
support a special ".indidx" assembly directive that propagates the table
index assignments out to the linker.
Patch by Dominic Chen
Differential Revision: https://reviews.llvm.org/D21768
llvm-svn: 277398
Summary: This patch includes asm.js-style exception handling support for
WebAssembly. The WebAssembly MVP does not have any support for
unwinding or non-local control flow. In order to support C++ exceptions,
emscripten currently uses JavaScript exceptions along with some support
code (written in JavaScript) that is bundled by emscripten with the
generated code.
This scheme lowers exception-related instructions for wasm such that
wasm modules can be compatible with emscripten's existing scheme and
share the support code.
Patch by Heejin Ahn
Differential Revision: https://reviews.llvm.org/D22958
llvm-svn: 277391
Scavenging slots were only reserved when pseudo-instruction expansion in
frame lowering created new virtual registers. It is possible to still
need a scavenging slot even if no virtual registers were created, in cases
where the stack is large enough to overflow instruction offsets.
llvm-svn: 277355
Summary:
Allocating an AFGR64 shadows two GPR32's instead of just one.
This fixes an LNT regression detected by our internal buildbots.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D23012
llvm-svn: 277348
The branch relaxation pass is computing the wrong offsets because it assumes
TLSDESC_CALLSEQ eats up 4 bytes, when in fact it is lowered to an instruction
sequence taking up 16 bytes. This can become a problem in huge files with lots
of TLS accesses, as it may slowly move branch targets out of the range computed
by the branch relaxation pass.
Fixes PR24234 https://llvm.org/bugs/show_bug.cgi?id=24234
Differential Revision: https://reviews.llvm.org/D22870
llvm-svn: 277331
Initialize all AArch64-specific passes in the TargetMachine so they can be run
by llc. This can lead to conflicts in opt with some command line options that
share the same name as the pass, so I took this opportunity to do some cleanups:
* rename all relevant command line options from "aarch64-blah" to
"aarch64-enable-blah" and update the tests accordingly
* run clang-format on their declarations
* move all these declarations to a common place (the TargetMachine) as opposed
to having them scattered around (AArch64BranchRelaxation and
AArch64AddressTypePromotion were the only offenders)
llvm-svn: 277322
As discussed on PR14593, this patch adds support for lowering to SHLD/SHRD from the patterns generated by DAGTypeLegalizer::ExpandShiftWithKnownAmountBit.
Differential Revision: https://reviews.llvm.org/D23000
llvm-svn: 277299
Removed AssertZext node, which was inserted between X86ISD::SETCC and "truncate to i1".
Differential Revision: https://reviews.llvm.org/D22850
llvm-svn: 277289
These come in two variants for now: G_INTRINSIC and G_INTRINSIC_W_SIDE_EFFECTS.
We may decide to split the latter up with finer-grained restrictions later, if
necessary.
llvm-svn: 277224
Up until now, we only had code to match PSADBW patterns that look like what
comes out of the loop vectorizer - a partial reduction inside the loop body
that gets fed into a horizontal operation in a different basic block.
This adds support for straight-line patterns, like those generated by the
SLP vectorizer.
Differential Revision: https://reviews.llvm.org/D22889
llvm-svn: 277219
Support for lowering to VBROADCASTF128 etc. in D22460 was not correctly ensuring that the only users of the 128-bit vector load were the insertions of the vector into the lower/upper subvectors.
llvm-svn: 277214
The DAG combiner tries to merge stores to adjacent vector wide memory
locations by creating stores which are integral multiples of the vector
width. Discourage this by informing it that this is slow. This should
not affect legalization passes, because all of them ignore the "Fast"
argument.
Patch by Pranav Bhandarkar.
llvm-svn: 277178
For MachineInstrBuilder, having to manually use RegState::Define is ugly and
makes register definitions clunkier than they need to be, so this adds two
convenience functions: addDef and addUse.
For MachineIRBuilder, we want to avoid BuildMI's first-reg-is-def rule because
it's hidden away and causes bugs. So this patch switches buildInstr to
returning a MachineInstrBuilder and adding *all* operands via addDef/addUse.
NFC.
llvm-svn: 277176
Mostly straightforward as we ignore addressing modes and just
use the base + unsigned immediate offset (always 0) variants.
This currently fails to select extloads because we have yet to
agree on a representation.
llvm-svn: 277171
Software pipelining is an optimization for improving ILP by
overlapping loop iterations. Swing Modulo Scheduling (SMS) is
an implementation of software pipelining that attempts to
reduce register pressure and generate efficient pipelines with
a low compile-time cost.
This implementaion of SMS is a target-independent back-end pass.
When enabled, the pass should run just prior to the register
allocation pass, while the machine IR is in SSA form. If the pass
is successful, then the original loop is replaced by the optimized
loop. The optimized loop contains one or more prolog blocks, the
pipelined kernel, and one or more epilog blocks.
This pass is enabled for Hexagon only. To enable for other targets,
a couple of target specific hooks must be implemented, and the
pass needs to be called from the target's TargetMachine
implementation.
Differential Review: http://reviews.llvm.org/D16829
llvm-svn: 277169
If the mask of a vector shuffle has alternating odd or even numbers
starting with 1 or 0 respectively up to the largest possible index
for the given type in the given HVX mode (single of double) we can
generate vpacko or vpacke instruction respectively.
E.g.
%42 = shufflevector <32 x i16> %37, <32 x i16> %41,
<32 x i32> <i32 1, i32 3, ..., i32 63>
is %42.h = vpacko(%41.w, %37.w)
Patch by Pranav Bhandarkar.
llvm-svn: 277168
Rebalances address calculation trees and applies Hexagon-specific
optimizations to the trees to improve instruction selection.
Patch by Tobias Edler von Koch.
llvm-svn: 277151
The post register allocator scheduler can generate poor schedules
because the scoreboard hazard recognizer is unable to identify
hazards for Hexagon precisely. Instead, Hexagon should use a DFA
based hazard recognizer.
Patch by Brendon Cahoon.
llvm-svn: 277143
Summary:
Implements fastLowerArguments() to avoid the need to fall back on
SelectionDAG for 0-4 argument functions that don't do tricky things like
passing double in a pair of i32's.
This allows us to move all except one test to -fast-isel-abort=3. The
remaining one has function prototypes of the form 'i32 (i32, double, double)'
which requires floats to be passed in GPR's.
The previous commit had an uninitialized variable that caused the incoming
argument region to have undefined size. This has been fixed.
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D22680
llvm-svn: 277136
We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that the result will be either all or zero bits in every element, which can be efficiently truncated by directly using PACKSS to repeatedly halve the size of each element.
Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally for AVX2 PACKSS of 256bit data we must perform a PERMQ shuffle to reorder the data into the correct order. I did investigate performing a single shuffle after all the PACKSS calls but the need to cross 128bit lanes makes this difficult to achieve efficiently.
We avoid performing this on AVX512 as it should have better alternative truncation instructions.
Differential Revision: https://reviews.llvm.org/D22814
llvm-svn: 277132
Summary:
The MOV/MOVT instructions being chosen for struct_byval predicates was
conditional only on Thumb2, resulting in an ARM MOV/MOVT instruction
being incorrectly emitted in Thumb1 mode. This is especially apparent
with v8-m.base targets. This patch ensures that Thumb instructions are
emitted in both Thumb modes.
Reviewers: rengolin, t.p.northover
Subscribers: llvm-commits, aemerson, rengolin
Differential Revision: https://reviews.llvm.org/D22865
llvm-svn: 277128
This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of
subclasses already implement.
Differential Revision: https://reviews.llvm.org/D22885
llvm-svn: 277126
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.
Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.
llvm-svn: 277099
I'm not convinced the patterns for the rm_Int was correct anyway. It had a tied source that should't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently.
llvm-svn: 277098
Normally, CFI instructions should be inserted after allocframe, but
if allocframe is in the same packet with a call, the CFI instructions
should be inserted before that packet.
llvm-svn: 277020
LLT() has a particular meaning: it's one invalid type. But we really
want selected instructions to have no type whatsoever.
Also verify that types don't linger after ISel, and enable the verifier
on the AArch64 select test.
llvm-svn: 277001
Summary:
Implements fastLowerArguments() to avoid the need to fall back on
SelectionDAG for 0-4 argument functions that don't do tricky things like
passing double in a pair of i32's.
This allows us to move all except one test to -fast-isel-abort=3. The
remaining one has function prototypes of the form 'i32 (i32, double, double)'
which requires floats to be passed in GPR's.
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D22680
llvm-svn: 276982
Summary:
We were using reserved VGPRs for SGPR spilling and this was causing
some programs with a workgroup size of 1024 to use more than 64
registers, which is illegal.
Reviewers: arsenm, mareko, nhaehnle
Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22032
llvm-svn: 276980
Summary:
SI_ELSE is lowered into two parts:
s_or_saveexec_b64 dst, src (at the start of the basic block)
s_xor_b64 exec, exec, dst (at the end of the basic block)
The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction
s_and_b64 exec, exec, s[...]
which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:
s_or_savexec_b64 dst, src
s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow
s_xor_b64 exec, exec, dst
Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.
Finally: It also occured to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit
s_or_saveexec_b64 dst, src
...
s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst
and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions
s_and_b64 vcc, exec, vcc
before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.
In any case, this current patch could help in the short term with the whole
ExecModified business.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22846
llvm-svn: 276972
Add unittest to {ARM | AArch64}TargetParser,and by the way correct problems as below:
1.Correct a incorrect indexing problem in AArch64TargetParser. The architecture enumeration
is shared across ARM and AArch64 in original implementation.But In the code,I just used the
index which was offset by the ARM, and this would index into the array incorrectly. To make
AArch64 has its own arch enum,or we will do a lot of slowly iterating.
2.Correct a spelling error. The parameter of llvm::AArch64::getArchExtName.
3.Correct a writing mistake, in llvm::ARM::parseArchISA.
Differential Revision: https://reviews.llvm.org/D21785
llvm-svn: 276957
Before adding a new preheader block, check if there is a candidate block
where the loop setup could be placed speculatively. This will be off by
default.
llvm-svn: 276919
Fix intel syntax special case identifier operands that refer to a constant
(e.g. .set <ID> n) to be interpreted as immediate not memory in parsing.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22585
llvm-svn: 276895
The callee-saved registers that are saved in a function are not pristine,
and so they can be defined and used. In case of shrink-wrapping though,
there are blocks that are outside of the save/restore range, and in those
blocks the saved registers must be treated as pristine. To avoid any uses
of these registers, add them as live-in in all those blocks.
This was already done for blocks reaching function exits after restore,
add code that does the same for blocks reached from the function entry
before save.
llvm-svn: 276886
TargetOptions wants the ExceptionHandling enum. Move that to
MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in
clang. Now you can add a DWARF tag without a full rebuild of clang
semantic analysis.
llvm-svn: 276883
Summary:
This is one possible solution to the problem of ignoring constraints that Simon
raised in D21473 but it's a bit of a hack.
The integrated assembler currently ignores violations of the tied register
constraints when the operands involved in a tie are both present in the AsmText.
For example, 'dati $rs, $rt, $imm' with the '$rs = $rt' will silently replace
$rt with $rs. So 'dati $2, $3, 1' is processed as if the user provided
'dati $2, $2, 1' without any diagnostic being emitted.
This is difficult to solve properly because there are multiple parts of the
matcher that are silently forcing these constraints to be met. Tied operands are
rendered to instructions by cloning previously rendered operands but this is
unnecessary because the matcher was already instructed to render the operand it
would have cloned. This is also unnecessary because earlier code has already
replaced the MCParsedOperand with the one it was tied to (so the parsed input
is matched as if it were 'dati <RegIdx 2>, <RegIdx 2>, <Imm 1>'). As a result,
it looks like fixing this properly amounts to a rewrite of the tied operand
handling which affects all targets.
This patch however, merely inserts a checking hook just before the
substitution of MCParsedOperands and the Mips target overrides it. It's not
possible to accurately check the registers are the same this early (because
numeric registers haven't been bound to a register class yet) so it cheats a
bit and checks that the tokens that produced the operand are lexically
identical. This works because tied registers need to have the same register
class but it does have a flaw. It will reject 'dati $4, $a0, 1' for violating
the constraint even though $a0 ends up as the same register as $4.
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D21994
llvm-svn: 276867
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the PowerPC backend, mainly by preferring MachineInstr&
over MachineInstr* when a pointer isn't nullable and using range-based
for loops.
There was one piece of questionable code in PPCInstrInfo::AnalyzeBranch,
where a condition checked a pointer converted from an iterator for
nullptr. Since this case is impossible (moreover, the code above
guarantees that the iterator is valid), I removed the check when I
changed the pointer to a reference.
Despite that case, there should be no functionality change here.
llvm-svn: 276864
Currently, for ARMCOFFMCAsmInfoMicrosoft, no comment character is set, thus the
idefault, '#', is used.
The hash character doesn't work as comment character in ARM assembly, since '#'
is used for immediate values.
The comment character is set to ';', which is the comment character used by MS
armasm.exe. (The microsoft armasm.exe uses a different directive syntax than
what LLVM currently supports though, similar to ARM's armasm.)
This allows inline assembly with immediate constants to be built (and brings the
assembly output from clang -S closer to being possible to assemble).
A test is added that verifies that ';' is correctly interpreted as comments in
this mode, and verifies that assembling code that includes literal constants
with a '#' works.
Patch by Martin Storsjö.
llvm-svn: 276859
Consider this case:
vreg1 = A2_zxth vreg0 (1)
...
vreg2 = A2_zxth vreg1 (2)
Redundant instruction elimination could delete the instruction (1)
because the user (2) only cares about the low 16 bits. Then it could
delete (2) because the input is already zero-extended. The problem
is that the properties allowing each individual instruction to be
deleted depend on the existence of the other instruction, so either
one can be deleted, but not both.
The existing check for this situation in RIE was insufficient. The
fix is to update all dependent cells when an instruction is removed
(replaced via COPY) in RIE.
llvm-svn: 276792
ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly aligned, which requires
fixing clover.
llvm-svn: 276766
When the packetizer wants to put a store to a stack slot in the same
packet with an allocframe, it updates the store offset to reflect the
value of SP before it is updated by allocframe. If the store cannot
be packetized with the allocframe after all, the offset needs to be
updated back to the previous value.
llvm-svn: 276749
- More informative message when extension name is not an identifier token.
- Stop parsing directive if extension is unknown (avoid duplicate error
messages).
- Report unsupported extensions with a source location, rather than
report_fatal_error.
Differential Revision: https://reviews.llvm.org/D22806
llvm-svn: 276748
This option, compatible with gas's -mimplicit-it, controls the
generation/checking of implicit IT blocks in ARM/Thumb assembly.
This option allows two behaviours that were not possible before:
- When in ARM mode, emit a warning when assembling a conditional
instruction that is not in an IT block. This is enabled with
-mimplicit-it=never and -mimplicit-it=thumb.
- When in Thumb mode, automatically generate IT instructions when an
instruction with a condition code appears outside of an IT block. This
is enabled with -mimplicit-it=thumb and -mimplicit-it=always.
The default option is -mimplicit-it=arm, which matches the existing
behaviour (allow conditional ARM instructions outside IT blocks without
warning, and error if a conditional Thumb instruction is outside an IT
block).
The general strategy for generating IT blocks in Thumb mode is to keep a
small list of instructions which should be in the IT block, and only
emit them when we encounter something in the input which means we cannot
continue the block. This could be caused by:
- A non-predicable instruction
- An instruction with a condition not compatible with the IT block
- The IT block already contains 4 instructions
- A branch-like instruction (including ALU instructions with the PC as
the destination), which cannot appear in the middle of an IT block
- A label (branching into an IT block is not legal)
- A change of section, architecture, ISA, etc
- The end of the assembly file.
Some of these, such as change of section and end of file, are parsed
outside of the ARM asm parser, so I've added a new virtual function to
AsmParser to ensure any previously-parsed instructions have been
emitted. The ARM implementation of this flushes the currently pending IT
block.
We now have to try instruction matching up to 3 times, because we cannot
know if the current IT block is valid before matching, and instruction
matching changes depending on the IT block state (due to the 16-bit ALU
instructions, which set the flags iff not in an IT block). In the common
case of not having an open implicit IT block and the instruction being
matched not needing one, we still only have to run the matcher once.
I've removed the ITState.FirstCond variable, because it does not store
any information that isn't already represented by CurPosition. I've also
updated the comment on CurPosition to accurately describe it's meaning
(which this patch doesn't change).
Differential Revision: https://reviews.llvm.org/D22760
llvm-svn: 276747
Fixed typo in the intrinsic definitions of (v)cvtsd2ss with memory folding.
This was only unearthed when rL276102 started using the intrinsic again.....
llvm-svn: 276740
MIPS64R6 compact branch support. As the MIPS LLVM backend uses distinct
MachineInstrs for certain 32 and 64 bit instructions (e.g. BEQ & BEQ64) that
map to the same instruction, extend compact branch support for the
corresponding 64bit branches.
Reviewers: dsanders
Differential Revision: https://reviews.llvm.org/D20164
llvm-svn: 276739
Add the instruction alias sgtu (register form only), two operand forms of
s[rl]l and sra, and missing single/two operand forms of dnegu/neg.
Reviewers: dsanders
Differential Revision: https://reviews.llvm.org/D22752
llvm-svn: 276736
The saturation instructions appeared in v6T2, with DSP extensions, but they
were being accepted / generated on any, with the new introduction of the
saturation detection in the back-end. This commit restricts the usage to
DSP-enable only cores.
Fixes PR28607.
llvm-svn: 276701
They're basically i64 for AArch64, but we'll leave them intact for stranger
targets. Also add some tests for the (very few) other cases we can handle right
now.
llvm-svn: 276689
Some targets, notably AArch64 for ILP32, have different relocation encodings
based upon the ABI. This is an enabling change, so a future patch can use the
ABIName from MCTargetOptions to chose which relocations to use. Tested using
check-llvm.
The corresponding change to clang is in: http://reviews.llvm.org/D16538
Patch by: Joel Jones
Differential Revision: https://reviews.llvm.org/D16213
llvm-svn: 276654
Avoid MipsAnalyzeImmediate usage if the constant fits in an 32-bit
integer. This allows us to generate the same instructions for the
materialization of the same constants regardless the width of their
type.
Patch by: Vasileios Kalintiris
Contributions by: Simon Dardis
Reviewers: Daniel Sanders
Differential Review: https://reviews.llvm.org/D21689
llvm-svn: 276628
Follow up to r276624. Changes bits 22-20 to be parameters to
instruction class.
Differential Revision: https://reviews.llvm.org/D22562
llvm-svn: 276626
This places the 132/213/231 form number in front of the SS/SD/PS/PD. Move the Y for 256-bit versions to be after the PS/PD. Change the AVX512 scalar forms to include a Z in the their name. This new format should be consistent with the general naming of instructions.
llvm-svn: 276559
This reverts commit r276298.
Data stored in .rodata can have a negative offset from .text, but we
don't support negative values in relocations yet.
This caused a regression in one of the amp conformance tests:
5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01
llvm-svn: 276498
This adds the actual MachineLegalizeHelper to do the work and a trivial pass
wrapper that legalizes all instructions in a MachineFunction. Currently the
only transformation supported is splitting up a vector G_ADD into one acting on
smaller vectors.
llvm-svn: 276461
The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.
This should probably be merged to the release branch.
llvm-svn: 276438
- FuncNode::findBlock traverses the function every time. Avoid using it,
and keep a cache of block addresses in DataFlowGraph instead.
- The operator[] in the map of definition stacks was very slow. Replace
the map with unordered_map.
llvm-svn: 276429
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.
This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.
We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).
Reapplied with fix for PR28657 - removed intrinsic definitions (clang companion patch to be be submitted shortly).
Differential Revision: https://reviews.llvm.org/D22460
llvm-svn: 276416
functions so that the size computation is available not only in ConstantIslands
but in other passes as well.
Differential Revision: https://reviews.llvm.org/D22640
llvm-svn: 276399
This variant is (as documented in the TD) for disassembler use only, and should
not be used in patterns - it is longer, and is broken on 64-bit.
llvm-svn: 276347
when constraint "w" is used on a 32-bit operand.
This enables compiling the following code, which used to error out in
the backend:
void foo1(int a) {
asm volatile ("sqxtn h0, %s0\n" : : "w"(a):);
}
Fixes PR28633.
llvm-svn: 276344
Summary:
This change also changes findMatchingInsn and
findMatchingUpdateInsnForward to take DBG_VALUE opcodes into account
when tracking register defs and uses, which could potentially inhibit
these optimizations in the presence of debug information.
Reviewers: mcrosier
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22582
llvm-svn: 276293
Under normal circumstances we prefer the higher performance MOVD to extract the 0'th element of a v8i16 vector instead of PEXTRW.
But as detailed on PR27265, this prevents the SSE41 implementation of PEXTRW from folding the store of the 0'th element. Additionally it prevents us from making use of the fact that the (SSE2) reg-reg version of PEXTRW implicitly zero-extends the i16 element to the i32/i64 destination register.
This patch only preferentially lowers to MOVD if we will not be zero-extending the extracted i16, nor prevent a store from being folded (on SSSE41).
Fix for PR27265.
Differential Revision: https://reviews.llvm.org/D22509
llvm-svn: 276289
As requested on D22509, I've pulled out the v8i16 extraction lowering as the SSE41 and pre-SSE41 implementations are effectively the same.
llvm-svn: 276285
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.
This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.
We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).
Differential Revision: https://reviews.llvm.org/D22460
llvm-svn: 276281
classifyLEAReg() deals with switching operands from 32bit to 64bit in
order to use a LEA64_32 instruction (for three address code goodness).
It currently performs a liveness analysis to determine the kill/undef
flag for the newly added operand. This should not be necessary:
- If the previous operand had a kill flag, then the 32bit part of the
register gets killed, this will kill the super register as well.
- If the previous operand had an undef flag then we didn't care what
value we read, just use the same flag on the new operand.
(No matter what an operand with an undef flag won't affect liveness)
This makes the code independent of the presence of kill flags because it
avoids a call to MachineBasicBlock::computeRegisterLiveness().
Differential Revision: http://reviews.llvm.org/D22283
llvm-svn: 276222
At -O0, cmpxchg survives AtomicExpand: it's mostly straightforward
to select it in fast-isel, and let the pseudo be expanded later.
extractvalues on the result are the tricky part: the generic logic
only works for legal types (and it would be painful to make it
support illegal types), so we can only support i32/i64 cmpxchg.
llvm-svn: 276183
This should be all the low-level instruction selection needs to determine how
to implement an operation, with the remaining context taken from the opcode
(e.g. G_ADD vs G_FADD) or other flags not based on type (e.g. fast-math).
llvm-svn: 276158
Avoid unnecessary spills of byval arguments of device functions to
local space on SASS level and subsequent pointer conversion to generic
address space that follows. Instead, make a local copy in IR, provide
a way to access arguments directly, and let LLVM optimize the copy away
when possible.
Differential Review: https://reviews.llvm.org/D21421
llvm-svn: 276153
This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better.
Differential Revision: https://reviews.llvm.org/D22456
llvm-svn: 276104
Retry r275776 (no changes, we suspect the issue was with another commit).
The current logic for handling inline asm operands in DAGToDAGISel interprets
the operands by looking for constants, which should represent the flags
describing the kind of operand we're dealing with (immediate, memory, register
def etc). The operands representing actual data are skipped only if they are
non-const, with the exception of immediate operands which are skipped explicitly
when a flag describing an immediate is found.
The oversight is that memory operands may be const too (e.g. for device drivers
reading a fixed address), so we should explicitly skip the operand following a
flag describing a memory operand. If we don't, we risk interpreting that
constant as a flag, which is definitely not intended.
Fixes PR26038
Differential Revision: https://reviews.llvm.org/D22103
llvm-svn: 276101
Inference of the 'returned' attribute was fixed in r276008, lets try
turning the backend support back on.
This reverts commit r275677.
llvm-svn: 276081
If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.
Simplify the lowering tests by using per function
subtarget features.
llvm-svn: 276051
There's not much functional change, but it really is an architectural feature
(on v6T2, v7A, v7R and v7EM) rather than something each CPU implements
individually.
The main functional change is the default behaviour you get when specifying
only "-triple".
llvm-svn: 276013
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.
These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.
llvm-svn: 275988
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
A companion clang patch is at D22105
Differential Revision: https://reviews.llvm.org/D22106
llvm-svn: 275981
Recommitting after r274347 was reverted. This patch introduces some
classes to refactor the 3 and 4 register Thumb2 multiplication
instruction descriptions, plus improved tests for some of those
instructions.
Differential Revision: https://reviews.llvm.org/D21929
llvm-svn: 275979
The standard local dynamic model for TLS on ARM systems needs two
relocations:
- R_ARM_TLS_LDM32 (module idx)
- R_ARM_TLS_LDO32 (offset of object from origin of module TLS block)
In GNU style assembler we use symbol(tlsldm) and symbol(tlsldo) to
produce these relocations.
llvm-mc for ARM supports symbol(tlsldo) but does not support symbol(tlsldm).
This patch wires up the existing symbol(tlsldm) to R_ARM_TLS_LDM32.
TLS for ARM is defined in Addenda to, and Errata in, the ABI for the
ARM Architecture
Differential Revision: https://reviews.llvm.org/D22461
llvm-svn: 275977
Summary:
N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own
($).
This fixes the majority of object differences between -fintegrated-as and
-fno-integrated-as.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D22412
llvm-svn: 275967
The following condition expression ( a >> n) & 1 is converted to "bt a, n" instruction. It works on all intel targets.
But on AVX-512 it was broken because the expression is modified to (truncate (a >>n) to i1).
I added the new sequence (truncate (a >>n) to i1) to the BT pattern.
Differential Revision: https://reviews.llvm.org/D22354
llvm-svn: 275950
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.
v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.
llvm-svn: 275934
Taking address of a byval variable in PTX is legal, but currently runs
into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing minimum alignment on byval arguments
of device functions.
The change is a no-op on SASS level for sm_3x where ptxas already aligns
local copy by at least 4.
Differential Revision: https://reviews.llvm.org/D22428
llvm-svn: 275893
This is currently only called with GEP users. A direct
alloca would only happen with current typed pointers
for arrays which are a perverse case.
Also fix crashes on 0 x and 1 x arrays.
llvm-svn: 275869
Schedule a load and its use in the same packet in MISched. Previously,
isResourceAvailable was returning false for dependences in the same
packet, which prevented MISched from packetizing a load and its use in
the same packet for v60.
Patch by Ikhlas Ajbar.
llvm-svn: 275804
This patch corresponds to review:
https://reviews.llvm.org/D21354
We use direct moves for extracting integer elements from vectors. We also use
direct moves when converting integers to FP. When these operations are chained,
we get a direct move out of a VSR followed by a direct move back into a VSR.
These are redundant - all we need to do is line up the element and convert.
llvm-svn: 275796
The machine scheduler needs to account for available resources
more accurately in order to avoid scheduling an instruction that
forces a new packet to be created.
This occurs in two ways: First, an instruction without an available
resource may have a large priority due to other metrics and be
scheduled when there are other instructions with available resources.
Second, an instruction with a non-zero latency may become available
prematurely. In both these cases, we attempt change the priority
in order to allow a better instruction to be scheduled.
Patch by Brendon Cahoon.
llvm-svn: 275793
An instruction may have multiple predecessors that are candidates
for using .cur. However, only one of them can use .cur in the
packet. When this case occurs, we need to make sure that only
one of the dependences gets a 0 latency value.
Patch by Brendon Cahoon.
llvm-svn: 275790
When SelectionDAGISel transforms a node representing an inline asm
block, memory constraint information is not preserved. This can cause
constraints to be broken when a memory offset is of the form:
offset + frame index
when the frame is resolved.
By propagating the constraints all the way to the backend, targets can
enforce memory operands of inline assembly to conform to their constraints.
For MIPSR6, some instructions had their offsets reduced to 9 bits from
16 bits such as ll/sc. This becomes problematic when using inline assembly
to perform atomic operations, as an offset can generated that is too big to
encode in the instruction.
Reviewers: dsanders, vkalintris
Differential Review: https://reviews.llvm.org/D21615
llvm-svn: 275786
Summary:
The work item intrinsics are not available for the shader
calling conventions. And even if we did hook them up most
shader stages haves some extra restrictions on the amount
of available LDS.
Reviewers: tstellarAMD, arsenm
Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D20728
llvm-svn: 275779
The current logic for handling inline asm operands in DAGToDAGISel interprets
the operands by looking for constants, which should represent the flags
describing the kind of operand we're dealing with (immediate, memory, register
def etc). The operands representing actual data are skipped only if they are
non-const, with the exception of immediate operands which are skipped explicitly
when a flag describing an immediate is found.
The oversight is that memory operands may be const too (e.g. for device drivers
reading a fixed address), so we should explicitly skip the operand following a
flag describing a memory operand. If we don't, we risk interpreting that
constant as a flag, which is definitely not intended.
Fixes PR26038
Differential Revision: https://reviews.llvm.org/D22103
llvm-svn: 275776