Summary:
The parser is exercised by llvm-objdump using -print-fault-maps. As is
probably obvious, the code itself was "heavily inspired" by
http://reviews.llvm.org/D10434.
Reviewers: reames, atrick, JosephTremoulet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10491
llvm-svn: 240304
Now that PR23900 is fixed, we can bring it back with no changes.
Original message:
Make all temporary symbols unnamed.
What this does is make all symbols that would otherwise start with a .L
(or L on MachO) unnamed.
Some of these symbols still show up in the symbol table, but we can just
make them unnamed.
In order to make sure we produce identical results when going through assembly,
all .L symbols (not just the compiler-produced ones) are now unnamed.
Running llc on llvm-as.opt.bc, the peak memory usage goes from 208.24MB to
205.57MB.
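As a sketch of the mechanism at the MC layer (the flag name here is my reading of the change; treat it as an assumption):
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCSymbol.h"

llvm::MCSymbol *makeUnnamedTemp(llvm::MCContext &Ctx) {
  // With name generation for temporaries turned off, the context hands
  // back truly nameless symbols, so nothing is interned for them.
  Ctx.setUseNamesOnTempLabels(false);
  return Ctx.createTempSymbol();
}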
llvm-svn: 240302
This commit implements initial machine instruction serialization. It
serializes machine instruction names. The instructions are represented
using a YAML sequence of string literals and are a part of machine
basic block YAML mapping.
This commit introduces a class called 'MIParser' which will be used to
parse the machine instructions and operands.
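A rough sketch of how such a mapping can be declared with LLVM's YAML I/O; the field names here are assumptions, not the committed schema:
#include "llvm/Support/YAMLTraits.h"
#include <string>

struct YamlMBB {
  std::string Name;
  std::string Instructions; // the real format uses one string per instruction
};

namespace llvm {
namespace yaml {
template <> struct MappingTraits<YamlMBB> {
  static void mapping(IO &YamlIO, YamlMBB &MBB) {
    YamlIO.mapOptional("name", MBB.Name);
    YamlIO.mapOptional("instructions", MBB.Instructions);
  }
};
} // namespace yaml
} // namespace llvm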
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10481
llvm-svn: 240295
Summary: The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared.
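A minimal sketch of the failure mode, not the actual DAGCombiner code: extracting a 64-bit value from a wide constant is unsafe, while comparing through APInt works at any width.
#include "llvm/ADT/APInt.h"
using llvm::APInt;

bool shiftIsOutOfRange(const APInt &ShAmt, unsigned BitWidth) {
  // ShAmt.getZExtValue() would assert once the constant needs more than
  // 64 bits to represent; APInt::uge compares at full precision.
  return ShAmt.uge(BitWidth);
}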
Test Plan: A regression test is included.
Reviewers: andreadb
Reviewed By: andreadb
Subscribers: andreadb, test, llvm-commits
Differential Revision: http://reviews.llvm.org/D10602
llvm-svn: 240291
Summary: In this case, we're supposed to load the immediate in AT and then ADDu it with the source register and put it in the destination register.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9367
llvm-svn: 240278
Summary:
In this case, we're supposed to load the address of the symbol in AT and then ADDu it with the source register and
put it in the destination register.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9366
llvm-svn: 240273
This allows more call sequences to use pushes instead of movs when optimizing for size.
In particular, calling conventions that pass some parameters in registers (e.g. thiscall) are now supported.
Differential Revision: http://reviews.llvm.org/D10500
llvm-svn: 240257
If we don't know how to represent a .debug_loc entry, skip the entry
entirely rather than emitting an empty one. Similarly, if a .debug_loc
list has no entries, don't create the list.
We still want to create the variables, just in an optimized-out form
that doesn't have a DW_AT_location.
llvm-svn: 240244
Sparse switches with profile info are lowered as weight-balanced BSTs. For
example, if the node weights are {1,1,1,1,1,1000}, the right-most node would
end up in a tree by itself, bringing it closer to the top.
However, a leaf in this BST can contain up to 3 cases, and having a single
case in a leaf node as in the example means the tree might become
unnecessarily high.
This patch adds a heuristic to the pivot selection algorithm that moves more
cases into leaf nodes unless that would lower their rank. It still doesn't
yield the optimal tree in every case, but I believe it's conservatively correct.
llvm-svn: 240224
Merged separate (but equivalent) SSE2/AVX512F tests.
Removed codegen tests since these are already done better in test/CodeGen/X86.
The actual cost values still need to be updated to match recent codegen improvements.
llvm-svn: 240219
Summary:
Since FunctionMap has llvm::Function pointers as keys, the order in
which the traversal happens can differ from run to run, causing spurious
FileCheck failures. Have CallGraph::print sort the CallGraphNodes by
name before printing them.
Reviewers: bogner, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10575
llvm-svn: 240191
This patch changes getRelocationAddend to use ErrorOr and considers it an error
to try to get the addend of a REL section.
If, for example, a x86_64 file has a REL section, that file is corrupted and
we should reject it.
Using ErrorOr is not ideal since we check the section type once per relocation
instead of once per section.
Checking once per section would involve getRelocationAddend just asserting and
callers checking the section before iterating over the relocations.
In any case, this is an improvement and includes a test.
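A minimal sketch of the ErrorOr pattern, with the ELF details stripped away (the real interface lives on the object-file classes):
#include "llvm/Support/ErrorOr.h"
#include <cstdint>
#include <system_error>

// REL relocations carry no addend field, so asking for one is an error.
llvm::ErrorOr<int64_t> getAddend(bool IsRelaSection, int64_t Addend) {
  if (!IsRelaSection)
    return std::make_error_code(std::errc::invalid_argument);
  return Addend;
}
// Caller: if (auto A = getAddend(IsRela, Addend)) use(*A);
//         else report(A.getError());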
llvm-svn: 240176
This commit implements the initial serialization of machine basic blocks in a
machine function. Only the simple, scalar MBB attributes are serialized. The
reference to LLVM IR's basic block is preserved when that basic block has a name.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10465
llvm-svn: 240145
What this does is make all symbols that would otherwise start with a .L
(or L on MachO) unnamed.
Some of these symbols still show up in the symbol table, but we can just
make them unnamed.
In order to make sure we produce identical results when going through assembly,
all .L symbols (not just the compiler-produced ones) are now unnamed.
Running llc on llvm-as.opt.bc, the peak memory usage goes from 208.24MB to
205.57MB.
llvm-svn: 240130
Currently, we canonicalize shuffles that produce a result larger than
their operands with:
shuffle(concat(v1, undef), concat(v2, undef))
->
shuffle(concat(v1, v2), undef)
because we can access quad vectors (see PerformVECTOR_SHUFFLECombine).
This is useful in the general case, but there are special cases where
native shuffles produce larger results: the two-result ops.
We can look through the concat when lowering them:
shuffle(concat(v1, v2), undef)
->
concat(VZIP(v1, v2):0, :1)
This lets us generate the native shuffles instead of scalarizing to
dozens of VMOVs.
Differential Revision: http://reviews.llvm.org/D10424
llvm-svn: 240118
The test 'llvm/test/CodeGen/MIR/machine-function.mir' was disabled on
x86 msc18 in r239805 as it failed. My commit r240054 has fixed the
problem, so this commit reverts the commit that disabled the test as
it should pass now.
llvm-svn: 240074
A relocation target can take 3 basic forms:
* An r_value in scattered relocations.
* A symbol in external relocations.
* A section in non-external relocations.
Have the dump reflect that. With this change we go from
CHECK-NEXT: Extern: 0
CHECK-NEXT: Type: X86_64_RELOC_SUBTRACTOR (5)
CHECK-NEXT: Symbol: 0x2
CHECK-NEXT: Scattered: 0
To just
// CHECK-NEXT: Type: X86_64_RELOC_SUBTRACTOR (5)
// CHECK-NEXT: Section: __data (2)
Since the relocation is with a section, we print the section name and don't
need to say that it is not scattered or external.
Someone motivated can add further special cases for things like
ARM64_RELOC_ADDEND and ARM_RELOC_PAIR.
llvm-svn: 240073
To save compile time, the analysis to find dense case-clusters in switches is
not done at -O0. However, when the whole switch is dense enough, it is easy to
turn it into a jump table, resulting in much faster code with no extra effort.
llvm-svn: 240071
1. Used update_llc_test_checks.py to tighten checks
2. Fixed triple (nothing Darwin-specific here)
3. Replaced CPU specifiers with attributes
4. Fixed comments
5. Removed IvyBridge run because it did not add any coverage
llvm-svn: 240058
Summary:
Currently intrinsics don't affect the creation of the call graph.
This is not accurate with respect to statepoint and patchpoint
intrinsics -- these do call (or invoke) LLVM level functions.
This change fixes this inconsistency by adding a call to the external
node for call sites that call these non-leaf intrinsics. This coupled
with the fact that these intrinsics also escape the function pointer
they call gives us a conservatively correct call graph.
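In sketch form (simplified from the actual CallGraph logic; the set of intrinsics is the one named above):
#include "llvm/IR/Intrinsics.h"

// These intrinsics can call (or invoke) LLVM-level functions, so their
// call sites must be connected to the external calling node.
static bool intrinsicMayCallIRFunction(llvm::Intrinsic::ID ID) {
  return ID == llvm::Intrinsic::experimental_gc_statepoint ||
         ID == llvm::Intrinsic::experimental_patchpoint_void ||
         ID == llvm::Intrinsic::experimental_patchpoint_i64;
}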
Reviewers: reames, chandlerc, atrick, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10526
llvm-svn: 240039
They had been getting emitted as a section + offset reference, which
is bogus since the value needs to be the offset within the GOT, not
the actual address of the symbol's object.
Differential Revision: http://reviews.llvm.org/D10441
llvm-svn: 240020
- zext the value to alloc size first, then check if the value repeats
with zero padding included. If so we can still emit a .space
- Do the checking with APInt.isSplat(8), which handles non-pow2 types
- Also handle large constants (bit width > 64)
- In a ConstantArray all elements have the same type, so it's sufficient
to check the first constant recursively and then just compare if all
following constants are the same by pointer compare
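A small sketch of the splat check (surrounding code assumed):
#include "llvm/ADT/APInt.h"

// After zero-extending the constant to the alloc size, the value can be
// emitted as a .space iff it is a single byte repeated.
bool canEmitAsSpace(const llvm::APInt &ZextValue) {
  return ZextValue.getBitWidth() % 8 == 0 && ZextValue.isSplat(8);
}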
llvm-svn: 239977
Added explicit sign extension for v4i16/v8i16 to v4i32/v8i32 before conversion to floats. Matches existing support for v4i8/v8i8.
Follow up to D10433
llvm-svn: 239966
Summary:
This is done by first adding two additional instructions to convert the
alloca returned address to local and convert it back to generic. Then
replace all uses of the alloca instruction with the converted generic
address. Then we can rely on the NVPTXFavorNonGenericAddrSpace pass to combine
the generic address cast and the corresponding Load, Store, Bitcast, or GEP
instruction.
Patched by Xuetian Weng (xweng@google.com).
Test Plan: test/CodeGen/NVPTX/lower-alloca.ll
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: meheff, broune, eliben, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10483
llvm-svn: 239964
The personality routine currently lives in the LandingPadInst.
This isn't desirable because:
- All LandingPadInsts in the same function must have the same
personality routine. This means that each LandingPadInst beyond the
first has an operand which produces no additional information.
- There is ongoing work to introduce EH IR constructs other than
LandingPadInst. Moving the personality routine off of any one
particular Instruction and onto the parent function seems a lot better
than having N different places a personality function can sneak onto an
exceptional function.
Differential Revision: http://reviews.llvm.org/D10429
llvm-svn: 239940
It's been used before to avoid infinite loops caused by separate CGP
optimizations undoing one another. We found one more such issue
caused by r238054. To avoid it, generalize the "InsertedTruncs"
set to any inst, and use it to avoid touching those again.
llvm-svn: 239938
If globals can be unnamed, there is no reason for aliases to be different.
The restriction was there since the original implementation in r36435. I
can only guess it was there because of the old bison parser for the old
alias syntax.
llvm-svn: 239921
Summary:
This does not include support for the immediate variants of these pseudo-instructions.
Fixes llvm.org/PR20968.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: seanbruno, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D8537
llvm-svn: 239905
Summary:
Call MCSymbolRefExpr::create() with a MCSymbol* argument, not with a StringRef
of the Symbol's name, in order to avoid creating invalid temporary symbols for
relative labels (e.g. {$,.L}tmp00, {$,.L}tmp10 etc.).
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10498
llvm-svn: 239901
Summary:
Previously, MCSymbolRefExpr::create() was called with a StringRef of the symbol
name, which it would then search for in the Symbols StringMap (from MCContext).
However, relative labels (which are temporary symbols) are apparently not stored
in the Symbols StringMap, so we end up creating a new {$,.L}tmp symbol
({$,.L}tmp00, {$,.L}tmp10 etc.) each time we create an MCSymbolRefExpr by
passing in the symbol name as a StringRef.
Fortunately, there is a version of MCSymbolRefExpr::create() which takes an
MCSymbol* and we already have an MCSymbol* at that point, so we can just pass
that in instead of the StringRef.
I also removed the local StringRef calls to MCSymbolRefExpr::create() from
expandMemInst(), as those cases can be handled by evaluateRelocExpr() anyway.
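The difference in sketch form (context assumed around the two calls):
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCSymbol.h"
using namespace llvm;

const MCExpr *refByName(const MCSymbol *Sym, MCContext &Ctx) {
  // Looking up by name re-enters the context's name handling and, for
  // relative labels, mints a fresh {$,.L}tmpNN temporary on every call.
  return MCSymbolRefExpr::create(Sym->getName(), MCSymbolRefExpr::VK_None, Ctx);
}
const MCExpr *refBySymbol(const MCSymbol *Sym, MCContext &Ctx) {
  return MCSymbolRefExpr::create(Sym, Ctx); // reuses the symbol we hold
}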
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9938
llvm-svn: 239897
Change builtin function name and signature ( add third parameter - rounding mode ).
Added tests for intrinsics.
Differential Revision: http://reviews.llvm.org/D10473
llvm-svn: 239888
The patch triggers a miscompile on SPEC 2006 403.gcc with the (ref)
200.i and scilab.i inputs. I opened PR23866 to track analysis of this.
This reverts commit r238793.
llvm-svn: 239880
This patch enables support for the conversion of v2i32 to v2f64 to use the CVTDQ2PD xmm instruction and stay on the SSE unit instead of scalarizing, sign extending to i64 and using CVTSI2SDQ scalar conversions.
Differential Revision: http://reviews.llvm.org/D10433
llvm-svn: 239855
The original change broke clang side tests. I will be submitting those momentarily. This change includes post-commit feedback on the original change from Pete Cooper.
Original Submission comments:
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.
Differential Revision: http://reviews.llvm.org/D9132
llvm-svn: 239849
Before this patch the bitcode reader would read a module from a file
that contained in order:
* Any number of non MODULE_BLOCK sub blocks.
* One MODULE_BLOCK
* Any number of non MODULE_BLOCK sub blocks.
* 4 '\n' characters to handle OS X's ranlib.
Since we support lazy reading of modules, any information that is relevant
for the module has to be in the MODULE_BLOCK or before it. We don't gain
anything from checking what is after.
This patch then changes the reader to stop once the MODULE_BLOCK has been
successfully parsed.
This avoids the ugly special case for .bc files in an archive and makes it
easier to embed bitcode files.
llvm-svn: 239845
Summary:
When propagating mass through irregular loops, the mass flowing through
each loop header may not be equal. This was causing wrong frequencies
to be computed for irregular loop headers.
Fixed by keeping track of masses flowing through each of the headers in
an irregular loop. To do this, we now keep track of per-header backedge
weights. After the loop mass is distributed through the loop, the
backedge weights are used to re-distribute the loop mass to the loop
headers.
Since each backedge will have a mass proportional to the different
branch weights, the loop headers will end up with a more accurate
weight distribution (as opposed to the current distribution that assumes
that every loop header is the same).
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10348
llvm-svn: 239843
This commit reports an error when a machine function from a MIR file that contains
LLVM IR has no corresponding function with the same name in the loaded LLVM IR module.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10468
llvm-svn: 239831
The mftb instruction was incorrectly marked as deprecated in the PPC
Backend. Instead, it should not be treated as deprecated, but rather be
implemented using the mfspr instruction. A similar patch was put into GCC last
year. Details can be found at:
https://sourceware.org/ml/binutils/2014-11/msg00383.html.
This change will replace instances of the mftb instruction with the mfspr
instruction for all CPUs except 601 and pwr3. This will also be the default
behaviour.
Additional details can be found in:
https://llvm.org/bugs/show_bug.cgi?id=23680
Phabricator review: http://reviews.llvm.org/D10419
llvm-svn: 239827
Reapply r239539. Don't assume the collected number of
stores is the same vector size. Just take the first N
stores to fill the vector.
llvm-svn: 239825
Any combination of +-inf/+-inf is NaN so it's already ignored with
nnan and we can skip checking for ninf. Also rephrase logic in comments
a bit.
llvm-svn: 239821
Summary:
Relocs that can be converted from absolute to PC-relative now do so if IsPCRel
is true. Relocs that require PC-relative now call llvm_unreachable() if IsPCRel
is false and similarly those that require absolute assert that IsPCRel is false.
Note that while it looks like some relocs (e.g. R_MIPS_26) can be converted into
the MIPS32r6/MIPS64r6 relocs (R_MIPS_PC*_S2), it isn't actually valid to do so.
Placeholders have been left in the testcase for unsupported relocs and relocs
that cannot be generated at the moment.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits, rafael
Differential Revision: http://reviews.llvm.org/D10184
llvm-svn: 239817
Summary:
GetTarget() may modify TripleName without also updating TheTriple.
This can lead to situations where the MCObjectStreamer has a different triple
to the rest of LLVM.
This inconsistency caused sparc-little-endian.s to pass on Windows because most
of LLVM had sparcel-pc-win32 while MCObjectStreamer had "". I believe the same
kind of thing was also true of Darwin.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: llvm-commits, rengolin, rafael
Differential Revision: http://reviews.llvm.org/D10450
llvm-svn: 239808
When we multiply two 64-bit vectors, we extract the lower and upper parts and use the PMULUDQ instruction.
When one of the operands is a constant, the upper part may be zero, and we know this at compile time.
Example: %a = mul <4 x i64> %b, <4 x i64> < i64 5, i64 5, i64 5, i64 5>.
I'm checking the value of the upper part and preventing redundant "multiply", "shift" and "add" operations.
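For reference, the arithmetic behind this: writing a = a_lo + 2^32 * a_hi and b = b_lo + 2^32 * b_hi, the product modulo 2^64 is a_lo*b_lo + 2^32 * (a_lo*b_hi + a_hi*b_lo). When b is a constant such as 5, b_hi is 0, so the a_lo*b_hi cross term vanishes and its multiply, shift and add can all be dropped.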
llvm-svn: 239802
These are really immediate DUPs, and suffer from the same problem
with long instructions with a high/2 variant (e.g. smull).
By extending a MOVI (or DUP, before this patch), we can avoid an ext
on the other operand of the long instruction, e.g. turning:
ext.16b v0, v0, v0, #8
movi.4h v1, #0x53
smull.4s v0, v0, v1
into:
movi.8h v1, #0x53
smull2.4s v0, v0, v1
While there, add a now-necessary combine to fold (VT NVCAST (VT x)).
llvm-svn: 239799
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.
Differential Revision: http://reviews.llvm.org/D9132
llvm-svn: 239795
This commit serializes the simple, scalar attributes from the
'MachineFunction' class.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10449
llvm-svn: 239790
This commit creates a dummy LLVM IR function with one basic block and an unreachable
instruction for each parsed machine function when the MIR file doesn't have LLVM IR.
This change is required as the machine function analysis pass creates machine
functions only for the functions that are defined in the current LLVM module.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10135
llvm-svn: 239778
This commit reports an error when the MIR parser encounters a machine
function with the same name as a previously parsed machine function.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10130
llvm-svn: 239774
constants in commented-out part of LLVMAttribute enum. Add tests that verify
that the safestack attribute is only allowed as a function attribute.
llvm-svn: 239772
This patch adds the safe stack instrumentation pass to LLVM, which separates
the program stack into a safe stack, which stores return addresses, register
spills, and local variables that are statically verified to be accessed
in a safe way, and the unsafe stack, which stores everything else. Such
separation makes it much harder for an attacker to corrupt objects on the
safe stack, including function pointers stored in spilled registers and
return addresses. You can find more information about the safe stack, as
well as other parts of our control-flow hijack protection technique in our
OSDI paper on code-pointer integrity (http://dslab.epfl.ch/pubs/cpi.pdf)
and our project website (http://levee.epfl.ch).
The overhead of our implementation of the safe stack is very close to zero
(0.01% on the Phoronix benchmarks). This is lower than the overhead of
stack cookies, which are supported by LLVM and are commonly used today,
yet the security guarantees of the safe stack are strictly stronger than
stack cookies. In some cases, the safe stack improves performance due to
better cache locality.
Our current implementation of the safe stack is stable and robust; we
used it to recompile multiple projects on Linux including Chromium, and
we also recompiled the entire FreeBSD user-space system and more than 100
packages. We ran unit tests on the FreeBSD system and many of the packages
and observed no errors caused by the safe stack. The safe stack is also fully
binary compatible with non-instrumented code and can be applied to parts of
a program selectively.
This patch is our implementation of the safe stack on top of LLVM. The
patches make the following changes:
- Add the safestack function attribute, similar to the ssp, sspstrong and
sspreq attributes.
- Add the SafeStack instrumentation pass that applies the safe stack to all
functions that have the safestack attribute. This pass moves all unsafe local
variables to the unsafe stack with a separate stack pointer, whereas all
safe variables remain on the regular stack that is managed by LLVM as usual.
- Invoke the pass as the last stage before code generation (at the same time
the existing cookie-based stack protector pass is invoked).
- Add unit tests for the safe stack.
Original patch by Volodymyr Kuznetsov and others at the Dependable Systems
Lab at EPFL; updates and upstreaming by myself.
Differential Revision: http://reviews.llvm.org/D6094
llvm-svn: 239761
This commit connects the machine function analysis pass (which creates machine
functions) to the MIR parser, which will initialize the machine functions
with the state from the MIR file and reconstruct the machine IR.
This commit introduces a new interface called 'MachineFunctionInitializer',
which can be used to provide custom initialization for the machine functions.
This commit also introduces a new diagnostic class called
'DiagnosticInfoMIRParser' which is used for MIR parsing errors.
This commit modifies the default diagnostic handling in LLVMContext - now
the diagnostics are printed directly into llvm::errs() so that the MIR parsing
errors can be printed with colours.
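The new interface is roughly of this shape (a simplified sketch, not the verbatim header):
class MachineFunction;

/// Hook used by the MIR parser to populate machine functions from the
/// serialized state as they are created.
class MachineFunctionInitializer {
public:
  virtual ~MachineFunctionInitializer() {}
  /// Initialize the given machine function; returns true on error.
  virtual bool initializeMachineFunction(MachineFunction &MF) = 0;
};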
Reviewers: Justin Bogner
Differential Revision: http://reviews.llvm.org/D9928
llvm-svn: 239753
LLVM targeting aarch64 doesn't correctly produce aligned accesses for non-aligned
data at -O0/fast-isel (-mno-unaligned-access).
The root cause seems to be in fast-isel not producing unaligned access correctly
for -mno-unaligned-access.
The patch just aborts fast-isel for loads and stores when -mno-unaligned-access is
present.
The regression test is updated to check this new test case (-mno-unaligned-access
together with fast-isel).
Differential Revision: http://reviews.llvm.org/D10360
llvm-svn: 239732
Summary:
ValueTracking used to overwrite the analysis results computed from
assumes and dominating conditions. This patch fixes this issue.
Test Plan: test/Analysis/ValueTracking/assume.ll
Reviewers: hfinkel, majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10283
llvm-svn: 239718
Re-commit after adding "-aarch64-neon-syntax=generic" to fix the failure on OS X.
This patch was firstly committed in r239514, then reverted in r239544 because of a syntax incompatible failure on OS X.
llvm-svn: 239711
- Add glc, slc, and tfe operands to flat instructions
- Add missing flat instructions
- Fix the encoding of flat_load_dwordx3 and flat_store_dwordx3.
llvm-svn: 239637
ARMTargetParser::getFPUFeatures should disable fp16 whenever it
disables vfp4, as otherwise something like -mcpu=cortex-a7 -mfpu=none
leaves us with fp16 enabled (though the only effect that will have is
a wrong build attribute).
Differential Revision: http://reviews.llvm.org/D10397
llvm-svn: 239599
It is valid for globals to be unnamed, but aliases must have a name. To avoid
creating invalid IR, we need to assign names to any aliases we create that
point to unnamed objects that have been moved into combined globals.
llvm-svn: 239590
Summary:
A side effect of this change is that IRBuilder now automatically
creates debug info locations for new instructions, which are the
same as the debug location of the insertion point. This is fine for the
functions in questions (GetStoreValueForLoad and
GetMemInstValueForLoad), as they are used in two situations:
* GVN::processLoad, which tries to eliminate a load. In this case
new instructions would have the same debug location as the load they
eventually replace;
* MaterializeAdjustedValue, which adds new instructions to the end
of the basic blocks, which could later be used to replace the load
definition. In this case we don't yet know the way the load would
be eventually replaced (either by assembling the precomputed values
via PHI, or by using them directly), so just using the basic block
strategy seems to be reasonable. There is also a special case
in the code that *would* adjust the location of the last
instruction replacing the load definition to the location of the
load.
Test Plan: regression test suite
Reviewers: echristo, dberlin, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10405
llvm-svn: 239585
We were putting them in the filter field, which is correct for 64-bit
but wrong for 32-bit.
Also switch the order of scope table entry emission so outermost entries
are emitted first, and fix an obvious state assignment bug.
llvm-svn: 239574
This intrinsic is like framerecover plus a load. It recovers the EH
registration stack allocation from the parent frame and loads the
exception information field out of it, giving back a pointer to an
EXCEPTION_POINTERS struct. It's designed for clang to use in SEH filter
expressions instead of accessing the EXCEPTION_POINTERS parameter that
is available on x64.
This required a minor change to MC to allow defining a label variable to
another absolute framerecover label variable.
llvm-svn: 239567
We cannot prepend __imp_ in the IR mangler because a function reference may
be emitted unmangled in a constant initializer. The linker is expected to
resolve such references to thunks. This is covered by the new test case.
Strictly speaking we ought to emit two undefined symbols, one with __imp_ and
one without, as we cannot know which symbol the final object file will refer
to. However, this would require rather intrusive changes to IRObjectFile,
and lld works fine without it for now.
This reimplements r239437, which was reverted in r239502.
Differential Revision: http://reviews.llvm.org/D10400
llvm-svn: 239560
This improves debug locations in passes that do a lot of basic block
transformations. An important case is the LoopUnroll pass; a test for correct
debug locations accompanies this change.
Test Plan: regression test suite
Reviewers: dblaikie, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10367
llvm-svn: 239551
Revert "[AArch64] Match interleaved memory accesses into ldN/stN instructions."
Revert "Fixing MSVC 2013 build error."
The test/CodeGen/AArch64/aarch64-interleaved-accesses.ll test was failing on OS X.
llvm-svn: 239544
This only updates one of the uses. The other is used in cases
that may never touch memory, so I'm not sure why this is even
calling it. Perhaps there should be a new, similar hook for such
cases or pass -1 for unknown address space.
llvm-svn: 239540
Now actually stores the non-zero constant instead of 0.
I somehow forgot to include this part of r238108.
The test change was just an independent instruction order swap,
so just add another check line to satisfy CHECK-NEXT.
llvm-svn: 239539
This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors avoid scalarization. It builds on the existing i8 SHL vectorized implementation of moving the shift bits up to the sign bit position and separating the 4, 2 & 1 bit shifts with several improvements:
1 - SSE41 targets can use (v)pblendvb directly with the sign bit instead of performing a comparison to feed into a VSELECT node.
2 - pre-SSE41 targets were masking + comparing with an 0x80 constant - we avoid this by using the fact that a set sign bit means a negative integer which can be compared against zero to then feed into VSELECT, avoiding the need for a constant mask (zero generation is much cheaper).
3 - SRA i8 needs to be unpacked to the upper byte of a i16 so that the i16 psraw instruction can be correctly used for sign extension - we have to do more work than for SHL/SRL but perf tests indicate that this is still beneficial.
The i16 implementation is similar but simpler than for i8 - we have to do 8, 4, 2 & 1 bit shifts but less shift masking is involved. SSE41 use of (v)pblendvb requires that the i16 shift amount is splatted to both bytes however.
Tested on SSE2, SSE41 and AVX machines.
Differential Revision: http://reviews.llvm.org/D9474
llvm-svn: 239509
This patch corresponds to review:
http://reviews.llvm.org/D10096
This is the back end portion of the patch related to D10095.
The patch adds the instructions and back end intrinsics for:
vbpermq
vgbbd
llvm-svn: 239505
This reverts commit r239437.
This broke clang-cl self-hosts. We'd end up calling the __imp_ symbol
directly instead of using it to do an indirect function call.
llvm-svn: 239502
If the first argument to a function is a 'this' argument and the second
has the sret attribute, the ArgumentPromotion pass may promote the 'this'
argument to more than one argument, violating the IR constraint that 'sret'
may only be applied to the first or second argument.
Although this IR constraint is arguably unnecessary, it highlighted the fact
that ArgPromotion does not need to preserve this attribute. Dropping the
attribute reduces register pressure in the backend by avoiding the register
copy required by sret. Because sret implies noalias, we also replace the
former with the latter.
Differential Revision: http://reviews.llvm.org/D10353
llvm-svn: 239488
This is a reimplementation of D9780 at the machine instruction level rather than the DAG.
Use the MachineCombiner pass to reassociate scalar single-precision AVX additions (just a
starting point; see the TODO comments) to increase ILP when it's safe to do so.
The code is closely based on the existing MachineCombiner optimization that is implemented
for AArch64.
This patch should not cause the kind of spilling tragedy that led to the reversion of r236031.
Differential Revision: http://reviews.llvm.org/D10321
llvm-svn: 239486
Determining proper debug locations for instructions created in
PHITransAddr is tricky. We use a simple approach here and simply copy
debug locations from instructions computing load address to
"corresponding" instructions re-creating the address computation
in predecessor basic blocks.
This may not always be correct, given all the rearrangement and
simplification going on, and debug locations may jump around a lot,
as the basic blocks we copy locations between may be very far from
each other.
Still, this would work well in most simple cases (e.g. when the chain
of address-computing instructions is short, or our mapping turns out
to be 1-to-1), and we desire to have *some* reasonable debug locations
associated with newly inserted instructions.
See http://reviews.llvm.org/D10351 review thread for more details.
Test Plan: regression test suite
Reviewers: spatel, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10351
llvm-svn: 239479
During statepoint lowering we can sometimes avoid spilling a value if we know that it was already spilled for a previous statepoint.
We were doing this by checking whether the incoming statepoint value was lowered into a load from a stack slot. This worked only within the boundaries of one basic block.
But instead of looking at the lowered node we can look directly at the llvm-ir value, and if it is a gc.relocate (or some simple modification of it), look up the stack slot for its derived pointer and reuse that stack slot. This allows us to look across basic block boundaries.
Differential Revision: http://reviews.llvm.org/D10251
llvm-svn: 239472
Use a "safeseh" string attribute to do this. You would think we chould
just accumulate the set of personalities like we do on dwarf, but this
fails to account for the LSDA-loading thunks we use for
__CxxFrameHandler3. Each of those needs to make it into .sxdata as well.
The string attribute seemed like the most straightforward approach.
llvm-svn: 239448
Test Plan: regression test suite
Reviewers: eugenis, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10343
llvm-svn: 239438
Summary:
We used to assume V->RAUW only modifies the operand list of V's user.
However, if V and V's user are Constants, RAUW may replace and invalidate V's
user entirely.
This patch fixes the above issue by letting the caller replace the
operand instead of calling RAUW on Constants.
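A minimal sketch of the safer pattern (names illustrative):
#include "llvm/IR/Use.h"
#include "llvm/IR/Value.h"

// RAUW on a Constant may fold and replace ConstantExpr users wholesale,
// invalidating pointers the caller still holds; rewriting the one use we
// care about sidesteps that.
void replaceJustThisUse(llvm::Use &U, llvm::Value *NewV) {
  U.set(NewV);
}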
Test Plan: @nested_const_expr and @rauw in access-non-generic.ll
Reviewers: broune, jholewinski
Reviewed By: broune, jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10345
llvm-svn: 239435
llvm-lib is intended to be a lib.exe compatible utility that also
understands bitcode. The implementation lives in a library so that
lld can use it to implement /lib.
Differential Revision: http://reviews.llvm.org/D10297
llvm-svn: 239434
This gets all the handler info through to the asm printer and we can
look at the .xdata tables now. I've convinced one small catch-all test
case to work, but other than that, it would be a stretch to say this is
functional.
The state numbering algorithm avoids doing any scope reconstruction as
we do for C++ to simplify the implementation.
llvm-svn: 239433
Store instructions do not modify register values and therefore it's safe
to form a store pair even if the source register has been read in between
the two store instructions.
Previously, the read of w1 (see below) prevented the formation of a stp.
str w0, [x2]
ldr w8, [x2, #8]
add w0, w8, w1
str w1, [x2, #4]
ret
We now generate the following code.
stp w0, w1, [x2]
ldr w8, [x2, #8]
add w0, w8, w1
ret
All correctness tests with -Ofast on A57 with Spec200x and EEMBC pass.
Performance results for SPEC2K were within noise.
llvm-svn: 239432
Remove DisableTailCalls from TargetOptions and the code in resetTargetOptions
that was resetting it.
Remove the uses of DisableTailCalls in subclasses of TargetLowering and use
the value of function attribute "disable-tail-calls" instead. Also,
unconditionally add pass TailCallElim to the pipeline and check the function
attribute at the start of runOnFunction to disable the pass on a per-function
basis.
This is part of the work to remove TargetMachine::resetTargetOptions, and since
DisableTailCalls was the last non-fast-math option that was being reset in that
function, we should be able to remove the function entirely after the work to
propagate IR-level fast-math flags to DAG nodes is completed.
Out-of-tree users should remove the uses of DisableTailCalls and make changes
to attach attribute "disable-tail-calls"="true" or "false" to the functions in
the IR.
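A sketch of the per-function check (assumed phrasing of the pass code; the attribute name is the one introduced here):
#include "llvm/IR/Function.h"

static bool tailCallsDisabled(const llvm::Function &F) {
  // The attribute carries a "true"/"false" string value.
  return F.getFnAttribute("disable-tail-calls").getValueAsString() == "true";
}
// A pass can then bail out at the top of runOnFunction:
//   if (tailCallsDisabled(F)) return false;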
rdar://problem/13752163
Differential Revision: http://reviews.llvm.org/D10099
llvm-svn: 239427
array of bytes. The generation of these byte arrays expected
the host to be little endian, which prevented big-endian hosts from
being used to generate the PTX code. This patch fixes the
problem by changing the way the bytes are extracted so that it
works for both little- and big-endian hosts.
llvm-svn: 239412
Summary:
For some branches, GAS accepts an immediate instead of the 2nd register operand.
We only implement this for BNE and BEQ for now. Other branch instructions can be added later, if needed.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: seanbruno, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D9666
llvm-svn: 239396
Summary: I noticed an object file with `DW_OP_reg4 DW_OP_breg4 0` as a DWARF expression,
which I traced to a missing break (and `++I`) in this code snippet.
While I was at it, I also added support for a few other corner cases
along the same lines that I could think of.
Test Plan: Hand-crafted test case to exercises these cases is included.
Reviewers: echristo, dblaikie, aprantl
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10302
llvm-svn: 239380
The following code triggers a fatal error in the compiler instrumentation
of ASan on Darwin because we place the attribute into llvm.metadata section,
which does not have the proper MachO section name.
void foo() __attribute__((annotate("custom")));
void foo() {;}
This commit reorders the checks so that we skip everything in llvm.metadata
first. It also removes the hard failure in case the section name does not
parse. That check will be done lower in the compilation pipeline anyway.
(Reviewed in http://reviews.llvm.org/D9093.)
llvm-svn: 239379
Summary:
This cleans up most allocas NVPTXLowerKernelArgs emits for byval
parameters.
Test Plan: strengthens bug21465.ll to verify that there are no redundant local loads/stores.
Reviewers: eliben, jholewinski
Reviewed By: eliben, jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10322
llvm-svn: 239368
We don't want to replace function A by Function B in one module and Function B
by Function A in another module.
If these functions are marked with linkonce_odr we would end up with a function
stub calling B in one module and a function stub calling A in another module. If
the linker decides to pick these two we will have two stubs calling each other.
rdar://21265586
llvm-svn: 239367
on a per-function basis.
Previously some of the passes were conditionally added to ARM's pass pipeline
based on the target machine's subtarget. This patch makes changes to add those
passes unconditionally and execute them conditonally based on the predicate
functor passed to the pass constructors. This enables running different sets of
passes for different functions in the module.
rdar://problem/20542263
Differential Revision: http://reviews.llvm.org/D8717
llvm-svn: 239325
While we have some code to transform specification like {ax} into
{eax}/{rax} if the operand type isn't 16bit, we should reject cases
where there is no sane way to do this, like the i128 type in the
example.
Related to rdar://21042280
Differential Revision: http://reviews.llvm.org/D10260
llvm-svn: 239309
This patch adds support for system register MMFR4_EL1 (memory model feature register) in the assembler.
This register provides information about the implemented memory model and memory management support.
llvm-svn: 239302
This patch adds R_MIPS_PC32 relocation for Mips64.
Patch by Vladimir Radosavljevic.
Differential Revision: http://reviews.llvm.org/D10235
llvm-svn: 239301
Implemented DAG lowering for all these forms.
Added tests for DAG lowering and encoding.
Differential Revision: http://reviews.llvm.org/D10310
llvm-svn: 239300
For GEP instructions isDereferenceablePointer checks that all indices are constant and within bounds. Replace this index calculation logic with a call to accumulateConstantOffset. Separated from the http://reviews.llvm.org/D9791
Reviewed By: sanjoy
Differential Revision: http://reviews.llvm.org/D9874
llvm-svn: 239299
Summary:
We need to add a runtime memcheck for each pair of accesses (x,y) where at least one of x and y
is a write.
Assuming we have w writes and r reads, currently this number is estimated as
w * (w+r-1). This estimation counts (write,write) pairs twice and overestimates
the number of checks required.
This change adds a getNumberOfChecks method to RuntimePointerCheck, which
will count the number of runtime checks needed (similar in implementation to
needsAnyChecking) and uses it to produce the correct number of runtime checks.
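Concretely, the pairs that actually need a check are the (write,read) pairs plus the unordered (write,write) pairs, i.e. w*r + w*(w-1)/2, while the old estimate w*(w+r-1) = w*r + w*(w-1) counts every (write,write) pair twice.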
Test Plan:
llvm test suite
spec2k
spec2k6
Performance results: no changes observed (not surprising since the formula for 1 writer is basically the same, which covers most cases - at least with the current check limit).
Reviewers: anemet
Reviewed By: anemet
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D10217
llvm-svn: 239295
Interleaved memory accesses are grouped and vectorized into vector load/store and shufflevector.
E.g. for (i = 0; i < N; i+=2) {
a = A[i]; // load of even element
b = A[i+1]; // load of odd element
... // operations on a, b, c, d
A[i] = c; // store of even element
A[i+1] = d; // store of odd element
}
The loads of even and odd elements are identified as an interleave load group, which will be transformed into vectorized IR like:
%wide.vec = load <8 x i32>, <8 x i32>* %ptr
%vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%vec.odd = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
The stores of even and odd elements are identified as an interleave store group, which will be transformed into vectorized IR like:
%interleaved.vec = shufflevector <4 x i32> %vec.even, <4 x i32> %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr
This optimization is currently disabled by default. To try it, add '-enable-interleaved-mem-accesses=true'.
llvm-svn: 239291
Summary:
canUnrollCompletely takes `unsigned` values for `UnrolledCost` and
`RolledDynamicCost` but is passed in `uint64_t`s that are silently
truncated. Because of this, when `UnrolledSize` is a large integer
whose low 32 bits form a small number, LLVM tries to completely
unroll loops with high trip counts.
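The hazard in miniature (hypothetical numbers):
#include <cstdint>

static bool canUnrollCompletely(unsigned UnrolledCost, unsigned Threshold) {
  return UnrolledCost < Threshold;
}

int main() {
  uint64_t UnrolledSize = (1ULL << 32) + 10; // enormous unrolled cost
  // Silent truncation to unsigned: the callee sees 10, so a huge loop
  // looks trivially cheap to unroll completely.
  return canUnrollCompletely(UnrolledSize, 100);
}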
Reviewers: mzolotukhin, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10293
llvm-svn: 239218
CVP wants to analyze the condition operand of a select along an edge.
It succeeds in getting back a Constant but not a ConstantInt. Instead,
it gets a ConstantExpr. It then assumes that the Constant must be equal
to false because it isn't equal to true.
Instead, perform an additional comparison.
This fixes PR23752.
llvm-svn: 239217
If we have (select a, b, c), it is sometimes valid to simplify this to a
single select operand. However, doing so is only valid if the
computation doesn't inject poison into the computation.
It might be helpful to consider the following example:
(select (icmp ne %i, INT_MAX), (add nsw %i, 1), INT_MIN)
The select is equivalent to (add %i, 1) but not (add nsw %i, 1).
Self hosting on x86_64 revealed that this occurs very, very rarely so
bailing out is hopefully pretty reasonable.
llvm-svn: 239215
Linking the debug frame section is actually very easy as we just have to
patch the start address in the FDE header and then copy the rest of the
FDE without even looking at it. The only small complexity comes from the
handling of the CIEs that we should unique across object files. This is
also really easy by using a StringMap keyed on the raw contents of the
CIE.
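In sketch form (shape assumed):
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"
#include <cstdint>

// Keyed on the CIE's raw bytes: identical CIEs from different objects
// map to one copy in the linked debug_frame.
uint64_t getOrRecordCIE(llvm::StringMap<uint64_t> &EmittedCIEs,
                        llvm::StringRef CieBytes, uint64_t NextOffset) {
  auto Inserted = EmittedCIEs.insert({CieBytes, NextOffset});
  return Inserted.first->second; // existing offset, or NextOffset if new
}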
llvm-svn: 239198
The main use of the YAML debug map format is for testing inside LLVM. If we have IR
files in the tests used to generate object files, then we obviously don't know the
addresses of the symbols inside the object files beforehand.
This change lets the YAML import look up the addresses in the object files and rewrite
them. This allows tests that really don't need any binary input.
llvm-svn: 239189
This reverts commit r239141. This commit was an attempt to reintroduce
a previous patch that broke many self-hosting bots with clang timeouts,
but it still has slowdown issues, at least on ARM, increasing the
compilation time (stage 2, clang's) by 5x.
llvm-svn: 239175
For targets with a free fneg, this fold is always a net loss if it
ends up duplicating the multiply, so definitely avoid it.
This might be true for some targets without a free fneg too, but
I'll leave that for future investigation.
llvm-svn: 239167
The new naming is (to me) much easier to understand. Here is a summary
of the new state of the world:
- '*Threshold' is the threshold for full unrolling. It is measured
against the estimated unrolled cost as computed by getUserCost in TTI
(or CodeMetrics, etc). We will exceed this threshold when unrolling
loops where unrolling exposes a significant degree of simplification
of the logic within the loop.
- '*PercentDynamicCostSavedThreshold' is the percentage of the loop's
estimated dynamic execution cost which needs to be saved by unrolling
to apply a discount to the estimated unrolled cost.
- '*DynamicCostSavingsDiscount' is the discount applied to the estimated
unrolling cost when the dynamic savings are expected to be high.
When actually analyzing the loop, we now produce both an estimated
unrolled cost, and an estimated rolled cost. The rolled cost is notably
a dynamic estimate based on our analysis of the expected execution of
each iteration.
While we're still working to build up the infrastructure for making
these estimates, to me it is much more clear *how* to make them better
when they have reasonably descriptive names. For example, we may want to
apply estimated (from heuristics or profiles) dynamic execution weights
to the *dynamic* cost estimates. If we start doing that, we would also
need to track the static unrolled cost and the dynamic unrolled cost, as
only the latter could reasonably be weighted by profile information.
This patch is sadly not without functionality change for the new unroll
analysis logic. Buried in the heuristic management were several things
that surprised me. For example, we never subtracted the optimized
instruction count off when comparing against the unroll heuristics!
I don't know if this just got lost somewhere along the way or what, but
with the new accounting of things, this is much easier to keep track of
and we use the post-simplification cost estimate to compare to the
thresholds, and use the dynamic cost reduction ratio to select whether
we can exceed the baseline threshold.
The old values of these flags also don't necessarily make sense. My
impression is that none of these thresholds or discounts have been tuned
yet, and so they're just arbitrary placehold numbers. As such, I've not
bothered to adjust for the fact that this is now a discount and not
a tow-tier threshold model. We need to tune all these values once the
logic is ready to be enabled.
Differential Revision: http://reviews.llvm.org/D9966
llvm-svn: 239164
These are added mainly for the benefit of clang, but this also means that they
are now allowed in .fpu directives and we emit the correct .fpu directive when
single-precision-only is used.
Differential Revision: http://reviews.llvm.org/D10238
llvm-svn: 239151
Summary:
Only restoring AvailableFeatures is not enough and will lead to buggy behaviour.
For example, if we have a feature enabled and we ".set pop", the next time we try
to ".set" that feature nothing will happen because the "!(STI.getFeatureBits()[Feature])"
check will be false, because we didn't restore STI.FeatureBits.
In order to fix this, we need to make MipsAssemblerOptions remember the STI.FeatureBits
instead of the AvailableFeatures and then regenerate AvailableFeatures each time we ".set pop".
This is because, AFAIK, there is no way to convert from AvailableFeatures back to STI.FeatureBits,
but the reverse is possible by using ComputeAvailableFeatures(STI.FeatureBits).
I also moved the updating of AssemblerOptions inside the "if" statement in
setFeatureBits() and clearFeatureBits(), as there is no reason to update if
nothing changes.
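In sketch form (several names assumed; ComputeAvailableFeatures is the TableGen-generated helper):
// Inside the Mips asm parser's ".set pop" handling (sketch):
//   MCSubtargetInfo &STI = ...;
//   STI.setFeatureBits(SavedOptions.getFeatures());
//   setAvailableFeatures(ComputeAvailableFeatures(STI.getFeatureBits()));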
Reviewers: dsanders, mkuper
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9156
llvm-svn: 239144
isInductionPHI wants to calculate the stride based on the pointee size.
However, this is not possible when the pointee is zero sized.
This fixes PR23763.
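A sketch of the guard (context assumed):
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Type.h"

// A pointer induction over a zero-sized element type has no usable stride.
bool hasComputableStride(const llvm::DataLayout &DL, llvm::Type *PointeeTy) {
  return PointeeTy->isSized() && DL.getTypeAllocSize(PointeeTy) != 0;
}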
llvm-svn: 239143
Also, moved test cases from CodeGen/X86/fold-buildvector-bug.ll into
CodeGen/X86/buildvec-insertvec.ll and regenerated CHECK lines using
update_llc_test_checks.py.
llvm-svn: 239142
I don't have the IR which is causing the build bot breakage but I can
postulate as to why they are timing out:
1. SimplifyWithOpReplaced was stripping flags from the simplified value.
2. visitSelectInstWithICmp was overriding SimplifyWithOpReplaced because
its simplification wasn't correct.
3. InstCombine would revisit the add instruction and note that it can
rederive the flags.
4. By modifying the value, we chose to revisit instructions which reuse
the value. One of the instructions is the original select, causing
LLVM to never reach fixpoint.
Instead, strip the flags only when we are sure we are going to perform
the simplification.
llvm-svn: 239141
We cleverly handle cases where computation done in one argument of a select
instruction is suitable for the other operand, thus obviating the need
for the select and the comparison. However, the other operand cannot
have flags.
This fixes PR23757.
llvm-svn: 239115
gc.statepoint intrinsics with a far immediate call target
were lowered incorrectly as pc-rel32 calls.
This change fixes the problem, and generates an indirect call
via a scratch register.
For example:
Intrinsic:
%safepoint_token = call i32 (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* inttoptr (i64 140727162896504 to void ()*), i32 0, i32 0, i32 0, i32 0)
Old Incorrect Lowering:
callq 140727162896504
New Correct Lowering:
movabsq $140727162896504, %rax
callq *%rax
In lowerCallFromStatepoint(), the callee-target was modified and
represented as a "TargetConstant" node, rather than a "Constant" node.
Undoing this modification enabled LowerCall() to generate the
correct CALL instruction.
llvm-svn: 239114
Summary:
A small bit that I missed when I updated the X86 backend to account for
the Win64 calling convention on non-Windows. Now we don't use dead
non-volatile registers when emitting a Win64 indirect tail call on
non-Windows.
Should fix PR23710.
Test Plan: Added test for the correct behavior based on the case I posted to PR23710.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10258
llvm-svn: 239111
Report proper error code from MachOObjectFile constructor if we
can't parse another segment load command (we already return a proper
error if the segment load command's contents are suspicious).
llvm-svn: 239109
The big/small ordering here is based on signed values so SmallValue will
be INT_MIN and BigValue 0. This shouldn't be a problem but the code
assumed that BigValue always had more bits set than SmallValue.
We used to just miss the transformation, but a recent refactoring of
mine turned this into an assertion failure.
llvm-svn: 239105
Basic block selection involves checking successor BBs for PHI nodes
that depend on the current BB. If such a BB is found, the value
being selected is a constant, and that constant already exists in the
current BB, its value is reused.
This might lead to wrong locations in some situations, especially if
same constant value ends up being materialized twice in two different
ways, which discards that sharing and leaves us with wrong debug
location in the successor BB.
In code this involves the following sequence of calls:
SelectionDAGBuilder::HandlePHINodesInSuccessorBlocks ->
SelectionDAGBuilder::CopyValueToVirtualRegister ->
SelectionDAGBuilder::getNonRegisterValue
llvm-svn: 239089
Now that we can look at users, we can trivially do this: when we would
have otherwise disabled GlobalMerge (currently -O<3), we can just run
it for minsize functions, as it's usually a codesize win.
Differential Revision: http://reviews.llvm.org/D10054
llvm-svn: 239087
Fix the FIXME and remove this old as(1) compat option. It was useful for
bringup of the integrated assembler to diff object files, but now it's
just causing more relocations than strictly necessary to be generated.
rdar://21201804
llvm-svn: 239084
Summary:
With this patch, NVPTXLowerKernelArgs converts a kernel pointer argument to a
pointer in the global address space. This change, along with
NVPTXFavorNonGenericAddrSpaces, allows the NVPTX backend to emit ld.global.*
and st.global.* for accessing kernel pointer arguments.
Minor changes:
1. refactor: extract function convertToPointerInAddrSpace
2. fix a bug in the test case in bug21465.ll
Test Plan: lower-kernel-ptr-arg.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: jholewinski
Subscribers: wengxt, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10154
llvm-svn: 239082
Summary:
Properly report the error in segment load commands from MachOObjectFile
constructor instead of crashing the program.
Adjust the test case accordingly.
Test Plan: regression test suite
Reviewers: rafael, filcab
Subscribers: llvm-commits
llvm-svn: 239081
Summary:
Currently all load commands are parsed in MachOObjectFile constructor.
If the next load command cannot be parsed, or if command size is too
small, properly report it through the error code and fail to construct
the object, instead of crashing the program.
Test Plan: regression test suite
Reviewers: rafael, filcab
Subscribers: llvm-commits
llvm-svn: 239080
Summary: Instead, properly report this error from MachOObjectFile constructor.
Test Plan: regression test suite
Reviewers: rafael
Subscribers: llvm-commits
llvm-svn: 239078
Summary:
-march=bpf -> host endian
-march=bpf_le -> little endian
-march=bpf_be -> big endian
Test Plan:
v1 was tested by IBM s390 guys and appears to be working there.
It bit rots too fast here.
Reviewers: chandlerc, tstellarAMD
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10177
llvm-svn: 239071
Method 'visitBUILD_VECTOR' in the DAGCombiner knows how to combine a
build_vector of a bunch of extract_vector_elt nodes and constant zero nodes
into a shuffle blend with a zero vector.
However, method 'visitBUILD_VECTOR' forgot that a floating point
build_vector may contain negative zero as well as positive zero.
Example:
define <2 x double> @example(<2 x double> %A) {
entry:
%0 = extractelement <2 x double> %A, i32 0
%1 = insertelement <2 x double> undef, double %0, i32 0
%2 = insertelement <2 x double> %1, double -0.0, i32 1
ret <2 x double> %2
}
Before this patch, llc (with -mattr=+sse4.1) wrongly generated
movq %xmm0, %xmm0 # xmm0 = xmm0[0],zero
So, the sign bit of the negative zero was effectively lost.
This patch fixes the problem by adding explicit checks for positive zero.
With this patch, llc produces the following code for the example above:
movhpd .LCPI0_0(%rip), %xmm0
where .LCPI0_0 refers to a 'double -0'.
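The added check, in sketch form (simplified):
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Only +0.0 may be folded into a blend-with-zero; -0.0 carries a sign
// bit that the zero vector would silently drop.
static bool isFoldableZero(SDValue Elt) {
  auto *C = dyn_cast<ConstantFPSDNode>(Elt);
  return C && C->getValueAPF().isPosZero();
}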
llvm-svn: 239070
* If the input file is missing;
* If the type of input object file can't be recognized;
* If the object file can't be parsed correctly.
llvm-svn: 239065
Now that we sometimes know the address space, this can
theoretically do a better job.
This needs better test coverage, but this mostly depends on
first updating the loop optimizations to provide the address
space.
llvm-svn: 239053
When checking (High - Low + 1).sle(BitWidth), BitWidth would be truncated
to the size of the left-hand side. In the case of this PR, the left-hand
side was i4, so BitWidth=64 got truncated to 0 and the assert failed.
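The failure in miniature (widths from the PR):
#include "llvm/ADT/APInt.h"
using llvm::APInt;

void demo(const APInt &High, const APInt &Low) { // i4 values
  APInt Range = High - Low + 1;
  // Building the RHS at the LHS's width truncates 64 to 64 mod 16 == 0,
  // so the comparison is effectively against 0, not 64:
  (void)Range.sle(APInt(Range.getBitWidth(), 64));
  // Extending to a common, sufficient width first avoids that:
  (void)Range.zext(64).ule(64);
}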
llvm-svn: 239048
Section symbols exist as an optimization: instead of having multiple relocations
point to different symbols, many of them can point to a single section symbol.
When that optimization is unused, a section symbol is also unused and adds no
extra information to the object file.
This saves a bit of space on the object files and makes the output of
llvm-objdump -t easier to read and consequently some tests get quite a bit
simpler.
llvm-svn: 239045
If the compare in a select pattern has another use then it can't be removed, so we'd just
be creating repeated code if we created a min/max node.
Spotted by Matt Arsenault!
llvm-svn: 239037
We don't need to go through LSR to trigger this bug. Instead,
hand-craft a tricky GEP and get the constant folder to hack on it when
parsing the IR.
llvm-svn: 239017
The first try (r238051) to land this was reverted due to ExecutionEngine build failure;
that was hopefully addressed by r238788.
The second try (r238842) to land this was reverted due to BUILD_SHARED_LIBS failure;
that was hopefully addressed by r238953.
This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 239001
With a couple more constructors that GCC thinks are necessary.
Original commit message:
[dsymutil] Accept a YAML debug map as input instead of a binary.
To do this, the user needs to pass the new -y flag.
As it wasn't tested before, the debug map YAML deserialization was
completely buggy (mainly because the DebugMapObject has a dual
mapping that allows searching by name and by address, but only the
StringMap got populated). It's fixed and tested in this commit by
augmenting some tests with a 2-stage dwarf link: a first llvm-dsymutil
reads the debug map and pipes it in a second instance that does the
actual link without touching the initial binary.
llvm-svn: 238959
To do this, the user needs to pass the new -y flag.
As it wasn't tested before, the debug map YAML deserialization was
completely buggy (mainly because the DebugMapObject has a dual
mapping that allows searching by name and by address, but only the
StringMap got populated). It's fixed and tested in this commit by
augmenting some tests with a 2-stage dwarf link: a first llvm-dsymutil
reads the debug map and pipes it in a second instance that does the
actual link without touching the initial binary.
llvm-svn: 238941
As the serialized debug map is becoming a first class citizen, a way
to cleanly dump it is required. We used -parse-only combined with
-v for that purpose before, but it dumps a lot of unrelated debug
stuff. Dumping the debug map was the only use of the -parse-only flag
anyway, so replace it with a more useful option.
llvm-svn: 238940
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers; on Thumb2 this is not necessary.
Ideally on Thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore,
as there is no reason to set register hints anymore; changing that is
something for a future patch however.
Differential Revision: http://reviews.llvm.org/D9694
Recommiting after the revert in r238821, the buildbot still failed with
the patch removed so there seems to be another reason for the breakage.
llvm-svn: 238935
AVX-512: Implemented GETEXP instruction for KNL and SKX
Added rounding mode modifier for SQRTPS/PD
Added tests for encoding and intrinsics.
CR:
http://reviews.llvm.org/D9991
llvm-svn: 238923
The windows buildbot originally failed because the check expressions are
evaluated as 64-bit values, even for 32-bit symbols. Fixed this by comparing
bottom 32-bits of the expressions.
The host/target endian mismatch issue is that it's invalid to read/write target
values using a host pointer without taking care of endian differences between
the target and host. Most (if not all) instances of
reinterpret_cast<uint32_t*>() in the RuntimeDyld are examples of this bug.
This has been fixed for Mips using the endian aware read/write functions.
The original commits were:
r238838:
[mips] Add RuntimeDyld tests for currently supported O32 relocations.
Reviewers: petarj, vkalintiris
Reviewed By: vkalintiris
Subscribers: vkalintiris, llvm-commits
Differential Revision: http://reviews.llvm.org/D10126
r238844:
[mips][mcjit] Add support for R_MIPS_PC32.
Summary:
This allows us to resolve relocations for DW_EH_PE_pcrel TType encodings
in the exception handling LSDA.
Also fixed a nearby typo.
Reviewers: petarj, vkalintiris
Reviewed By: vkalintiris
Subscribers: vkalintiris, llvm-commits
Differential Revision: http://reviews.llvm.org/D10127
llvm-svn: 238915
The ELF spec is very clear:
-----------------------------------------------------------------------------
If the value is non-zero, it represents a string table index that gives the
symbol name. Otherwise, the symbol table entry has no name.
-----------------------------------------------------------------------------
In particular, a st_name of 0 most certainly doesn't mean that the symbol has
the same name as the section.
llvm-svn: 238899
Source for the test:
@bloom = global <3 x i32> <i32 0, i32 1, i32 42>
Plus bit twiddling to set the vector numelts to 0 (in the bc file).
llvm-svn: 238894
Summary:
Once a gc.statepoint has been rewritten to relocate live references, the
SSA values represent physical pointers instead of logical references.
Logical dereferenceability does not imply physical dereferenceability, and
after RewriteStatepointsForGC has run, any attributes that imply
dereferenceability of the logical references need to be stripped.
The current approach is conservative and can be made more precise
later if needed. For starters, we need to strip dereferenceable
attributes only from pointers that live in the GC address space.
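A minimal sketch of the conservative stripping (hypothetical declaration, not
from the patch):
declare void @consume(i64 addrspace(1)* dereferenceable(8))
; after RewriteStatepointsForGC the relocated pointer is physical, so the
; attribute must be dropped:
declare void @consume(i64 addrspace(1)*)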
Reviewers: reames, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10105
llvm-svn: 238883
Summary:
LLVM's MI level notion of invariant_load is different from LLVM's IR
level notion of invariant_load with respect to dereferenceability. The
IR notion of invariant_load only guarantees that all *non-faulting*
invariant loads result in the same value. The MI notion of invariant
load guarantees that the load can be legally moved to any location
within its containing function. The MI notion of invariant_load is
stronger than the IR notion of invariant_load -- an MI invariant_load is
an IR invariant_load + a guarantee that the location being loaded from
is dereferenceable throughout the function's lifetime.
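For reference, the IR-level form is just the metadata attachment (sketch):
%v = load i32, i32* %p, !invariant.load !0
!0 = !{}
; this alone says nothing about %p being dereferenceable elsewhere in the
; function; the MI-level flag additionally asserts that.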
Reviewers: hfinkel, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10075
llvm-svn: 238881
If the first character in a metadata attachment's name is a digit, it has
to be output using an escape sequence, otherwise it's not valid text IR.
Removed an over-zealous assert from LLVMContext which didn't allow this.
The rule should only apply to text IR. Actual names can have any sequence
of non-NUL bytes.
Also added some documentation on accepted names.
Bug found with AFL fuzz.
llvm-svn: 238867
Summary:
Following on from r209907 which made personality encodings indirect, do the
same for TType encodings. This fixes the case where a try/catch block needs
to generate references to, for example, std::exception in the
.gcc_except_table.
Previous attempts at committing this broke the buildbots due to bugs in IAS.
These bugs have now been fixed so trying again.
Reviewers: petarj
Reviewed By: petarj
Subscribers: srhines, joerg, tberghammer, llvm-commits
Differential Revision: http://reviews.llvm.org/D9669
llvm-svn: 238863
As a follow-up to r235955, actually support up to 65535 arguments in a
subprogram. r235955 missed assembly support, having only tested the new
limit via C++ unit tests. Code patch by Amjad Aboud.
llvm-svn: 238854
Summary:
This allows us to resolve relocations for DW_EH_PE_pcrel TType encodings
in the exception handling LSDA.
Also fixed a nearby typo.
Reviewers: petarj, vkalintiris
Reviewed By: vkalintiris
Subscribers: vkalintiris, llvm-commits
Differential Revision: http://reviews.llvm.org/D10127
llvm-svn: 238844
The first try (r238051) to land this was reverted due to bot failures
that were hopefully addressed by r238788.
This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 238842
Summary:
With this change we are able to realign the stack dynamically, whenever it
contains objects with alignment requirements that are larger than the
alignment specified from the given ABI.
We have to use the $fp register as the frame pointer when we perform
dynamic stack realignment. In complex stack frames, with variably-sized
objects, we reserve additionally the callee-saved register $s7 as the
base pointer in order to reference locals.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8633
llvm-svn: 238829
This reverts commit r238795, as it broke the Thumb2 self-hosting buildbot.
Since self-hosting issues with Clang are hard to investigate, I'm taking the
liberty to revert now, so we can investigate it offline.
llvm-svn: 238821
Summary:
Make mips-expansions.s more readable by grouping the instructions with their respective CHECK's.
This test is going to get a lot bigger soon and it will become essentially unreadable if the current formatting is kept.
I've also made the comments more useful and accurate, and I've restricted the RUN lines to under 80 columns.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10089
llvm-svn: 238817
Summary: These directives are used to set the current value of the SoftFloat feature.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, mpf
Differential Revision: http://reviews.llvm.org/D9074
llvm-svn: 238813
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers; on Thumb2 this is not necessary.
Ideally on Thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore,
as there is no reason to set register hints anymore; changing that is
something for a future patch however.
Differential Revision: http://reviews.llvm.org/D9694
llvm-svn: 238795
Previously CCMP/FCCMP instructions were only used by the
AArch64ConditionalCompares pass for control flow. This patch uses them
for SELECT like instructions as well by matching patterns in ISelLowering.
PR20927, rdar://18326194
Differential Revision: http://reviews.llvm.org/D8232
llvm-svn: 238793
For a dead instruction we may have a last-use not only in the main live
range but also in a subregister range if subregisters are tracked. We
need to partially rebuild live ranges in both cases.
The testcase only broke when subregister liveness was enabled. I
committed it in the current form because there is currently no flag to
enable/disable subregister liveness.
This fixes PR23720.
llvm-svn: 238785
Doing so will allow us to also accept a YAML debug map as input, since using
YAMLIO gives us the parsing for free. Being able to have textual debug
maps will in turn allow much more control over the tests, because 1/ there is
no need to check in a binary containing the debug map and 2/ it allows
using the same objects/IR files with made-up debug maps to test
different scenarios.
llvm-svn: 238781
Summary: Implement the bswap intrinsic for MIPS FastISel. The lowering is quite different between mips32 r1 and r2.
Based on a patch by Reed Kotler.
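A sketch of the input this handles (hypothetical; on r2 the lowering can use
wsbh/rotr, while r1 needs a longer shift-and-or sequence):
declare i32 @llvm.bswap.i32(i32)
define i32 @swap(i32 %x) {
  %r = call i32 @llvm.bswap.i32(i32 %x)
  ret i32 %r
}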
Test Plan:
bswap1.ll
test-suite
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D7219
llvm-svn: 238760
Summary:
Implement the intrinsics memset, memcpy and memmove in MIPS FastISel.
Make some needed infrastructure fixes so that this can work.
Based on a patch by Reed Kotler.
Test Plan:
memtest1.ll
The patch passes test-suite for mips32 r1/r2 and at O0/O2
Reviewers: rkotler, dsanders
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D7158
llvm-svn: 238759
Summary: Implement the LLVM assembly urem/srem and sdiv/udiv instructions in MIPS FastISel.
Based on a patch by Reed Kotler.
Test Plan:
srem1.ll
div1.ll
test-suite at O0/O2 for mips32 r1/r2
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D7028
llvm-svn: 238757
Summary: Implement the LLVM IR select instruction for MIPS FastISel.
Based on a patch by Reed Kotler.
Test Plan:
"Make check" test included now.
Passes test-suite at O2/O0 mips32 r1/r2.
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D6774
llvm-svn: 238756
Summary:
The contents of the HI/LO registers are unpredictable after the execution of
the MUL instruction. In addition to implicitly defining these registers in the
MUL instruction definition, we have to mark those registers as dead too.
Without this the fast register allocator is running out of registers when the
MUL instruction is followed by another one that tries to allocate the AC0
register.
Based on a patch by Reed Kotler.
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D9825
llvm-svn: 238755
Unreachable values may use themselves in strange ways due to their
dominance property. Attempting to translate through them can lead to
infinite recursion, crashing LLVM. Instead, claim that we weren't able
to translate the value.
This fixes PR23096.
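A minimal sketch of such a self-use (hypothetical; only valid because the
block is unreachable, so dominance holds vacuously):
define void @g() {
entry:
  ret void
dead:
  %x = add i32 %x, 1   ; uses its own result
  br label %dead
}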
llvm-svn: 238702
This fixes a bug in the line info handling in the dwarf code, based on a
problem I hit when implementing RelocVisitor support for MachO.
Since addr+size will give the first address past the end of the function,
we need to back up one line table entry. Fix this by looking up the
end_addr-1, which is the last address in the range. Note that this also
removes a duplicate output from the llvm-rtdyld line table dump. The
relevant line is the end_sequence one in the line table and has an offset
of the first address past the end of the range and hence should not be
included.
Also factor out the common functionality into a separate function.
This comes up on MachO much more than on ELF, since MachO
doesn't store the symbol size separately, hence making
said situation always occur.
Differential Revision: http://reviews.llvm.org/D9925
llvm-svn: 238699
The original version didn't properly account for the base register
being modified before the final jump, so caused miscompilations in
Chromium and LLVM. I've fixed this and tested with an LLVM self-host
(I don't have the means to build & test Chromium).
The general idea remains the same: in pathological cases jump tables
can be too far away from the instructions referencing them (like other
constants) so they need to be movable.
Should fix PR23627.
llvm-svn: 238680
This commit adds partial support for MachO relocations to RelocVisitor.
A simple test case is added to show that relocations are indeed being
applied and that using llvm-dwarfdump on MachO files no longer errors.
Correctness is not yet tested, due to an unrelated bug in DebugInfo,
which will be fixed with an appropriate testcase in a follow-up commit.
Differential Revision: http://reviews.llvm.org/D8148
llvm-svn: 238663
best approach of each.
For vNi16, we use SHL + ADD + SRL pattern that seem easily the best.
For vNi32, we use the PUNPCK + PSADBW + PACKUSWB pattern. In some cases
there is a huge improvement with this in IACA's estimated throughput --
over 2x higher throughput!!!! -- but the measurements are too good to be
true. In one narrow case, the SHL + ADD + SHL + ADD + SRL pattern looks
slightly faster, but I'm not sure I believe any of the measurements at
this point. Both are the exact same uops though. Hard to be confident of
anything past that.
If anyone wants to collect very detailed (Agner-level) timings with the
result of this patch, or with the i32 case replaced with SHL + ADD + SHL
+ ADD + SRL, I'd be very interested. Note that you'll need to test it on
both Ivybridge and Haswell, with each of SSE3, SSSE3, and AVX selected, as
I saw unique behavior in each of these buckets with IACA all of which
should be checked against measured performance.
But this patch is still a useful improvement by dropping duplicate work
and getting the much nicer PSADBW lowering for v2i64.
I'd still like to rephrase this in terms of generic horizontal sum. It's
a bit lame to have a special case of that just for popcount.
llvm-svn: 238652
helper that skips creating a cast when it isn't necessary.
It's really somewhat concerning that this was caused by the presence
of a no-op bitcast, but...
llvm-svn: 238642
shifting vectors of bytes as x86 doesn't have direct support for that.
This removes a bunch of redundant masking in the generated code for SSE2
and SSE3.
In order to avoid the really significant code size growth this would
have triggered, I also factored the completely repetitive logic for
shifting and masking into two lambdas which in turn makes all of this
much easier to read IMO.
llvm-svn: 238637
in-register LUT technique.
Summary:
A description of this technique can be found here:
http://wm.ite.pl/articles/sse-popcount.html
The core of the idea is to use an in-register lookup table and the
PSHUFB instruction to compute the population count for the low and high
nibbles of each byte, and then to use horizontal sums to aggregate these
into vector population counts with wider element types.
On x86 there is an instruction that will directly compute the horizontal
sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily.
Various tricks are used to get vNi32 and vNi16 from the vNi8 that the
LUT computes.
The base implementation of this, and most of the work, was done by Bruno
in a follow up to D6531. See Bruno's detailed post there for lots of
timing information about these changes.
I have extended Bruno's patch in the following ways:
0) I committed the new tests with baseline sequences so this shows
a diff, and regenerated the tests using the update scripts.
1) Bruno had noticed and mentioned in IRC a redundant mask that
I removed.
2) I introduced a particular optimization for the i32 vector cases where
we use PSHL + PSADBW to compute the low i32 popcounts, and PSHUFD
+ PSADBW to compute doubled high i32 popcounts. This takes advantage
of the fact that to line up the high i32 popcounts we have to shift
them anyways, and we can shift them by one fewer bit to effectively
divide the count by two. While the PSHUFD based horizontal add is no
faster, it doesn't require registers or load traffic the way a mask
would, and provides more ILP as it happens on different ports with
high throughput.
3) I did some code cleanups throughout to simplify the implementation
logic.
4) I refactored it to continue to use the parallel bitmath lowering when
SSSE3 is not available to preserve the performance of that version on
SSE2 targets where it is still much better than scalarizing as we'll
still do a bitmath implementation of popcount even in scalar code
there.
With #1 and #2 above, I analyzed the result in IACA for sandybridge,
ivybridge, and haswell. In every case I measured, the throughput is the
same or better using the LUT lowering, even v2i64 and v4i64, and even
compared with using the native popcnt instruction! The latency of the
LUT lowering is often higher than the latency of the scalarized popcnt
instruction sequence, but I think those latency measurements are deeply
misleading. Keeping the operation fully in the vector unit and having
many chances for increased throughput seems much more likely to win.
With this, we can lower every integer vector popcount implementation
using the LUT strategy if we have SSSE3 or better (and thus have
PSHUFB). I've updated the operation lowering to reflect this. This also
fixes an issue where we were scalarizing horribly some AVX lowerings.
Finally, there are some remaining cleanups. There is duplication between
the two techniques in how they perform the horizontal sum once the byte
population count is computed. I'm going to factor and merge those two in
a separate follow-up commit.
Differential Revision: http://reviews.llvm.org/D10084
llvm-svn: 238636
a separate routine, generalize it to work for all the integer vector
sizes, and do general code cleanups.
This dramatically improves lowerings of byte and short element vector
popcount, but more importantly it will make the introduction of the
LUT-approach much cleaner.
The biggest cleanup I've done is to just force the legalizer to do the
bitcasting we need. We run these iteratively now and it makes the code
much simpler IMO. Other changes were minor, and mostly naming and
splitting things up in a way that makes it more clear what is going on.
The other significant change is to use a different final horizontal sum
approach. This is the same number of instructions as the old method, but
shifts left instead of right so that we can clear everything but the
final sum with a single shift right. This seems likely better than
a mask which will usually have to read the mask from memory. It is
certainly fewer u-ops. Also, this will be temporary. This and the LUT
approach share the need of horizontal adds to finish the computation,
and we have more clever approaches than this one that I'll switch over
to.
llvm-svn: 238635
It turns out that _except_handler3 and _except_handler4 really use the
same stack allocation layout, at least today. They just make different
choices about encoding the LSDA.
This is in preparation for lowering llvm.eh.exceptioninfo().
llvm-svn: 238627
For some history here see the commit messages of r199797 and r169060.
The original intent was to fix cases like:
%EAX<def> = COPY %ECX<kill>, %RAX<imp-def>
%RCX<def> = COPY %RAX<kill>
where simply removing the copies would have RCX undefined as in terms of
machine operands only the ECX part of it is defined. The machine
verifier would complain about this so 169060 changed such COPY
instructions into KILL instructions so some super-register imp-defs
would be preserved. In r199797 it was finally decided to always do this
regardless of super-register defs.
But this is wrong, consider:
R1 = COPY R0
...
R0 = COPY R1
getting changed to:
R1 = KILL R0
...
R0 = KILL R1
It now looks like R0 dies at the first KILL and won't be alive until the
second KILL, while in reality R0 is alive and must not change in this
part of the program.
As this only happens after register allocation, there is not much code
still performing liveness queries, so the issue was not noticed. In fact,
I didn't manage to create a testcase for this without the unrelated changes
I am working on at the moment.
The fix is simple: As of r223896 the MachineVerifier allows reads from
partially defined registers, so the whole transforming COPY->KILL thing
is not necessary anymore. This patch also changes a similar (but more
benign case as the def and src are the same register) case in the
VirtRegRewriter.
Differential Revision: http://reviews.llvm.org/D10117
llvm-svn: 238588
This patch corresponds to review:
http://reviews.llvm.org/D9941
It adds the various FMA instructions introduced in the version 2.07 of
the ISA along with the testing for them. These are operations on single
precision scalar values in VSX registers.
llvm-svn: 238578
This commit translates the line and column numbers for LLVM IR
errors from the numbers in the YAML block scalar to the numbers
in the MIR file so that the MIRParser users can report LLVM IR
errors with the correct line and column numbers.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10108
llvm-svn: 238576
Small (really small!) C++ exception handling examples work on 32-bit x86
now.
This change disables the use of .seh_* directives in WinException when
CFI is not in use. It also uses absolute symbol references in the tables
instead of imagerel32 relocations.
Also fixes a cache invalidation bug in MMI personality classification.
llvm-svn: 238575
Summary:
In continuation of an earlier commit to DependenceAnalysis.cpp by jingyue (r222100): the type of all subscripts in a coupled group needs to be the same, since constraints from one subscript may be propagated to another during testing. During testing, new SCEVs may be created and the operands for these need to be the same.
This patch extends unifySubscriptType() to work on lists of subscript pairs, ensuring a common extended type for all of them.
Test Plan:
Added a test case to NonCanonicalizedSubscript.ll which causes dependence analysis to crash without this fix.
All regression tests pass.
Reviewers: spop, sebpop, jingyue
Reviewed By: jingyue
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9698
llvm-svn: 238573
Fixes PR23455, where, when TableGen generates the matcher from the
AsmString, it splits "cmp${cc}ss" into tokens, and the "ss" suffix
is recognized as the SS register.
I can't think of a situation where that's a feature, not a bug, hence:
when a token is "isolated", i.e., it is followed and preceded by
separators, it shouldn't be parsed as a register.
Differential Revision: http://reviews.llvm.org/D9844
llvm-svn: 238536
organize them by the width of vector.
This makes it a lot easier to see that we're covering all of the vector
types but not doing so excessively. This also adds tests across the
spectrum of SSE versions in addition to the AVX versions.
If you're really tired of seeing the *massive* sprawl of scalarized code
for this, don't worry, I'm just about to land Bruno's patch that
dramatically improves the situation for SSSE3 and newer.
llvm-svn: 238520
This commit introduces a serializable structure called
'llvm::yaml::MachineFunction' that stores the machine
function's name. This structure will mirror the machine
function's state in the future.
This commit prints machine functions as YAML documents
containing a YAML mapping that stores the state of a machine
function. This commit also parses the YAML documents
that contain the machine functions.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D9841
llvm-svn: 238519
This moves all the state numbering code for C++ EH to WinEHPrepare so
that we can call it from the X86 state numbering IR pass that runs
before isel.
Now we just call the same state numbering machinery and insert a bunch
of stores. It also populates MachineModuleInfo with information about
the current function.
llvm-svn: 238514
ELF has no restrictions on where undefined symbols go relative to other defined
symbols. In fact, gas just sorts them together. Do the same.
This was there since r111174 probably just because the MachO writer has it.
llvm-svn: 238513
The patch evaluates the expansion cost of exitValue in the IndVarSimplify pass, and only does the rewriting when the expansion cost is low or the loop can be deleted with the rewriting. It provides an option "-replexitval=" to control the default aggressiveness of the exit value rewriting. It also fixes some missing cases in SCEVExpander::isHighCostExpansionHelper to enhance the evaluation of SCEV expansion cost.
Differential Revision: http://reviews.llvm.org/D9800
llvm-svn: 238507
For x86 targets, do not do sibling call optimization when materializing
the callee's address would require a GOT relocation. We can still do
tail calls to internal functions, hidden functions, and protected
functions, because they do not require this kind of relocation. It is
still possible to get GOT relocations when the user explicitly asks for
it with musttail or -tailcallopt, both of which are supposed to
guarantee TCO.
Based on a patch by Chih-hung Hsieh.
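A sketch of a call that remains eligible for the optimization (hypothetical):
define internal void @callee() {
  ret void
}
define void @caller() {
  tail call void @callee()   ; internal linkage: no GOT relocation needed
  ret void
}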
Reviewers: srhines, timmurray, danalbert, enh, void, nadav, rnk
Subscribers: joerg, davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D9799
llvm-svn: 238487
It caused a smaller number of failures than the previous attempt at committing but still caused a couple on the llvm-linux-mips builder. Reverting while I investigate the remainder.
llvm-svn: 238483
We were previously codegen'ing these as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach
the register allocator to allocate the registers in ascending order. This
never got implemented, and up to now we've been stuck with very poor codegen.
A much simpler approach for achieving better codegen is to create LDM/STM
instructions with identical sets of virtual registers, let the register
allocator pick arbitrary registers and order register lists when printing an
MCInst. This approach also avoids the need to repeatedly calculate offsets
which ultimately ought to be eliminated pre-RA in order to decrease register
pressure.
This is implemented by lowering the memcpy intrinsic to a series of SD-only
MCOPY pseudo-instructions which perform a memory copy using a given number
of registers. During SD->MI lowering, we lower MCOPY to LDM/STM. This is a
little unusual, but it avoids the need to encode register lists in the SD,
and we can take advantage of SD use lists to decide whether to use the _UPD
variant of the instructions.
Fixes PR9199.
Differential Revision: http://reviews.llvm.org/D9508
llvm-svn: 238473
Currently we only fold a BitCast into a Load when the BitCast is its
only user.
Do the same for any no-op cast.
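A hypothetical sketch of the newly handled case (names invented):
%f = bitcast i32* %p to float*   ; no-op cast
call void @use(float* %f)        ; an extra user no longer blocks the fold
%v = load float, float* %f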
Differential Revision: http://reviews.llvm.org/D9152
llvm-svn: 238452
Octeon CPUs use dmtc2 rt,imm16 and dmfc2 rt,imm16 for the crypto coprocessor.
E.g. dmtc2 rt,0x4057 starts calculation of SHA-1.
I had to introduce a new decoding namespace to avoid a decoding conflict.
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D10083
llvm-svn: 238439
This adds support for the 64-bit DWARF format, but is still limited to
less than 4GB of debug data by the DataExtractor class. Some versions
of the GNU MIPS toolchain generate 64-bit DWARF even though it isn't
actually necessary.
Differential Revision: http://reviews.llvm.org/D1988
llvm-svn: 238434
This was bug-for-bug compatibility with gas that is completely unnecessary.
If a _GLOBAL_OFFSET_TABLE_ symbol is used, it will already be created by
the time we get to the ELF writer.
llvm-svn: 238432
Summary:
Following on from r209907 which made personality encodings indirect, do the
same for TType encodings. This fixes the case where a try/catch block needs
to generate references to, for example, std::exception in the
.gcc_except_table.
Reviewers: petarj
Reviewed By: petarj
Subscribers: srhines, joerg, tberghammer, llvm-commits
Differential Revision: http://reviews.llvm.org/D9669
llvm-svn: 238427
Add support for resolving MIPS64r2 and MIPS64r6 relocations in MCJIT.
Patch by Vladimir Radosavljevic.
Differential Revision: http://reviews.llvm.org/D9667
llvm-svn: 238424
Canonicalizing 'x [+-] (-Constant * y)' is not a win if we don't *know*
we will open up CSE opportunities.
If the multiply was 'nsw', then negating 'y' requires us to clear the
'nsw' flag. If this is actually worth pursuing, it is probably more
appropriate to do so in GVN or EarlyCSE.
This fixes PR23675.
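A sketch of why the flag matters (hypothetical values):
%m = mul nsw i32 %y, -4
%r = sub i32 %x, %m
For %y = 2^29, %y * -4 = INT_MIN does not overflow, but the negated form
%y * 4 does, so the rewrite would have to clear 'nsw' on the multiply.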
llvm-svn: 238397
Summary:
This patch made two improvements to NaryReassociate and the NVPTX pipeline
1. Run EarlyCSE/GVN after NaryReassociate to get rid of redundant common
expressions.
2. When adding an instruction to SeenExprs, maps both the SCEV before and after
reassociation to that instruction.
Test Plan: updated @reassociate_gep_nsw in nary-gep.ll
Reviewers: meheff, broune
Reviewed By: broune
Subscribers: dberlin, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9947
llvm-svn: 238396
Extracted from the D6531 patch by Bruno Cardoso Lopes, and re-generated
to reflect the current state of the world. This should let Bruno's D6531
actually show the delta between the approaches by running the x86 test
case update script after re-building.
llvm-svn: 238391
This fixes a bit I forgot in r238335. In addition to the data record and
the counter, we can also move the name of the counter to the comdat for
the associated function.
I'm also adding an IR test case to check that these three elements are
placed in the proper comdat.
llvm-svn: 238351
Now that most of the methods in Clang and LLVM that were parsing arch/cpu/fpu
strings are using ARMTargetParser, it's time to make it a bit more conforming
with what the ABI says.
This commit adds some clarification on what build attributes are accepted and
which are "non-standard". It also makes clear that the "defaultCPU" and
"defaultArch" methods were really just build attribute getters.
It also diverges from GCC's behaviour of claiming that armv2/armv3 are really an
ARMv4 in the build attributes, given that the ABI has a clear state for that: Pre-v4.
llvm-svn: 238344
This commit is a 3rd attempt at committing the initial MIR serialization patch.
The first commit (r237708) was reverted in r237730. Then the second commit
(r237954) was reverted in r238007, as the MIR library under CodeGen caused
a circular dependency where the CodeGen library depended on MIR and MIR
library depended on CodeGen.
This commit has fixed the dependencies between CodeGen and MIR by
reorganizing the MIR serialization code - the code that prints out
MIR has been moved to CodeGen, and the MIR library has been renamed
to MIRParser. Now the CodeGen library doesn't depend on the
MIRParser library, thus the circular dependency no longer exists.
--Original Commit Message--
MIR Serialization: print and parse LLVM IR using MIR format.
This commit is the initial commit for the MIR serialization project.
It creates a new library under CodeGen called 'MIR'. This new
library adds a new machine function pass that prints out the LLVM IR
using the MIR format. This pass is then added as a last pass when a
'stop-after' option is used in llc. The new library adds the initial
functionality for parsing of MIR files as well. This commit also
extends the llc tool so that it can recognize and parse MIR input files.
Reviewers: Duncan P. N. Exon Smith, Matthias Braun, Philip Reames
Differential Revision: http://reviews.llvm.org/D9616
llvm-svn: 238341
This broke the llvm-mips-linux builder and several of our out-of-tree builders.
Initial investigations show that the commit probably isn't the problem but
reverting anyway while I investigate.
llvm-svn: 238302
With this patch the x86 backend is now shrink-wrapping capable
and this functionality can be tested by using the
-enable-shrink-wrap switch.
The next step is to make more test and enable shrink-wrapping by
default for x86.
Related to <rdar://problem/20821487>
llvm-svn: 238293
model the dense vector instruction bonuses.
Previously, this code really didn't effectively compute the density of
inlined vector instructions and apply the intended inliner bonus. It
would try to compute it repeatedly while analyzing the function and
didn't handle the case where future vector instructions would tip the
scales back towards the bonus.
Instead, speculatively apply all possible bonuses to the threshold
initially. Once we *know* that a certain bonus can not be applied,
subtract it. This should delay early bailout enough to get much more
consistent results without actually causing us to analyze huge swaths of
code. I expect some (hopefully mild) compile time hit here, and some
swings in performance, but this was definitely the intended behavior of
these bonuses.
This also dramatically simplifies the computation of the bonuses to not
interact with each other in confusing ways. The previous code didn't do
a good job of this and the values for bonuses may be surprising but are
at least now clearly written in the code.
Finally, fix code to be in line with comments and use zero as the
bailout condition.
Patch by Easwaran Raman, with some comment tweaks by me to try and
further clarify what is going on with this code.
http://reviews.llvm.org/D8267
llvm-svn: 238276
This gets gas and llc -filetype=obj to agree on the order of prefixes.
For llvm-mc we need to fix the asm parser to know that it makes a difference
on which line the "lock" is in.
Part of pr23594.
llvm-svn: 238232
Previously, subtarget features were a bitfield with the underlying type being uint64_t.
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, this switches the features to use a bitset.
No functional change.
The first several times this was committed (e.g. r229831, r233055), it caused several buildbot failures.
Apparently the reason for most failures was both clang and gcc's inability to deal with large numbers (> 10K) of bitset constructor calls in tablegen-generated initializers of instruction info tables.
This should now be fixed.
llvm-svn: 238192
Summary:
Following on from r209907 which made personality encodings indirect, do the
same for TType encodings. This fixes the case where a try/catch block needs
to generate references to, for example, std::exception in the
.gcc_except_table.
This commit uses DW_EH_PE_sdata8 for N64 as far as is possible at the moment.
However, it is possible to end up with DW_EH_PE_sdata4 when a TargetMachine is
not available. There's no risk of issues with inconsistency here since the
tables are self describing but it does mean there is a small chance of the
PC-relative offset being out of range for particularly large programs.
Reviewers: petarj
Reviewed By: petarj
Subscribers: srhines, joerg, tberghammer, llvm-commits
Differential Revision: http://reviews.llvm.org/D9669
llvm-svn: 238190
Summary:
In case of functions that have a pointer argument and only pass it to
each other, the function attributes pass deduces that the pointer should
get the readnone attribute, but fails to remove a readonly attribute
that may already have been present.
Reviewers: nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9995
llvm-svn: 238152
Part of D9474, this patch extends AVX2 v16i16 shifts by extending to 2 x <8 x i32> vectors and using variable i32 shifts before packing back to i16.
Adds AVX2 tests for v8i16 and v16i16
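A sketch of the pattern this covers (hypothetical):
%r = shl <16 x i16> %a, %b   ; lowered as two <8 x i32> variable shifts, then packed back to i16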
llvm-svn: 238149
in POWER8:
vadduqm
vaddeuqm
vaddcuq
vaddecuq
vsubuqm
vsubeuqm
vsubcuq
vsubecuq
In addition to adding the instructions themselves, it also adds support for the
v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and
IntrinsicEmitter.cpp).
http://reviews.llvm.org/D9081
llvm-svn: 238144
The semantics of the scalar FMA intrinsics are that the high vector elements are copied from the first source.
The existing pattern switches src1 and src2 around, to match the "213" order, which ends up tying the original src2 to the dest. Since the actual scalar fma3 instructions copy the high elements from the dest register, the wrong values are copied.
This modifies the pattern to leave src1 and src2 in their original order.
Differential Revision: http://reviews.llvm.org/D9908
llvm-svn: 238131
Change `DwarfStringPool` to calculate byte offsets on-the-fly, and
update `DwarfUnit::getLocalString()` to use a `DIEInteger` instead of a
`DIEDelta` when Dwarf doesn't use relocations (i.e., Mach-O). This
eliminates another call to `EmitLabelDifference()`, and drops memory
usage from 865 MB down to 861 MB, around 0.5%.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238114
This test was relying on the numbering of preceding .set directives, but
an upcoming commit is going to remove some of them. Make the CHECKs
more nuanced.
llvm-svn: 238113
On GPU targets, materializing constants is cheap and stores are
expensive, so only doing this for zero vectors was silly.
Most of the new testcases aren't optimally merged, and are for
later improvements.
llvm-svn: 238108
When the compare feeding a branch was in a different BB from the branch, we'd
try to "regenerate" the compare in the block with the branch, possibly trying
to make use of values not available there. Copy a page from AArch64's play book
here to fix the problem (at least in terms of correctness).
Fixes PR23640.
llvm-svn: 238097
This is part of the work to remove TargetMachine::resetTargetOptions.
In this patch, instead of updating global variable NoFramePointerElim in
resetTargetOptions, its use in DisableFramePointerElim is replaced with a call
to TargetFrameLowering::noFramePointerElim. This function determines on a
per-function basis if frame pointer elimination should be disabled.
There is no change in functionality except that the cl::opt option "disable-fp-elim"
can now override function attribute "no-frame-pointer-elim".
llvm-svn: 238080
Normally an ELF .o has two string tables, one for symbols, one for section
names.
With the scheme of naming sections like ".text.foo" where foo is a symbol,
there is a big potential saving in using a single one.
Building llvm+clang+lld with master and with this patch the results were:
master: 193,267,008 bytes
patch: 186,107,952 bytes
master non unique section names: 183,260,192 bytes
patch non unique section names: 183,118,632 bytes
So using non-unique section names saves 10,006,816 bytes, and the patch saves 7,159,056 bytes while
still using distinct names for the sections.
llvm-svn: 238073
This patch extends EarlyCSE to take advantage of the information that a controlling branch gives us about the value of a Value within this and dominated basic blocks. If the current block has a single predecessor with a controlling branch, we can infer what the branch condition must have been to execute this block. The actual change to support this is downright simple because EarlyCSE's existing scoped hash table logic deals with most of the complexity around merging.
The patch actually implements two optimizations.
1) The first is analogous to JumpThreading in that it enables EarlyCSE's CSE handling to fold branches that are exactly redundant (due to a previous branch) into branches on constants. (It doesn't actually replace the branch or change the CFG.) This is pretty clearly a win since it enables substantial CFG simplification before we start trying to inline.
2) The second is analogous to CVP in that it exploits the knowledge gained to replace dominated *uses* of the original value. EarlyCSE does not otherwise reason about specific uses, so this is the more arguable one. It does enable further simplification and constant folding within the rest of the visit by EarlyCSE.
In both cases, the added code only handles the easy dominance based case of each optimization. The general case is deferred to the existing passes.
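A minimal sketch of the dominance-based case (hypothetical):
define i32 @h(i1 %c, i32 %x, i32 %y) {
entry:
  br i1 %c, label %taken, label %exit
taken:                                ; single predecessor: %c must be true here
  %v = select i1 %c, i32 %x, i32 %y   ; dominated use of %c folds to %x
  ret i32 %v
exit:
  ret i32 0
}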
Differential Revision: http://reviews.llvm.org/D9763
llvm-svn: 238071
InstCombine transforms A *nsw B +nsw A *nsw C to A *nsw (B + C).
This is incorrect -- e.g. if A = -1, B = 1, C = INT_SMAX. Then
nothing in the LHS overflows, but the multiplication in RHS overflows.
We need to first make sure that we won't multiply by INT_SMAX + 1.
Test case `add_of_mul` contributed by Sanjoy Das.
This fixes PR23635.
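The same situation in IR form (hypothetical, using the values above):
%ab = mul nsw i32 %a, 1            ; A = -1, B = 1
%ac = mul nsw i32 %a, 2147483647   ; C = INT_SMAX
%r  = add nsw i32 %ab, %ac
Folding to 'mul nsw i32 %a, (1 + 2147483647)' yields the constant INT_SMIN,
and -1 * INT_SMIN overflows.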
Differential Revision: http://reviews.llvm.org/D9629
llvm-svn: 238066
The usual CodeGenPrepare trickery, on a target-specific intrinsic.
Without this, the expansion of atomics will usually have the zext
be hoisted out of the loop, defeating the various patterns we have
to catch this precise case.
Differential Revision: http://reviews.llvm.org/D9930
llvm-svn: 238054
This patch adds a class for processing many recip codegen possibilities.
The TargetRecip class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 238051
The problem was that I slipped in a change required for shrink-wrapping, namely I
used getFirstTerminator instead of the getLastNonDebugInstr that was there before
the refactoring, whereas the surrounding code is not yet patched for that.
Original message:
[X86] Refactor the prologue emission to prepare for shrink-wrapping.
- Add a late pass to expand pseudo instructions (tail call and EH returns).
Instead of doing it in the prologue emission.
- Factor some static methods in X86FrameLowering to ease code sharing.
NFC.
Related to <rdar://problem/20821487>
llvm-svn: 238035
This patch adds support for the ISA 2.07 additions involving the
branch history rolling buffer and event-based branching. These will
not be used by typical applications, so built-in support is not
required. They will only be available via inline assembly.
Assembly/disassembly tests are included in the patch.
llvm-svn: 238032
MachO and COFF quite reasonably only define the size for common symbols.
We used to try to figure out the "size" by computing the gap from one symbol to
the next.
This would not be correct in general, since a part of a section can belong to no
visible symbol (padding, private globals).
It was also really expensive, since we would walk every symbol to find the size
of one.
If a caller really wants this, it can sort all the symbols once and get all the
gaps ("size") in O(n log n) instead of O(n^2).
On MachO this also has the advantage of centralizing all the checks for an
invalid n_sect.
llvm-svn: 238028
The list of subtarget features for the 7em triple contains 't2xtpk',
which actually disables that subtarget feature. Correct that to
'+t2xtpk' and test that the instructions enabled by that feature do
actually work.
Differential Revision: http://reviews.llvm.org/D9936
llvm-svn: 238022
This change does a few things:
- Move some InstCombine transforms to InstSimplify
- Run SimplifyCall from within InstCombine::visitCallInst
- Teach InstSimplify to fold [us]mul_with_overflow(X, undef) to 0.
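For the last point, a sketch (hypothetical):
declare { i32, i1 } @llvm.umul.with.overflow.i32(i32, i32)
%r = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 undef)
; choosing undef = 0 makes both the product and the overflow bit zero, so
; the call folds to 0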
llvm-svn: 237995
PR23608 pointed out that using the preheader to gain a context instruction isn't always legal because a loop might not have a preheader. When looking into that, I realized that using the preheader to determine legality for sinking is questionable at best. Given no test covers that case and the original commit didn't seem to intend it, I restructured the code to only ask context-sensitive queries for hoisting of loads and stores. This is effectively a partial revert of r237593.
llvm-svn: 237985
Summary:
x = &a[i];
y = &a[i + j];
=>
y = x + j;
along with some refactoring work such as extracting method
findClosestMatchingDominator.
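The same pattern in GEP form (hypothetical sketch):
%x  = getelementptr inbounds i32, i32* %a, i64 %i
%j1 = add nsw i64 %i, %j
%y  = getelementptr inbounds i32, i32* %a, i64 %j1
; reassociated to: %y = getelementptr inbounds i32, i32* %x, i64 %j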
Depends on D9786 which provides the ScalarEvolution::getGEPExpr interface.
Test Plan: nary-gep.ll
Reviewers: meheff, broune
Reviewed By: broune
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9802
llvm-svn: 237971
Summary:
This supersedes http://reviews.llvm.org/D4010, hopefully properly
dealing with the JIT case and also adds an actual test case.
DwarfContext was basically already usable for the JIT (and back when
we were overwriting ELF files it actually worked out of the box by
accident), but in order to resolve relocations correctly it needs
to know the load address of the section.
Rather than trying to get this out of the ObjectFile or requiring
the user to create a new ObjectFile just to get some debug info,
this adds the capability to pass in that info directly.
As part of this I separated out part of the LoadedObjectInfo struct
from RuntimeDyld, since it is now required at a higher layer.
Reviewers: lhames, echristo
Reviewed By: echristo
Subscribers: vtjnash, friss, rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D6961
llvm-svn: 237961
This commit is a 2nd attempt at committing the initial MIR serialization patch.
The first commit (r237708) made the incremental buildbots unstable and was
reverted in r237730. The original commit didn't add a terminating null
character to the LLVM IR source which was passed to LLParser, and this
sometimes caused the test 'llvmIR.mir' to fail with a parsing error because
the LLVM IR source didn't have a null character immediately after the end
and thus LLLexer encountered some garbage characters that ultimately caused
the error.
This commit also includes the other test fixes I committed in
r237712 (llc path fix) and r237723 (remove target triple) which
also got reverted in r237730.
--Original Commit Message--
MIR Serialization: print and parse LLVM IR using MIR format.
This commit is the initial commit for the MIR serialization project.
It creates a new library under CodeGen called 'MIR'. This new
library adds a new machine function pass that prints out the LLVM IR
using the MIR format. This pass is then added as a last pass when a
'stop-after' option is used in llc. The new library adds the initial
functionality for parsing of MIR files as well. This commit also
extends the llc tool so that it can recognize and parse MIR input files.
Reviewers: Duncan P. N. Exon Smith, Matthias Braun, Philip Reames
Differential Revision: http://reviews.llvm.org/D9616
llvm-svn: 237954
My recent patch to add support for ISA 2.07 vector pack/unpack
instructions didn't properly check for availability of the vpkudum
instruction when recognizing it as a special vector shuffle case.
This causes us to leave the vector shuffle in place (rather than
converting it to a vector permute) so that it can be recognized later
as a vpkudum, but that pattern is invalid for processors prior to
POWER8. Thus LLVM crashes with an "unable to select" message. We
observed this since one of our buildbots is configured to generate
code for a POWER7.
This patch fixes the problem by checking for availability of the
vpkudum instruction during custom lowering of vector shuffles.
I've added a test case variant for the vpkudum pattern when the
instruction isn't available.
llvm-svn: 237952
so DWARF skeleton CUs can be expressed in IR. A skeleton CU is a
(typically empty) DW_TAG_compile_unit that has a DW_AT_(GNU)_dwo_name and
a DW_AT_(GNU)_dwo_id attribute. It is used to refer to external debug info.
This is a prerequisite for clang module debugging as discussed in
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html.
In order to refer to external types stored in split DWARF (dwo) objects,
such as clang modules, we need to emit skeleton CUs, which identify the
dwarf object (i.e., the clang module) by filename (the SplitDebugFilename)
and a hash value, the dwo_id.
This patch only contains the IR changes. The idea is that a CU with a
non-zero dwo_id field will be emitted together with a DW_AT_GNU_dwo_name
and DW_AT_GNU_dwo_id attribute.
http://reviews.llvm.org/D9488
rdar://problem/20091852
llvm-svn: 237949
On X86 (and similar OOO cores) unrolling is very limited, and even if the
runtime unrolling is otherwise profitable, the expense of a division to compute
the trip count could greatly outweigh the benefits. On the A2, we unroll a lot,
and the benefits of unrolling are more significant (seeing a 5x or 6x speedup
is not uncommon), so we're more able to tolerate the expense, on average, of a
division to compute the trip count.
llvm-svn: 237947
http://reviews.llvm.org/D9891
Following up on the VSX single precision loads and stores added earlier, this
adds support for elementary arithmetic operations on single precision values
in VSX registers. These instructions utilize the new VSSRC register class.
Instructions added:
xsaddsp
xsdivsp
xsmulsp
xsresp
xsrsqrtesp
xssqrtsp
xssubsp
llvm-svn: 237937
The UseAVX predicate disables pattern selection on AVX-512.
This predicate is necessary for DAG selection to select the EVEX form.
But mapping SSE intrinsics to AVX-512 instructions is not ready yet,
so I replaced UseAVX with HasAVX in the intrinsics patterns.
llvm-svn: 237903
One of the testcases introduced by D9365 had incorrect !dereferenceable metadata on a load. It should fail but doesn't, due to the incorrect order of CHECK/CHECK-NOT directives in the test. Fixed both.
Reviewed By: sanjoy
Differential Revision: http://reviews.llvm.org/D9877
llvm-svn: 237897
This patch improves support for sign extension of the lower lanes of vectors of integers by making use of the SSE41 pmovsx* sign extension instructions where possible, and optimizing the sign extension by shifts on pre-SSE41 targets (avoiding the use of i64 arithmetic shifts which require scalarization).
It converts SIGN_EXTEND nodes to SIGN_EXTEND_VECTOR_INREG where necessary, that more closely matches the pmovsx* instruction than the default approach of using SIGN_EXTEND_INREG which splits the operation (into an ANY_EXTEND lowered to a shuffle followed by shifts) making instruction matching difficult during lowering. Necessary support for SIGN_EXTEND_VECTOR_INREG has been added to the DAGCombiner.
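A hypothetical sketch of a case this improves:
%lo = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%w  = sext <4 x i16> %lo to <4 x i32>   ; ideally a single pmovsxwd on SSE41 targets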
Differential Revision: http://reviews.llvm.org/D9848
llvm-svn: 237885
We had not been trying hard enough to resolve def names inside multiclasses
that had complex concatenations, etc. Now we'll try harder.
Patch by Amaury Sechet!
llvm-svn: 237877
In effect a partial revert of r237858, which was a dumb shortcut.
Looking at the dependencies of the destination should be the proper
fix: if the new memset would depend on anything other than itself,
the transformation isn't correct.
llvm-svn: 237874
Fixes PR23599, another miscompile introduced by r235232: when there is
another dependency on the destination of the created memset (i.e., the
part of the original destination that the memcpy doesn't depend on)
between the memcpy and the original memset, we would insert the created
memset after the memcpy, and thus after the other dependency.
Instead, insert the created memset right after the old one.
llvm-svn: 237858
Ideally this is going to be an LLVM IR pass (shared, among others,
with AArch64), but for the time being just enable it if consumers
ask us for optimization and not unconditionally.
Discussed with Tim Northover on IRC.
llvm-svn: 237837
Make sure if we're truncating a constant that would then be sign extended
that the sign extension of the truncated constant is the same as the
original constant.
> Canonicalize min/max expressions correctly.
>
> This patch introduces a canonical form for min/max idioms where one operand
> is extended or truncated. This often happens when the other operand is a
> constant. For example:
>
> %1 = icmp slt i32 %a, 0
> %2 = sext i32 %a to i64
> %3 = select i1 %1, i64 %2, i64 0
>
> Would now be canonicalized into:
>
> %1 = icmp slt i32 %a, 0
> %2 = select i1 %1, i32 %a, i32 0
> %3 = sext i32 %2 to i64
>
> This builds upon a patch posted by David Majenemer
> (https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
> passively stopped instcombine from ruining canonical patterns. This
> patch additionally actively makes instcombine canonicalize too.
>
> Canonicalization of expressions involving a change in type from int->fp
> or fp->int are not yet implemented.
llvm-svn: 237821
Summary:
During icmp lowering it can happen that a constant value is larger than expected (see the code around the change).
APInt::getMinSignedBits() must be checked again as the shift before can change the constant sign to positive.
I'm not sure it is the best fix possible though.
Test Plan: Regression test included.
Reviewers: resistor, chandlerc, spatel, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D9147
llvm-svn: 237812
Fixed extract/insert of i1 elements.
load i1 and zextload i1 should be emitted with "and $1, %reg" to prevent loading garbage.
Added a bunch of new tests.
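Sketch of the load case (hypothetical):
%b = load i1, i1* %p
%z = zext i1 %b to i32   ; masked with "and $1, %reg" so only bit 0 survives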
llvm-svn: 237793
Summary:
-check-prefix replaces the default CHECK prefix rather than adding to it and
must be explicitly re-added.
Also added the N32 cases.
Reviewers: petarj
Reviewed By: petarj
Subscribers: tberghammer, llvm-commits
Differential Revision: http://reviews.llvm.org/D9668
llvm-svn: 237790
Summary:
For N32/N64, private labels begin with '.L' but for O32 they begin with '$'.
MCAsmInfo now has an initializer function which can be used to provide information from the TargetMachine to control the assembly syntax.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: jfb, sandeep, llvm-commits, rafael
Differential Revision: http://reviews.llvm.org/D9821
llvm-svn: 237789
This change implements support for lowering of the gc.relocates tied to the invoke statepoint.
This is accomplished by storing the frame indices of the lowered values in the "StatepointRelocatedValues" map inside FunctionLoweringInfo, instead of storing them in the per-basic-block structure StatepointLowering.
After this change StatepointLowering is used only during "LowerStatepoint" call and it is not necessary to store it as a field in SelectionDAGBuilder anymore.
Differential Revision: http://reviews.llvm.org/D7798
llvm-svn: 237786
This change adds a new GC strategy for supporting the CoreCLR runtime.
This strategy is currently identical to Statepoint-example GC,
but is necessary for several upcoming changes specific to CoreCLR, such as:
1. Base-pointers not explicitly reported for interior pointers
2. Different format for stack-map encoding
3. Location of Safe-point polls: polls are only needed before loop-back edges and before tail-calls (not needed at function-entry)
4. Runtime specific handshake between calls to managed/unmanaged functions.
llvm-svn: 237753
We were special casing a handful of intrinsics as not needing a safepoint before them. After running into another valid case - memset - I took a closer look and realized that almost no intrinsics need to have a safepoint poll before them. Restructure the code to make that apparent so that we stop hitting these bugs. The only intrinsics which need a safepoint poll before them are ones which can run arbitrary code.
llvm-svn: 237744
The incremental buildbots entered a cycle of alternating passes and failures;
during the failing runs, one of the tests from this commit fails for an
unknown reason. I have reverted this commit and will investigate the cause.
llvm-svn: 237730
This change implements basic support for DWARF alternate sections
proposal: http://www.dwarfstd.org/ShowIssue.php?issue=120604.1&type=open
LLVM tools now understand new forms: DW_FORM_GNU_ref_alt and
DW_FORM_GNU_strp_alt, which are used as references to .debug_info and
.debug_str sections respectively, stored in a separate file, and
possibly shared between different executables / shared objects.
llvm-dwarfdump and llvm-symbolizer don't yet know how to access this
alternate debug file (usually pointed to by the .gnu_debugaltlink section),
but they can at least properly parse and dump regular files which
refer to it.
This change should fix crashes of llvm-dwarfdump and llvm-symbolizer on
files produced by running "dwz" tool. Such files are already installed
on some modern Linux distributions.
llvm-svn: 237721
Summary:
Introduce dereferenceable and dereferenceable_or_null metadata for loads,
with the same semantics as the corresponding attributes.
This patch depends on http://reviews.llvm.org/D9253
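A minimal sketch of the new form, assuming the metadata operand is the byte count, mirroring the attribute:

  define i32 @f(i32* %p, i32* %q) {
    %a = load i32, i32* %p, !dereferenceable !0         ; %p: at least 4 bytes
    %b = load i32, i32* %q, !dereferenceable_or_null !0 ; %q: null or 4 bytes
    %r = add i32 %a, %b
    ret i32 %r
  }
  !0 = !{i64 4}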
Patch by Artur Pilipenko!
Reviewers: hfinkel, sanjoy, reames
Reviewed By: sanjoy, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9365
llvm-svn: 237720
Summary:
Also tagged a FIXME comment, and added information about why it breaks.
Bug found using AFL fuzz.
Reviewers: rafael, craig.topper
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9729
llvm-svn: 237709
This commit is the initial commit for the MIR serialization project.
It creates a new library under CodeGen called 'MIR'. This new
library adds a new machine function pass that prints out the LLVM IR
using the MIR format. This pass is then added as a last pass when a
'stop-after' option is used in llc. The new library adds the initial
functionality for parsing of MIR files as well. This commit also
extends the llc tool so that it can recognize and parse MIR input files.
Reviewers: Duncan P. N. Exon Smith, Matthias Braun, Philip Reames
Differential Revision: http://reviews.llvm.org/D9616
llvm-svn: 237708
Summary:
The documentation writes vectors highest-index first whereas LLVM-IR writes
them lowest-index first. As a result, instructions defined in terms of
left_half() and right_half() had the halves reversed.
In addition to correcting them, they have been improved to allow shuffles
that use the same operand twice or in reverse order. For example, ilvev
used to accept masks of the form:
<0, n, 2, n+2, 4, n+4, ...>
but now accepts:
<0, 0, 2, 2, 4, 4, ...>
<n, n, n+2, n+2, n+4, n+4, ...>
<0, n, 2, n+2, 4, n+4, ...>
<n, 0, n+2, 2, n+4, 4, ...>
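For example, with 4-element vectors (n = 4), the third mask in the list above corresponds to this shufflevector (the types are illustrative):

  define <4 x i32> @ilvev_w(<4 x i32> %a, <4 x i32> %b) {
    ; <0, n, 2, n+2> with n = 4: interleave the even elements of %a and %b.
    %r = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
    ret <4 x i32> %r
  }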
One further improvement is that splati.[bhwd] is now the preferred instruction
for splat-like operations. The other special shuffles are no longer used
for splats. This led to the discovery that <0, 0, ...> would not cause
splati.[hwd] to be selected and this has also been fixed.
This fixes the enc-3des test from the test-suite on Mips64r6 with MSA.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9660
llvm-svn: 237689
This changes the ABI used on 32-bit x86 for passing vector arguments.
Historically, clang passes the first 4 vector arguments in-register, and additional vector arguments on the stack, regardless of platform. That is different from the behavior of gcc, icc, and msvc, all of which pass only the first 3 arguments in-register.
The 3-register convention is documented, unofficially, in Agner's calling convention guide, and, officially, in the recently released version 1.0 of the i386 psABI.
Darwin is kept as is because the OS X ABI Function Call Guide explicitly documents the current (4-register) behavior.
This fixes PR21510
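An illustrative sketch (the function is hypothetical) of where the register/stack boundary now falls on non-Darwin 32-bit x86:

  ; Under the 3-register convention, %a, %b and %c are passed in XMM0-XMM2
  ; and %d moves to the stack (it previously occupied XMM3).
  define <4 x float> @add4(<4 x float> %a, <4 x float> %b, <4 x float> %c, <4 x float> %d) {
    %t0 = fadd <4 x float> %a, %b
    %t1 = fadd <4 x float> %c, %d
    %r = fadd <4 x float> %t0, %t1
    ret <4 x float> %r
  }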
Differential revision: http://reviews.llvm.org/D9644
llvm-svn: 237682
This reverts commit r237210.
Also fix X86/complex-fca.ll to match the code that we used to generate
on win32 and now generate everywhere to conform to SysV.
llvm-svn: 237639
ld64 currently mishandles internal pointer relocations (i.e.
ARM64_RELOC_UNSIGNED referred to by section & offset rather than symbol). The
existing __cfstring clause was an early discovery and workaround for this, but
the problem is wider and we should avoid such relocations wherever possible for
now.
This code should be reverted to allowing internal relocations as soon as
possible.
PR23437.
llvm-svn: 237621
Summary:
Added isLoadableOrStorableType to PointerType.
We were doing some checks in some places, occasionally assert()ing instead
of telling the caller. With this patch, I'm putting all type checking in
the same place for load/store type instructions, and verifying the same
thing every time.
I also added a check for load/store of a function type.
Applied the extracted check to Load, Store, and Cmpxchg.
I don't have exhaustive tests for all of these, but all Error() calls in
TypeCheckLoadStoreInst are being tested (in invalid.test).
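A small illustration of the distinction being verified (hypothetical function): loading a value of function type is rejected, while loading a function pointer is fine:

  define void ()* @ok(void ()** %fpp) {
    ; Rejected by the unified check (shown commented out):
    ;   %bad = load void (), void ()* %fp
    %f = load void ()*, void ()** %fpp
    ret void ()* %f
  }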
Reviewers: dblaikie, rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9785
llvm-svn: 237619
Summary: Add an assertion in verifier.cpp to make sure gc_relocate relocates a gc pointer and that its return type has the same address space as the relocated pointer.
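A declaration-level sketch only, since the statepoint machinery that produces the token is elided here and intrinsic signatures in this area are era-specific assumptions:

  ; Both the relocated value and the result are i8 addrspace(1)*; the new
  ; assertion checks that these address spaces match.
  declare i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32, i32, i32)

  ; Hypothetical use, given a token %tok from a preceding statepoint:
  ;   %rel = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %tok, i32 7, i32 7)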
Reviewers: reames, AndyAyers, sanjoy, pgavlin
Reviewed By: pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9695
llvm-svn: 237605
Summary: When the PlaceSafepoints pass replaces the old return result with a gc_result from the statepoint, it asserts that the gc_result cannot have preceding phis in its parent block. This is only true for an invoke statepoint, which terminates the block and puts its result at the beginning of the normal successor block. A call statepoint does not terminate the block, so its result lives in the same block as the statepoint, and there should be no restriction on whether phis precede it.
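A structural sketch of the point (an ordinary call stands in for the statepoint sequence, whose exact signature is era-specific):

  define i32 @sketch(i1 %c) {
  entry:
    br i1 %c, label %left, label %merge
  left:
    br label %merge
  merge:
    ; Phis may legitimately precede a call statepoint and its gc_result,
    ; since the result stays in the same block as the call.
    %m = phi i32 [ 0, %entry ], [ 1, %left ]
    %r = call i32 @stand_in_for_statepoint(i32 %m)
    ret i32 %r
  }
  declare i32 @stand_in_for_statepoint(i32)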
Reviewers: reames, igor-laevsky
Reviewed By: igor-laevsky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9803
llvm-svn: 237597
Summary:
Allow hoisting of loads from values marked with the dereferenceable_or_null
attribute. For values marked with the attribute, perform context-sensitive
analysis to determine whether the value is known to be non-null at the point
of the load.
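A minimal sketch of the situation this enables (the function is hypothetical):

  ; %p is null or points to at least 4 dereferenceable bytes; after the
  ; explicit null check, %p is known non-null, so the load can be hoisted
  ; out of the loop.
  define i32 @sum(i32* dereferenceable_or_null(4) %p, i32 %n) {
  entry:
    %isnull = icmp eq i32* %p, null
    br i1 %isnull, label %exit, label %loop
  loop:
    %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
    %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
    %v = load i32, i32* %p
    %acc.next = add i32 %acc, %v
    %i.next = add i32 %i, 1
    %cont = icmp slt i32 %i.next, %n
    br i1 %cont, label %loop, label %exit
  exit:
    %res = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
    ret i32 %res
  }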
Patch by Artur Pilipenko!
Reviewers: hfinkel, sanjoy, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9253
llvm-svn: 237593
Previously, the jump tables were forced to immediately follow the actual branch
instruction. This was usually OK (the LEAs actually accessing them got emitted
nearby, and weren't usually separated much afterwards). Unfortunately, a
sufficiently nasty phi elimination dumps many instructions right before the
basic block terminator, and this can increase the range too much.
This patch frees them up to be placed as usual by the constant islands pass,
and consequently has to slightly modify the form of TBB/TBH tables to refer to
a PC-relative label at the final jump. The other jump table formats were
already position-independent.
rdar://20813304
llvm-svn: 237590
This pseudo-instruction expands into 'sethi' and 'or' instructions, or just
one of them if the other isn't necessary for a given value.
Differential Revision: http://reviews.llvm.org/D9089
llvm-svn: 237585
At the present time, we don't have a way to represent general dependency
relationships, so everything is represented using memory dependency. In order
to preserve the data dependency of a READ_REGISTER on WRITE_REGISTER, we need
to model WRITE_REGISTER as writing (which we had been doing) and model
READ_REGISTER as reading (which we had not been doing). Fix this, and also the
way that the chain operands were generated at the SDAG level.
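A minimal sketch (the register name and types are illustrative):

  declare void @llvm.write_register.i64(metadata, i64)
  declare i64 @llvm.read_register.i64(metadata)

  define i64 @roundtrip(i64 %v) {
    ; Modelling the write as writing and the read as reading prevents the
    ; read from being scheduled above the write.
    call void @llvm.write_register.i64(metadata !0, i64 %v)
    %r = call i64 @llvm.read_register.i64(metadata !0)
    ret i64 %r
  }
  !0 = !{!"sp"}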
Patch by Nicholas Paul Johnson, thanks! Test case by me.
llvm-svn: 237584
- Adds support for the asm syntax, which has an immediate integer
"ASI" (address space identifier) appearing after an address, before
a comma.
- Adds the various-width load, store, and swap in alternate address
space instructions. (ldsba, ldsha, lduba, lduha, lda, stba, stha,
sta, swapa)
This does not attempt to hook these instructions up to pointer address
spaces in LLVM, although that would probably be a reasonable thing to
do in the future.
Differential Revision: http://reviews.llvm.org/D8904
llvm-svn: 237581
(Note that register "Y" is essentially just ASR0).
Also added some test cases for divide and multiply, which had none before.
Differential Revision: http://reviews.llvm.org/D8670
llvm-svn: 237580
This patch implements LLVM support for the ACLE special register intrinsics in
section 10.1, __arm_{w,r}sr{,p,64}.
This patch is intended to lower the read/write_register intrinsics, used to
implement the special register intrinsics in the clang patch for special
register intrinsics (see http://reviews.llvm.org/D9697), to ARM-specific
instructions MRC, MCR, MSR etc., to allow reading and writing of coprocessor
registers in AArch32 and AArch64. This is done by inspecting the register
string passed to the intrinsic and then lowering to the appropriate
instruction.
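A sketch of the lowering input; the register string below follows the ACLE example format and is an assumption, and the backend would inspect it and select MRC for the read:

  declare i32 @llvm.read_register.i32(metadata)

  define i32 @read_coproc_reg() {
    ; "cp15:0:c13:c0:3" names a coprocessor register via
    ; coprocessor:opc1:CRn:CRm:opc2.
    %v = call i32 @llvm.read_register.i32(metadata !0)
    ret i32 %v
  }
  !0 = !{!"cp15:0:c13:c0:3"}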
Patch by Luke Cheeseman.
Differential Revision: http://reviews.llvm.org/D9699
llvm-svn: 237579