consistently accept a pointee-type argument.
Note: this also adds a new C API and soft-deprecates the old C API.
Differential Revision: https://reviews.llvm.org/D56559
llvm-svn: 351124
accept a return-type argument.
Note: this also adds a new C API and soft-deprecates the old C API.
Differential Revision: https://reviews.llvm.org/D56558
llvm-svn: 351123
accept a callee-type argument.
Note: this also adds a new C API and soft-deprecates the old C API.
Differential Revision: https://reviews.llvm.org/D56557
llvm-svn: 351122
accept a callee-type argument.
Note: this also adds a new C API and soft-deprecates the old C API.
Differential Revision: https://reviews.llvm.org/D56556
llvm-svn: 351121
Summary:
Use appendToUsed instead of include to ensure that
SanitizerCoverage's constructors are not stripped.
Also, use isOSBinFormatCOFF() to determine if target
binary format is COFF.
Reviewers: pcc
Reviewed By: pcc
Subscribers: hiraditya
Differential Revision: https://reviews.llvm.org/D56369
llvm-svn: 351118
If we have PSHUFB and we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements.
llvm-svn: 351103
Summary: Like branch instructions, phi nodes frequently do not have debug information related to the block they are in and so they should be ignored.
Reviewers: danielcdh, twoh, Kader, wmi
Reviewed By: wmi
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D55094
llvm-svn: 351102
This reverts commit a9788dd6587d67c856df74eedff5a6ad34ce8320, reversing
changes made to f1309ffebf718d16aec4fab83380556c660e2825.
unintended merge pushed
llvm-svn: 351095
add (extractelt (X, 0), extractelt (X, 1)) --> extractelt (hadd X, X), 0
This is the integer sibling to D56011.
There's an additional restriction to only to do this transform in the
case where we don't have extra extracts from the source vector. Without
that, we can fail to match larger horizontal patterns that are more
beneficial than this minimal case. An improvement to the more general
h-op lowering may allow us to remove the restriction here in a follow-up.
llvm-svn: 351093
This removes the old grow_memory and mem.grow-style intrinsics, leaving just
the memory.grow-style intrinsics.
Differential Revision: https://reviews.llvm.org/D56645
llvm-svn: 351084
Split MachinePipeliner code into header and cpp files to allow
inheritance from SwingSchedulerDAG.
This reapplies https://reviews.llvm.org/D56084 after moving the
implementation of the dump functions into the .cpp files. This fixes a
linker error when building with Clang modules enables and local
submodule visibility disabled.
Original patch by Lama Saba <lama.saba@intel.com>!
llvm-svn: 351077
Normally, changing the function signatures of C APIs is disallowed,
but as these two are brand new last week, and haven't been released
yet, it is okay in this instance.
As per discussion in D56556, we will not add NameLen arguments to IR
building APIs, for the following reasons:
1. We do not want to deprecate all of the IR building APIs, just to add a
NameLen argument to each one.
2. Consistency is important, so adding it just to new ones is unfortunate.
3. The IR names are completely optional, useful for readability of IR
only. There is no value in ever supporting nul bytes.
Differential Revision: https://reviews.llvm.org/D56669
llvm-svn: 351076
With this patch, shifts are lowered to optimal number of instructions
necessary to shift types larger than the general purpose register size.
This resolves PR/32293.
Thanks to Kyle Butt for reporting the issue!
Differential Revision: https://reviews.llvm.org/D56320
llvm-svn: 351059
Following PR39807, the way in which SimplifyCFG hoists common code on
branch paths was fixed in r347782. However this left extra code hanging
around HoistThenElseCodeToIf that wasn't necessary and needlessly
complicated matters -- we no longer need to look up through the 'if'
basic block to find a location for hoisted 'select' insts, we can instead
use the location chosen by applyMergedLocation.
This patch deletes that extra logic, and updates a regression test to
reflect the new logic (selects get the merged location, not a previous
insts location).
Differential Revision: https://reviews.llvm.org/D55272
llvm-svn: 351058
Make it possible for TableGen to produce code for selecting MOVi32imm.
This allows reasonably recent ARM targets to select a lot more constants
than before.
We achieve this by adding GISelPredicateCode to arm_i32imm. It's
impossible to use the exact same code for both DAGISel and GlobalISel,
since one uses "Subtarget->" and the other "STI." to refer to the
subtarget. Moreover, in GlobalISel we don't have ready access to the
MachineFunction, so we need to add a bit of code for obtaining it from
the instruction that we're selecting. This is also the reason why it
needs to remain a PatLeaf instead of the more specific IntImmLeaf.
llvm-svn: 351056
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.
This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.
This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).
There's an additional fix now to avoid a dmask=0
For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.
Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.
The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:
%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1
This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work-around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.
Differential revision: https://reviews.llvm.org/D48826
Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
Work around for ppcle compiler bug
Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
llvm-svn: 351054
Summary:
Previously only one RealFileSystem instance was available, and its working
directory is shared with the process. This doesn't work well for multithreaded
programs that want to work with relative paths - the vfs::FileSystem is assumed
to provide the working directory, but a thread cannot control this exclusively.
The new vfs::createPhysicalFileSystem() factory copies the process's working
directory initially, and then allows it to be independently modified.
This implementation records the working directory path, and glues it to relative
paths to provide the correct absolute path to the sys::fs:: functions.
This will give different results in unusual situations (e.g. the CWD is moved).
The main alternative is the use of openat(), fstatat(), etc to ask the OS to
resolve paths relative to a directory handle which can be kept open. This is
more robust. There are two reasons not to do this initially:
1. these functions are not available on all supported Unixes, and are somewhere
between difficult and unavailable on Windows. So we need a path-based
fallback anyway.
2. this would mean also adding support at the llvm::sys::fs level, which is a
larger project. My clearest idea is an OS-specific `BaseDirectory` object
that can be optionally passed to functions there. Eventually this could be
backed by either paths or a fd where openat() is supported.
This is a large project, and demonstrating here that a path-based fallback
works is a useful prerequisite.
There is some subtlety to the path-manipulation mechanism:
- when setting the working directory, both Specified=makeAbsolute(path) and
Resolved=realpath(path) are recorded. These may differ in the presence of
symlinks.
- getCurrentWorkingDirectory() and makeAbsolute() use Specified - this is
similar to the behavior of $PWD and sys::path::current_path
- IO operations like openFileForRead use Resolved. This is similar to the
behavior of an openat() based implementation, that doesn't see changes
in symlinks.
There may still be combinations of operations and FS states that yield unhelpful
behavior. This is hard to avoid with symlinks and FS abstractions :(
The caching behavior of the current working directory is removed in this patch.
getRealFileSystem() is now specified to link to the process CWD, so the caching
is incorrect.
The user who needed this so far is clangd, which will immediately switch to
createPhysicalFileSystem().
Reviewers: ilya-biryukov, bkramer, labath
Subscribers: ioeric, kadircet, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D56545
llvm-svn: 351050
Part of the effort to refactoring frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf" to represent three kinds of frame
pointer usage: (all) frames use frame pointer, (non-leaf) frames use
frame pointer, (none) frame use frame pointer. This CL makes the idea
explicit by using only one enum function attribute "frame-pointer"
Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.
"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".
tests are mostly updated with
// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"
// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"
Patch by Yuanfang Chen (tabloid.adroit)!
Differential Revision: https://reviews.llvm.org/D56351
llvm-svn: 351049
Introduce GlobalISel pre legalizer pass for MIPS.
It will be used to cope with instructions that require
combining before legalization.
Differential Revision: https://reviews.llvm.org/D56269
llvm-svn: 351046
Utility function `DeleteDeadBlock` expects that all predecessors of a block being
deleted are already deleted, with the exception of single-block loop. It makes it
hard to use for deletion of a set of blocks that may contain cyclic dependencies.
The is no correct order of invocations of this function that does not produce
dangling pointers on already deleted blocks.
This patch introduces a generalized version of this function `DeleteDeadBlocks`
that allows us to remove multiple blocks at once, even if there are cycles among
them. The only requirement is that no block being deleted should have a predecessor
that is not being deleted.
The logic of `DeleteDeadBlocks` is following:
for each block
create relevant DT updates;
remove all instructions (replace with undef if needed);
replace terminator with unreacheable;
apply DT updates;
for each block
delete block;
Therefore, `DeleteDeadBlock` becomes a particular case of
the general algorithm called for a single block.
Differential Revision: https://reviews.llvm.org/D56120
Reviewed By: skatkov
llvm-svn: 351045
Summary:
Add support for options that always prefix their value, giving an error
if the value is in the next argument or if the option is given a value
assignment (ie. opt=val). This is the desired behavior for the -D option
of FileCheck for instance.
Copyright:
- Linaro (changes in version 2 of revision D55940)
- GraphCore (changes in later versions and introduced when creating
D56549)
Reviewers: jdenny
Subscribers: llvm-commits, probinson, kristina, hiraditya,
JonChesterfield
Differential Revision: https://reviews.llvm.org/D56549
llvm-svn: 351038
I accidentally triggered this code while doing some experiments and it doesn't look lke it could possibly work.
It calls 'getNOT' on a node that should be a CondCode.
I think to do this right we would need to swap the branch target and the fallthrough target. But that's not easy to do. Or we could create an explicit SetCC and feed that into a new BR_CC?
llvm-svn: 351022
We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.
Fixes another case from PR34877
llvm-svn: 351018
We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.
Fixes another case from PR34877.
llvm-svn: 351017
This pattern:
t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>
...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.
In the affected tests here, it looks like it just makes RA wiggle. But we might
as well squash this to prevent it interfering with other pattern-matching.
Differential Revision:
https://reviews.llvm.org/D56604
llvm-svn: 351008
MIPS `jr` instruction uses a delay-slot. To escape execution of
arbitrary instruction we should either fill the delay-slot by `nop`
instruction or swap `jr` instruction and logically preceding
instruction. This fix implements the second method to generate a bit
more effective code.
llvm-svn: 351001
MIPS ABI states that every function must be called through jalr $t9. In
other words, a function expect that t9 register points to the beginning
of its code. A function uses this register to calculate offset to the
Global Offset Table and save it to the `gp` register.
```
lui $gp, %hi(_gp_disp)
addiu $gp, %lo(_gp_disp)
addu $gp, $gp, $t9
```
If `t9` and as a result `$gp` point to the wrong place the following code
loads incorrect value from GOT and passes control to invalid code.
```
lw $v0,%call16(foo)($gp)
jalr $t9
```
OrcMips32 and OrcMips64 writeResolverCode methods pass control to the
resolved address, but do not setup `$t9` before the call. The `t9` holds
value of the beginning of `resolver` code so any attempts to call
routines via GOT failed.
This change fixes the problem. The `OrcLazy/hidden-visibility.ll` test
starts to pass correctly. Before the change it fails on MIPS because the
`exitOnLazyCallThroughFailure` called from the resolver code could not
call libc routine `exit` via GOT.
Differential Revision: http://reviews.llvm.org/D56058
llvm-svn: 351000
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350998
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions too be recomputed very often. To avoid this, switch
to a SetVector.
Reapplying with a smaller number of inline elements in the
SmallSetVector, to avoid running into the SmallDenseMap issue
described in D56455.
Differential Revision: https://reviews.llvm.org/D56362
llvm-svn: 350997
The 128-bit input produces 64-bits of output and fills the upper 64-bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do.
This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.
Fixes another instruction for PR34877.
llvm-svn: 350994
As discussed on llvm-dev
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>, we have
to be careful when trying to select the *w RV64M instructions. i32 is not a
legal type for RV64 in the RISC-V backend, so operations have been promoted by
the time they reach instruction selection. Information about whether the
operation was originally a 32-bit operations has been lost, and it's easy to
write incorrect patterns.
Similarly to the variable 32-bit shifts, a DAG combine on ANY_EXTEND will
produce a SIGN_EXTEND if this is likely to result in sdiv/udiv/urem being
selected (and so save instructions to sext/zext the input operands).
Differential Revision: https://reviews.llvm.org/D53230
llvm-svn: 350993
This restores support for selecting the SLLW/SRLW/SRAW instructions, which was
removed in rL348067 as the previous patterns made some unsafe assumptions.
Also see the related llvm-dev discussion
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>
Ultimately I didn't introduce a custom SelectionDAG node, but instead added a
DAG combine that inserts an AssertZext i5 on the shift amount for an i32
variable-length shift and also added an ANY_EXTEND DAG-combine which will
instead produce a SIGN_EXTEND for an i32 variable-length shift, increasing the
opportunity to safely select SLLW/SRLW/SRAW.
There are obviously different ways of addressing this (a number discussed in
the llvm-dev thread), so I'd welcome further feedback and comments.
Note that there are now some cases in
test/CodeGen/RISCV/rv64i-exhaustive-w-insts.ll where sraw/srlw/sllw is
selected even though sra/srl/sll could be used without any extra instructions.
Given both are semantically equivalent, there doesn't seem a good reason to
prefer one vs the other. Given that would require more logic to still select
sra/srl/sll in those cases, I've left it preferring the *w variants.
Differential Revision: https://reviews.llvm.org/D56264
llvm-svn: 350992
We no longer need to extend mask scalars before bitcasting them to vXi1. This was only needed for the truncate intrinsics. And was really a bug in our lowering of them.
llvm-svn: 350991
We still use i8 for the load/store type. So we need to convert to/from i16 to around the mask type.
By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.
llvm-svn: 350989
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.
This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.
This fixes some of PR34877
llvm-svn: 350985
This fixes https://bugs.llvm.org/show_bug.cgi?id=40110.
This implements handling of undef operands for integer intrinsics in
ConstantFolding, in particular for the bitcounting intrinsics (ctpop,
cttz, ctlz), the with.overflow intrinsics, the saturating math
intrinsics and the funnel shift intrinsics.
The undef behavior follows what InstSimplify does for the general cas
e of non-constant operands. For the bitcount intrinsics (where
InstSimplify doesn't do undef handling -- there cannot be a combination
of an undef + non-constant operand) I'm using a 0 result if the intrinsic
is defined for zero and undef otherwise.
Differential Revision: https://reviews.llvm.org/D55950
llvm-svn: 350971
Teach x86 assembly operand parsing to distinguish between assembler
variable assigned to named registers and those assigned to immediate
values.
Reviewers: rnk, nickdesaulniers, void
Subscribers: hiraditya, jyknight, llvm-commits
Differential Revision: https://reviews.llvm.org/D56287
llvm-svn: 350966
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.
Fix PR40273.
Reviewers: efriedma, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56566
llvm-svn: 350951
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).
The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.
This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.
Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.
Reviewers: pcc
Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D53890
llvm-svn: 350948
MergeFunc only deletes unused duplicate functions if they have local
linkage, but it should be safe to relax this to any "discardable if
unused" linkage type.
Differential Revision: https://reviews.llvm.org/D56574
llvm-svn: 350939
Currently when a select has a constant value in one branch and the select feeds
a conditional branch (via a compare/ phi and compare) we unfold the select
statement. This results in threading the conditional branch later on. Similar
opportunity exists when a select (with a constant in one branch) feeds a
switch (via a phi node). The patch unfolds select under this condition.
A testcase is provided.
llvm-svn: 350931
Previously, we limited this transform to cases where the
extraction into the build vector happens from vectors of
the same type as the build vector, but that's not required.
There's a slight potential regression seen in the AVX512
result for phadd -- we're using the 256-bit flavor of the
instruction now even though the 128-bit subset is sufficient.
The same problem could already be seen in the AVX2 result.
Follow-up patches will attempt to narrow that back down.
llvm-svn: 350928
We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions.
Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.
This fixes some of the regressions from D350800.
llvm-svn: 350918
Summary:
We now use __stack_pointer global and global.get/global.set instruction.
This fixes the checking routine for stack_pointer writes accordingly.
This also fixes the existing __stack_pointer test in reg-stackify.ll:
That test used to pass not because of __stack_pointer clashes but
because the function `stackpointer_callee` was not marked as `readnone`,
so it was assumed to possibly write to memory arbitraily, and
`global.set` instruction was marked as `mayStore` in the .td definition,
so they were identified as intervening writes. After we added `readnone`
to its attribute, this test fails without this patch.
Reviewers: dschuff, sunfish
Subscribers: jgravelle-google, sbc100, llvm-commits
Differential Revision: https://reviews.llvm.org/D56094
llvm-svn: 350906
* Teach AsmParser to recognize @rn in distination operand as 0(rn).
* Do not allow Disassembler decoding instructions that have size more
than a number of input bytes.
* Fix UB in MSP430MCCodeEmitter.
Patch by Kristina Bessonova!
Differential Revision: https://reviews.llvm.org/D56547
llvm-svn: 350903
Summary:
This is a third attempt, but this time we have vetted it on Windows
first. The previous errors were due to an uninitialized class member.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D56560
llvm-svn: 350901
Sanity will fail for this, since we're exploring getting a clobber
further than the sanity check expects.
Ideally we need to teach the sanity check to differentiate between the
two walkers based on the SkipSelf bool in the query.
llvm-svn: 350895
Summary:
The original patch addressed the use of BlockRPONumber by forcing a sequence point when accessing that map in a conditional. In short we found cases where that map was being accessed with blocks that had not yet been added to that structure. For context, I've kept the wall of text below, to what we are trying to fix, by always ensuring a updated BlockRPONumber.
== Backstory ==
I was investigating an ICE (segfault accessing a DenseMap item). This failure happened non-deterministically, with no apparent reason and only on a Windows build of LLVM (from October 2018).
After looking into the crashes (multiple core files) and running DynamoRio, the cores and DynamoRio (DR) log pointed to the same code in `GVN::performScalarPRE()`. The values in the map are unsigned integers, the keys are `llvm::BasicBlock*`. Our test case that triggered this warning and periodic crash is rather involved. But the problematic line looks to be:
GVN.cpp: Line 2197
```
if (BlockRPONumber[P] >= BlockRPONumber[CurrentBlock] &&
```
To test things out, I cooked up a patch that accessed the items in the map outside of the condition, by forcing a sequence point between accesses. DynamoRio stopped warning of the issue, and the test didn't seem to crash after 1000+ runs.
My investigation was on an older version of LLVM, (source from October this year). What it looks like was occurring is the following, and the assembly from the latest pull of llvm in December seems to confirm this might still be an issue; however, I have not witnessed the crash on more recent builds. Of course the asm in question is generated from the host compiler on that Windows box (not clang), but it hints that we might want to consider how we access the BlockRPONumber map in this conditional (line 2197, listed above). In any case, I don't think the host compiler is wrong, rather I think it is pointing out a possibly latent bug in llvm.
1) There is no sequence point for the `>=` operation.
2) A call to a `DenseMapBase::operator[]` can have the side effect of the map reallocating a larger store (more Buckets, via a call to `DenseMap::grow`).
3) It seems perfectly legal for a host compiler to generate assembly that stores the result of a call to `operator[]` on the stack (that's what my host compile of GVN.cpp is doing) . A second call to `operator[]` //might// encourage the map to 'grow' thus making any pointers to the map's store invalid. The `>=` compares the first and second values. If the first happens to be a pointer produced from operator[], it could be invalid when dereferenced at the time of comparison.
The assembly generated from the Window's host compiler does show the result of the first access to the map via `operator[]` produces a pointer to an unsigned int. And that pointer is being stored on the stack. If a second call to the map (which does occur) causes the map to grow, that address (on the stack) is now invalid.
Reviewers: t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D55974
llvm-svn: 350880
Summary:
Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.
Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which
relied on promotion, in order to test all sinking tests.
Reviewers: sanjoy, davide, gberry, george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D40375
llvm-svn: 350879
This extends to combineVSelectToShrunkBlend to be able to resimplify SHRUNKBLENDS that have already been created.
This should help some of the regressions from D56387
Differential Revision: https://reviews.llvm.org/D56421
llvm-svn: 350875
Despite what the comment says, FCMP_UNE would be an OR not an AND. In the lowering code the first branch created still goes to the original destination. The second branch was exchanged to go to where the subsequent unconditional branch went. This is different than what we do for FCMP_OEQ where both branches that we create go to the original unconditional branch.
As far as I can tell, I think this means we don't need to exchange the branch target with the unconditional branch for FCMP_UNE at all.
Differential Revision: https://reviews.llvm.org/D56309
llvm-svn: 350873
This commit fixes the dwordx3/southern-islands failures that were found
in bugzilla https://bugs.llvm.org/show_bug.cgi?id=40129, by not
generating the dwordx3 variants of load/store instructions that were
added to the ISA after southern islands.
Differential Revision: https://reviews.llvm.org/D56434
llvm-svn: 350838
That is, remove many of the calls to Type::getNumContainedTypes(),
Type::subtypes(), and Type::getContainedType(N).
I'm not intending to remove these accessors -- they are
useful/necessary in some cases. However, removing the pointee type
from pointers would potentially break some uses, and reducing the
number of calls makes it easier to audit.
llvm-svn: 350835
This further improves compatibility with GNU as, allowing input such as the
following to be assembled:
.equ CONST, 0x123456
li a0, CONST
addi a0, a0, %lo(CONST)
.equ CONST, 1
slli a0, a0, CONST
Note that we don't have perfect compatibility with gas, as it will avoid
emitting a relocation in this case:
addi a0, a0, %lo(CONST2)
.equ CONST2, 0x123456
Thanks to Shiva Chen for suggesting a better way to approach this during review.
Differential Revision: https://reviews.llvm.org/D52298
llvm-svn: 350831
When we use the partial-matching function on a 128-bit chunk, we must
account for the possibility that we've matched undef halves of the
original source vectors, so the outputs may need to be reset.
This should allow closing PR40243:
https://bugs.llvm.org/show_bug.cgi?id=40243
llvm-svn: 350830
This is a partial fix for:
https://bugs.llvm.org/show_bug.cgi?id=40243
...as seen in the integer test, we still need to correct the result when using the
existing (old) horizontal op matching function because it does not model the way
x86 256-bit horizontal ops return results (each 128-bit half is its own horizontal-op).
A potential follow-up change for that is discussed in the bug report - see also D56490.
This generally duplicates a lot of the existing matching code, but we can't just remove
that without introducing regressions, so the existing code is renamed and used less often.
Follow-ups may try to reduce that overlap.
Differential Revision: https://reviews.llvm.org/D56450
llvm-svn: 350826
Summary:
This patch changes the legalization action for some half-precision floating-
point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops
are not supported in hardware for half-precision vectors, but promotion is
not always possible (for v8f16 operands). Changing the action to Expand fixes
an assertion failure in the legalizer when the frontend produces such ops.
In addition, a quick microbenchmark shows that, in the v4f16 case,
expanding introduces fewer spills and is therefore slightly faster than
promoting.
Reviewers: t.p.northover, SjoerdMeijer
Reviewed By: SjoerdMeijer
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D56296
llvm-svn: 350825
Field ResourceUnitMask was incorrectly defined as a 'const unsigned' mask. It
should have been a 64 bit quantity instead. That means, ResourceUnitMask was
always implicitly truncated to a 32 bit quantity.
This issue has been found by inspection. Surprisingly, that bug was latent, and
it never negatively affected any existing upstream targets.
This patch fixes the wrong definition of ResourceUnitMask, and adds a bunch of
extra debug prints to help debugging potential issues related to invalid
processor resource masks.
llvm-svn: 350820
Allow to specify loop-unrolling with optional parameters explicitly
spelled out in -passes pipeline specification.
Introducing somewhat generic way of specifying parameters parsing via
FUNCTION_PASS_PARAMETRIZED pass registration.
Syntax of parametrized unroll pass name is as follows:
'unroll<' parameter-list '>'
Where parameter-list is ';'-separate list of parameter names and optlevel
optlevel: 'O[0-3]'
parameter: { 'partial' | 'peeling' | 'runtime' | 'upperbound' }
negated: 'no-' parameter
Example:
-passes=loop(unroll<O3;runtime;no-upperbound>)
this invokes LoopUnrollPass configured with OptLevel=3,
Runtime, no UpperBound, everything else by default.
llvm-svn: 350808
Add t2TEQrr to the map of instructions with can be reduced down into
a T1 instruction. This is a special case because TEQ just sets the
CPSR and doesn't write to a GPR, which is not the case for EOR. So,
we need to ensure that the EOR can write to the first operand.
Differential Revision: https://reviews.llvm.org/D56255
llvm-svn: 350801
Summary:
This pass replaces GR8/GR16/GR32/GR64 with their equivalent sized mask register classes. But VK32/VK64 aren't legal without AVX512BW. Apparently this mostly appears to work if the register coalescer is able to remove the VK32/VK64 register class reference. Or if we don't ever spill it. But there's no guarantee of that.
Another Intel employee managed to trigger a crash due to this with ISPC. Unfortunately, I've lost the test case he sent me at the time. I'm trying to get him to reproduce it for me. I'd like to get this in before 8.0 branches since its a little scary.
The regressions here are unfortunate, but I think we can make some improvements to DAG combine, load folding, etc. to fix them. Just not sure if we can get that done for 8.0.
Fixes PR39741
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56460
llvm-svn: 350800
Summary:
D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc.
This patch adds the necessary enums and MCExpr needed for lowering these.
Reviewers: rnk, mstorsjo, efriedma
Reviewed By: efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D56037
llvm-svn: 350798
This is a second attempt at r350778, which was reverted in
r350789. The only change is that the unimplemented-simd128 feature has
been renamed simd128-unimplemented, since naming it
unimplemented-simd128 somehow made the simd128 feature flag enable the
unimplemented-simd128 feature on Windows.
llvm-svn: 350791
Found while trying to figure out why my second version of D56421 worked better than the first version. We weren't deleting the vselect in a timely fashion and that caused SimplfyDemandedBit to see an additional user.
The new version doesn't have this problem so this fix isn't needed there, but seemed like the right thing to do.
llvm-svn: 350781
Summary:
This replaces the old ad-hoc -wasm-enable-unimplemented-simd
flag. Also makes the new unimplemented-simd128 feature imply the
simd128 feature.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton
Differential Revision: https://reviews.llvm.org/D56501
llvm-svn: 350778
The C standard says "The memchr function locates the first
occurrence of c (converted to an unsigned char)[...]". The expansion
was missing the conversion to unsigned char.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39041 .
Differential Revision: https://reviews.llvm.org/D55947
llvm-svn: 350775
When a null terminator is required and the file size is a multiple of the system page size, MemoryBuffer will prefer pread() over mmap(), which can result in excessive memory usage.
Patch by Mike Hommey!
Differential Revision: https://reviews.llvm.org/D56475
llvm-svn: 350774
Summary:
Looks like many passes print its pass description as a debug message at
the start of each pass, so added that to (mostly newly added) other
passes as well.
Reviewers: dschuff
Subscribers: jgravelle-google, sbc100, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56142
llvm-svn: 350771
Summary:
Instead of using two separate callbacks to return the entry count and the
relative block frequency, use a single callback to return callsite
count. This would allow better supporting hybrid mode in the future as
the count of callsite need not always be derived from entry count (as in
sample PGO).
Reviewers: davidxl
Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D56464
llvm-svn: 350755
If the caller's return type does not have a zeroext attribute but the
callee does a tail call zeroext, we won't consider the tail call during
CodeGenPrepare because the attributes don't match.
However, if the result of the tail call has no uses, it makes sense to
drop the sext/zext attributes.
Differential Revision: https://reviews.llvm.org/D56486
llvm-svn: 350753
Summary: All a non-default title for the debugging this debugging aide
Reviewers: twoh, Kader, modocache
Reviewed By: twoh
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56499
llvm-svn: 350749
Summary:
For some reason the backend assumed that the condition mask would be
the first argument to the LLVM intrinsic, but everywhere else the
condition mask is the third argument.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56412
llvm-svn: 350746
This is an initial implementation for Speculative Load Hardening for
AArch64. It builds on top of the recently introduced
AArch64SpeculationHardening pass.
This doesn't implement (yet) some of the optimizations implemented for
the X86SpeculativeLoadHardening pass. I thought introducing the
optimizations incrementally in follow-up patches should make this easier
to review.
Differential Revision: https://reviews.llvm.org/D55929
llvm-svn: 350729
Fixed issue with identity values and other cases, f32/f16 identity values to be added later. fma/mac instructions is disabled for now.
Test is fully reworked, added comments. Other fixes:
1. dpp move with uses and old reg initializer should be in the same BB.
2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Othervise the old register value is checked for identity.
3. Added add, subrev, and, or instructions to the old folding function.
4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.
Differential revision: https://reviews.llvm.org/D55444
llvm-svn: 350721
Follow up patch of rL350385, for adding predres
command line option. This patch renames the
feature as to keep it aligned with the option
passed by/to clang
Differential Revision: https://reviews.llvm.org/D56484
llvm-svn: 350702
Summary:
This fixes PR39710. In that case we emitted a location list looking like
this:
.Ldebug_loc0:
.quad .Lfunc_begin0-.Lfunc_begin0
.quad .Lfunc_begin0-.Lfunc_begin0
.short 1 # Loc expr size
.byte 85 # DW_OP_reg5
.quad .Lfunc_begin0-.Lfunc_begin0
.quad .Lfunc_end0-.Lfunc_begin0
.short 1 # Loc expr size
.byte 85 # super-register DW_OP_reg5
.quad 0
.quad 0
As seen, the first entry's beginning and ending addresses evalute to 0,
which meant that the entry inadvertently became an "end of list" entry,
resulting in the location list ending sooner than expected.
To fix this, omit all entries with empty ranges. Location list entries
with empty ranges do not have any effect, as specified by DWARF, so we
might as well drop them:
"A location list entry (but not a base address selection or end of list
entry) whose beginning and ending addresses are equal has no effect
because the size of the range covered by such an entry is zero."
Reviewers: davide, aprantl, dblaikie
Reviewed By: aprantl
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D55919
llvm-svn: 350698
Current strategy of dropping `InstructionPrecedenceTracking` cache is to
invalidate the entire basic block whenever we change its contents. In fact,
`InstructionPrecedenceTracking` has 2 internal strictures: `OrderedInstructions`
that is needed to be invalidated whenever the contents changes, and the map
with first special instructions in block. This second map does not need an
update if we add/remove a non-special instuction because it cannot
affect the contents of this map.
This patch changes API of `InstructionPrecedenceTracking` so that it now
accounts for reasons under which we invalidate blocks. This should lead
to much less recalculations of the map and should save us some compile time
because in practice we don't typically add/remove special instructions.
Differential Revision: https://reviews.llvm.org/D54462
Reviewed By: efriedma
llvm-svn: 350694
When the result type is v2i64/v2f64 and the index element size is i32, the index vector has two unused elements making the type v4i32. The mask VT should match the number of memory accesses that will be made.
This is consistent with the isel patterns used for the target independent gather/scatter intrinsic.
llvm-svn: 350687
Bad machine code: Illegal virtual register for instruction
function: TestULE
basic block: %bb.0 entry (0x1000a39b158)
instruction: %2:crrc = FCMPUD %1:vsfrc, %3:f8rc
operand 1: %1:vsfrc
Fix assert about missing match between fcmp instruction and register class.
We should use vsx related cmp instruction xvcmpudp instead of fcmpu when vsx is opened.
add -verifymachineinstrs option into related test cases to enable the verify pass.
Differential Revision: https://reviews.llvm.org/D55686
llvm-svn: 350685
This removes check for single use from general ShrinkDemandedConstant
to the BE because of the AArch64 regression after D56289/rL350475.
After several hours of experiments I did not come up with a testcase
failing on any other targets if check is not performed.
Moreover, direct call to ShrinkDemandedConstant is not really needed
and superceed by SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D56406
llvm-svn: 350684
Currently it's possible for following
check on V.WriteLanes (which is not really meaningful
during SubRangeJoin) to pass for one half of the pair,
and then fall through to to one of the impossible
or unresolved states. This then fails as inconsistent
on the other half.
During the main range join, the check between V.WriteLanes
and OtherV.ValidLanes must have passed, meaning this
should be a CR_Replace.
Fixes most of the testcases in bugs 39542 and 39602
llvm-svn: 350678
We can't go back and recover the lanes if it turns
out the implicit_def really can't be erased.
Assume all lanes are valid if an unresolved conflict
is encountered. There aren't any tests where this
seems to matter either way, but this seems like a
safer option.
Fixes bug 39602
llvm-svn: 350676
This is matching the equivalent of the DAG expansion,
so it should never end up with worse perf than the
original code even if the target doesn't have a rotate
instruction.
llvm-svn: 350672
In LTO or Thin-lto mode (though linker plugin), the module
names are of temp file names which are different for
different compilations. Using SourceFileName avoids the issue.
This should not change any functionality for current PGO as
all the current callers of getPGOFuncName() is before LTO.
Differential Revision: https://reviews.llvm.org/D56327
llvm-svn: 350671
Summary:
StoreResults pass does not optimize store instructions anymore because
store instructions don't return results values anymore. Now this pass is
used solely for memory intrinsics, so update the pass name accordingly
and fix outdated pass descriptions as well.
This patch does not change any meaningful behavior, but not marked as
NFC because it changes a comment check line in a test case.
Reviewers: dschuff
Subscribers: mgorny, sbc100, jgravelle-google, sunfiish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56093
llvm-svn: 350669
Starting in C++17, MSVC introduced a new mangling for function
parameters that are themselves noexcept functions. This patch
makes llvm-undname properly demangle them.
Patch by Zachary Henkel
Differential Revision: https://reviews.llvm.org/D55769
llvm-svn: 350656
A straightforward port of tsan to the new PM, following the same path
as D55647.
Differential Revision: https://reviews.llvm.org/D56433
llvm-svn: 350647
Summary:
This fixes the IDom for exit blocks and all blocks reachable from the exit blocks, when runtime unrolling under multiexit/exiting case.
We initially had a restrictive check that the IDom is only updated when
it is the header of the loop.
However, we also need to update the IDom to the correct one when the
IDom is any block within the original loop. See added test cases (which
fail dom tree verification without the patch).
Reviewers: reames, mzolotukhin, mkazantsev, hfinkel
Reviewed by: brzycki, kuhar
Subscribers: zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D56284
llvm-svn: 350640
Commit f1db33c5c1a9 ("[BPF] Disable relocation for .BTF.ext section")
assigned relocation type R_BPF_NONE if the fixup type
is FK_Data_4 and the symbol is temporary.
The reason is we use FK_Data_4 as a fixup type
for insn offsets in .BTF.ext section.
Just checking whether the symbol is temporary is not enough.
For example, .debug_info may reference some strings whose
fixup is FK_Data_4 with a temporary symbol as well.
To truely reflect the case for .BTF.ext section,
this patch further checks that the section associateed with the symbol
must be SHF_ALLOC and SHF_EXECINSTR, i.e., in the text section.
This fixed the above-mentioned problem.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 350637
This patch adds a convenience report() method for physical registers and
uses it to print the offending register with the 'MBB has allocatable
live-in' error.
Reviewers: MatzeB, rtereshin, dsanders
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D55946
llvm-svn: 350630
Commit rL347861 introduced an unintentional change in the behaviour when
compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly
disabling GlobalISel resulted in using FastISel but an updated condition
in the commit changed it to using SelectionDAG. The patch fixes this
condition and slightly better organizes the code that chooses the
instruction selector.
Fixes PR40131.
Differential Revision: https://reviews.llvm.org/D56266
llvm-svn: 350626
The new-pm version of DA is untested. Testing requires a printer, so
add that and use it in the existing DA tests.
Differential Revision: https://reviews.llvm.org/D56386
llvm-svn: 350624
For stack frames on the size of a register in x86, a code size optimization
emits "push rax/eax" instead of "sub" for stack allocation. For example:
foo:
.cfi_startproc
BB#0:
pushq %rax
Ltmp0:
.cfi_def_cfa_offset 16
...
.cfi_endproc
However, we are falling back to DWARF in this case because we cannot
encode %rax as a saved register.
This requirement is wrong, since we don't care about the contents of
%rax, it is the equivalent of a sub.
In order to specify that we care about the contents of %rax, we would
need a .cfi_offset %rax, <offset>.
It's also overzealous in the case where there are pushes for callee saved
registers followed by a "push rax/eax" instead of "sub", in which case we should
also be able to encode the callee saved regs and everything else using compact
unwind.
Patch authored by Bruno Cardoso Lopes.
Differential Revision: https://reviews.llvm.org/D13793
llvm-svn: 350623
This reverts commit rL350497
reported remaining issues seem to be unrelated to modules or this change.
more info: https://reviews.llvm.org/D56084
llvm-svn: 350621
We have code to split vector splats (of zero and non-zero) for performance
reasons, but it ignores the fact that a store might be truncating.
Actually, truncating stores are formed for vNi8 and vNi16 types. Since the
truncation is from a legal type, the size of the store is always <= 64-bits and
so they don't actually benefit from being split up anyway, so this patch just
disables that transformation.
llvm-svn: 350620
Using a PatLeaf for sext_16_node allowed matching smulbb and smlabb
instructions once the operands had been sign extended. But we also
need to use sext_inreg operands along with sext_16_node to catch a
few more cases that enable use to remove the unnecessary sxth.
Differential Revision: https://reviews.llvm.org/D55992
llvm-svn: 350613
I'm not entirely sure this is the correct thing
to do with the global isel philosophy, but I think
this is necessary to handle how differently SGPRs
are used normally vs. from a condition.
For example, it makes sense to allow a copy
from a VGPR to an SGPR, but it makes no sense
to allow a copy from VGPRs to SGPRs used as
select mask.
This avoids regbankselecting strange code with
a truncate feeding directly into a condition field.
Now a copy is forced from sgpr(s1) to vcc, which is
more sensible to handle.
Some of these issues could probably avoided with making enough
operations resulting in i1 illegal. I think we can't avoid
this register bank for legality.
For example, an i1 and where one source is from a truncate, and
one source is a compare needs some kind of copy inserted to
make sure both are in condition registers.
llvm-svn: 350611
Summary: Add a utility function for creating a basic block without a parent function. A useful operation for compilers that need to synthesize and conditionally insert code without having to bother with appending and immediately unlinking a block.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56279
llvm-svn: 350608
Summary: Fix an old outstanding problem with the int cast builder binding always assuming the cast is signed by introducing a new LLVMBuildIntCast2 operation and deprecating the old prototype.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56280
llvm-svn: 350607
Summary:
FixIrreducibleControlFlow and LateEHPrepare both possibly modify CFG and
create new registers. There seems to be no reason these passes go after
register-related optimization passes (PrepareForLiveIntervals,
OptimizeLiveIntervals, StoreResults, RegStackify, and RegColoring), and
this also possibly create new optimization opportunities. I think we
should put all current and future optimization passes before RegStackify
(and related passes) unless there's a reason not to.
Reviewers: kripken
Subscribers: dschuff, sbc100, sunfish, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D56356
llvm-svn: 350596
If a copy was needed to handle the condition of brcond, it was being
inserted before the defining instruction. Add tests for iterator edge
cases.
I find the existing code here suspect for the case where it's looking
for terminators that modify the register. It's going to insert a copy
in the middle of the terminators, which isn't allowed (it might be
necessary to have a COPY_terminator if anybody actually needs this).
Also legalize brcond for AMDGPU.
llvm-svn: 350595
Summary:
We don't need to explicitly use `NI` anymore because we now don't use
`let` statements within the definitions.
Reviewers: aardappel
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56376
llvm-svn: 350594
merging a src register in ToBeUpdated set.
This is to fix PR40061 related with https://reviews.llvm.org/rL339035.
In https://reviews.llvm.org/rL339035, live interval of source pseudo register
in rematerialized copy may be saved in ToBeUpdated set and its update may be
postponed.
In PR40061, %t2 = %t1 is rematerialized and %t1 is added into toBeUpdated set
to postpone its live interval update. After the rematerialization, the live
interval of %t1 is larger than necessary. Then %t1 is merged into %t3 and %t1
gets removed. After the merge, %t3 contains live interval larger than necessary.
Because %t3 is not in toBeUpdated set, its live interval is not updated after
register coalescing and it will break some assumption in regalloc.
The patch requires the live interval of destination register in a merge to be
updated if the source register is in ToBeUpdated.
Differential revision: https://reviews.llvm.org/D55867
llvm-svn: 350586
MSVC has a nesting limit of around 110-130. An if/else if/else if counts against this next level. The autoupgrade code consists a long chain of these checking matches against strings.
This commit moves some code to a helper function to move out a large if/else chain that was inside of one of the blocks into a separate function. There are more of these we could move or we could change some to lookup tables.
I've also merged together a few similar blocks in the outer chain. This should buy us some margin for a little bit.
llvm-svn: 350564
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.
Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.
This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.
Differential Revision: https://reviews.llvm.org/D56087
llvm-svn: 350560
This patch adds the sign/zero extension done by
vgetlane to ARM computeKnownBitsForTargetNode.
Differential revision: https://reviews.llvm.org/D56098
llvm-svn: 350553
Summary:
The option enables loop transformations to hoist accesses that do not
have clobbers in the loop. If the clobber queries skips the starting
access, the result may be outside the loop instead of the header Phi.
Adding the walker that uses this option in a separate patch.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D55944
llvm-svn: 350551
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions too be recomputed very often. To avoid this, switch
to a SetVector.
Differential Revision: https://reviews.llvm.org/D56362
llvm-svn: 350547
Summary:
If a divergent branch instruction is marked as divergent by propagation
rule 2 in DivergencePropagator::exploreSyncDependency() and its condition
is uniform, that branch would incorrectly be assumed to be uniform.
Reviewers: arsenm, tstellar
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D56331
llvm-svn: 350532
`CallSite`.
With this change, the remaining `CallSite` usages are just for
implementing the wrapper type itself.
This does update the C API but leaves the names of that API alone and
only updates their implementation.
Differential Revision: https://reviews.llvm.org/D56184
llvm-svn: 350509
update client code.
Also rename it to use the more generic term `call` instead of something
that could be confused with a praticular type.
Differential Revision: https://reviews.llvm.org/D56183
llvm-svn: 350508
`CallSite` wrapper.
Mostly mechanical, but I've tried to tidy up code where it made sense to
do so.
Differential Revision: https://reviews.llvm.org/D56143
llvm-svn: 350507
minted `CallBase` class instead of the `CallSite` wrapper.
This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.
Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.
Thanks for the review from Saleem!
Differential Revision: https://reviews.llvm.org/D55641
llvm-svn: 350503
Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56361
llvm-svn: 350498
There are no test changes here in the existing cost model
regression tests because integer add/sub have a default
legal cost of 1 already. This would break, however, if
we custom lower those ops because the default cost model
assumes that custom-lowered ops are more expensive.
This is similar to the change in rL350403. See discussion
in D56011 for more details. When we enhance that patch to
handle integer ops, we need this cost model change to avoid
unintended diffs here from the custom lowering.
llvm-svn: 350496
The FSHL/FSHR nodes are handled in the expand function, but they need to also be listed in the code that queries for the operation action too.
llvm-svn: 350490
This was here because out and in instructions allow '(%dx)' even though its not a memory reference. To handle this we build a special operand for the DX register reference before we get to the call to CheckBaseRegAndIndexRegAndScale. So we no longer need this special case.
llvm-svn: 350483
This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both.
llvm-svn: 350480
Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink
source node to demanded bits only even if there are other uses.
Differential Revision: https://reviews.llvm.org/D56289
llvm-svn: 350475
The cttz/ctlz intrinsics have a parameter specifying whether the
result is undefined for zero. cttz(x, false) can be relaxed to
cttz(x, true) if x is known non-zero, and in fact such an optimization
is already performed. However, this currently doesn't work if x is
non-zero as a result of a select rather than an explicit branch.
This patch adds handling for this case, thus allowing
x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y.
Differential Revision: https://reviews.llvm.org/D55786
llvm-svn: 350463
This has some minor optimizations to shouldBeDeferred. This is not
strictly NFC because the early exit inside the loop assumes
TotalSecondaryCost is monotonically non-decreasing, which is not true if
the threshold used by CostAnalyzer is negative. AFAICT the thresholds do
not go below 0 for the default values of the various options we use.
llvm-svn: 350456
We don't need to require the first operand to be an integer because we already said it was the same type as the result which we also constrained to an integer.
llvm-svn: 350455
In LTO or Thin-lto mode (though linker plugin), the module
names are of temp file names which are different for
different compilations. Using SourceFileName avoids the issue.
This should not change any functionality for current PGO as
all the current callers of getPGOFuncName() is before LTO.
llvm-svn: 350442
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.
I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.
Differential Revision: https://reviews.llvm.org/D56247
llvm-svn: 350435
This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits.
My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common.
Differential Revision: https://reviews.llvm.org/D56283
llvm-svn: 350432
The problem is similar to D55986 but for threads: a process with the
interceptor hwasan library loaded might have some threads started by
instrumented libraries and some by uninstrumented libraries, and we
need to be able to run instrumented code on the latter.
The solution is to perform per-thread initialization lazily. If a
function needs to access shadow memory or add itself to the per-thread
ring buffer its prologue checks to see whether the value in the
sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter
and reloads from the TLS slot. The runtime does the same thing if it
needs to access this data structure.
This change means that the code generator needs to know whether we
are targeting the interceptor runtime, since we don't want to pay
the cost of lazy initialization when targeting a platform with native
hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform}
has been introduced for selecting the runtime ABI to target. The
default ABI is set to interceptor since it's assumed that it will
be more common that users will be compiling application code than
platform code.
Because we can no longer assume that the TLS slot is initialized,
the pthread_create interceptor is no longer necessary, so it has
been removed.
Ideally, lazy initialization should only cost one instruction in the
hot path, but at present the call may cause us to spill arguments
to the stack, which means more instructions in the hot path (or
theoretically in the cold path if the spills are moved with shrink
wrapping). With an appropriately chosen calling convention for
the per-thread initialization function (TODO) the hot path should
always need just one instruction and the cold path should need two
instructions with no spilling required.
Differential Revision: https://reviews.llvm.org/D56038
llvm-svn: 350429
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.
Fix by adding a pass that canonicalize aliases, which ensures each
alias is a direct alias of the base object.
Reviewers: pcc, davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54507
llvm-svn: 350423
The 1st try for this was at rL350369, but it caused IR-level diffs because
our cost models differentiate custom vs. legal/promote lowering. So that was
reverted at rL350373. The cost models were fixed independently at rL350403,
so this is effectively the same patch as last time.
Original commit message:
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.
We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.
We can extend this to integer ops in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D56011
llvm-svn: 350421
Lifetime markers which reference inputs to the extraction region are not
safe to extract. Example ('rhs' will be extracted):
```
entry:
+------------+
| x = alloca |
| y = alloca |
+------------+
/ \
lhs: rhs:
+-------------------+ +-------------------+
| lifetime_start(x) | | lifetime_start(x) |
| use(x) | | lifetime_start(y) |
| lifetime_end(x) | | use(x, y) |
| lifetime_start(y) | | lifetime_end(y) |
| use(y) | | lifetime_end(x) |
| lifetime_end(y) | +-------------------+
+-------------------+
```
Prior to extraction, the stack coloring pass sees that the slots for 'x'
and 'y' are in-use at the same time. After extraction, the coloring pass
infers that 'x' and 'y' are *not* in-use concurrently, because markers
from 'rhs' are no longer available to help decide otherwise.
This leads to a miscompile, because the stack slots actually are in-use
concurrently in the extracted function.
Fix this by moving lifetime start/end markers for memory regions defined
in the calling function around the call to the extracted function.
Fixes llvm.org/PR39671 (rdar://45939472).
Differential Revision: https://reviews.llvm.org/D55967
llvm-svn: 350420
Similar to rL350199 - there are no known analysis/codegen holes for
funnel shift intrinsics now, so we can canonicalize the 6+ regular
instructions to funnel shift to improve vectorization, inlining,
unrolling, etc.
llvm-svn: 350419
In some cases the order that we hoist instructions in means that when rehoisting
(which uses the same order as hoisting) we can rehoist to a block A, then a
block B, then block A again. This currently causes an assertion failure as it
expects that when changing the hoist point it only ever moves to a block that
dominates the hoist point being moved from.
Fix this by moving the re-hoist point when it doesn't dominate the dominator of
hoisted instruction, or in other words when it wouldn't dominate the uses of
the instruction being rehoisted.
Differential Revision: https://reviews.llvm.org/D55266
llvm-svn: 350408
Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4
Add the other costs so that we're not relying on the default "is legal/custom" cost logic.
llvm-svn: 350403
Added field 'MustIssueImmediately' to the instruction descriptor of instructions
that only consume in-order issue/dispatch processor resources.
This speeds up queries from the hardware Scheduler, and gives an average ~5%
speedup on a release build.
No functional change intended.
llvm-svn: 350397
GetPointerBaseWithConstantOffset include this code, where ByteOffset
and GEPOffset are both of type llvm::APInt :
ByteOffset += GEPOffset.getSExtValue();
The problem with this line is that getSExtValue() returns an int64_t, but
the += matches an overload for uint64_t. The problem is that the resulting
APInt is no longer considered to be signed. That in turn causes assertion
failures later on if the relevant pointer type is > 64 bits in width and
the GEPOffset was negative.
Changing it to
ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());
resolves the issue and explicitly performs the sign-extending
or truncation. Additionally, instead of asserting later if the result
is > 64 bits, it breaks out of the loop in that case.
See also
https://reviews.llvm.org/D24729https://reviews.llvm.org/D24772
This commit must be merged after D38662 in order for the test to pass.
Patch by Michael Ferguson <mpfergu@gmail.com>.
Reviewers: reames, sanjoy, hfinkel
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D38501
llvm-svn: 350395
Method ResourceManager::use() is responsible for updating the internal state of
used processor resources, as well as notifying resource groups that contain used
resources.
Before this patch, method 'use()' didn't know how to quickly obtain the set of
groups that contain a particular resource unit. It had to discover groups by
perform a potentially slow search (done by iterating over the set of processor
resource descriptors).
With this patch, the relationship between resource units and groups is stored in
the ResourceManager. That means, method 'use()' no longer has to search for
groups. This gives an average speedup of ~4-5% on a release build.
This patch also adds extra code comments in ResourceManager.h to better describe
the resource mask layout, and how resouce indices are computed from resource
masks.
llvm-svn: 350387
Move the check for -1 and identical values outside the vector sorting code.
Compare functions need to be able to compare identical elements to be
conforming.
llvm-svn: 350379
Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register.
Differential Revision: https://reviews.llvm.org/D56246
llvm-svn: 350374
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.
We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.
We can extend this to integer ops in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D56011
llvm-svn: 350369
Summary:
Irreducible control flow is not that rare, e.g. it happens in malloc and
3 other places in the libc portions linked in to a hello world program.
This patch improves how we handle that code: it emits a br_table to
dispatch to only the minimal necessary number of blocks. This reduces
the size of malloc by 33%, and makes it comparable in size to asm2wasm's
malloc output.
Added some tests, and verified this passes the emscripten-wasm tests run
on the waterfall (binaryen2, wasmobj2, other).
Reviewers: aheejin, sunfish
Subscribers: mgrang, jgravelle-google, sbc100, dschuff, llvm-commits
Differential Revision: https://reviews.llvm.org/D55467
Patch by Alon Zakai (kripken)
llvm-svn: 350367
Summary:
The previously introduced new operand type for br_table didn't have
a disassembler implementation, causing an assert.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56227
llvm-svn: 350366
Summary:
Instead of asserting on certain kinds of malformed instructions, it
now still print, but instead adds an annotation indicating the
problem, and/or indicates invalid_type etc.
We're using the InstPrinter from many contexts that can't always
guarantee values are within range (e.g. the disassembler), where having
output is more valueable than asserting.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56223
llvm-svn: 350365
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:
// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)
We want to have this in the DAG too because as we can see in some of the test diffs (reductions),
the pattern may not be visible in IR.
Given that this is already an IR canonicalization, any backend that would prefer a vector op over
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.
Differential Revision: https://reviews.llvm.org/D55722
llvm-svn: 350354
Detailed description: SIFoldOperands::foldInstOperand iterates over the
operand uses calling the function that changes def-use iteratorson the
way. As a result loop exits immediately when def-use iterator is
changed. Hence, the operand is folded to the very first use instruction
only. This makes VGPR live along the whole basic block and increases
register pressure significantly. The performance drop observed in SHOC
DeviceMemory test is caused by this bug.
Proposed fix: collect uses to separate container for further processing
in another loop.
Testing: make check-llvm
SHOC performance test.
Reviewers: rampitec, ronlieb
Differential Revision: https://reviews.llvm.org/D56161
llvm-svn: 350350
Use report_fatal_error in MCStreamer::EmitRawTextImpl instead of
using errs() and explain the rationale behind it not being
llvm_unreachable() to save confusion for any future maintainers.
Differential Revision: https://reviews.llvm.org/D56245
llvm-svn: 350342
NFC: This adds the dom tree verification under debug mode at a point
just before we start unrolling the loop. This allows us to verify dom
tree at a state where it is much smaller and before the unrolling
actually happens.
This also implies we do not need to run -verify-dom-info everytime to
see if the DT is in a valid state when we transform the loop for runtime
unrolling.
llvm-svn: 350334
Summary:
Keeping msan a function pass requires replacing the module level initialization:
That means, don't define a ctor function which calls __msan_init, instead just
declare the init function at the first access, and add that to the global ctors
list.
Changes:
- Pull the actual sanitizer and the wrapper pass apart.
- Add a newpm msan pass. The function pass inserts calls to runtime
library functions, for which it inserts declarations as necessary.
- Update tests.
Caveats:
- There is one test that I dropped, because it specifically tested the
definition of the ctor.
Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka
Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji
Differential Revision: https://reviews.llvm.org/D55647
llvm-svn: 350305
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.
This patch also renames FeatureSpecRestrict to FeatureSB.
Reviewed By: olista01, LukeCheeseman
Differential Revision: https://reviews.llvm.org/D55990
llvm-svn: 350299
Summary:
The commit rL348922 introduced a means to set Metadata
section kind for a global variable, if its explicit section
name was prefixed with ".AMDGPU.metadata.".
This patch changes that prefix to ".AMDGPU.comment.",
as "metadata" in the section name might lead to
ambiguity with metadata used by AMD PAL runtime.
Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668
Reviewers: kzhuravl
Reviewed By: kzhuravl
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D56197
llvm-svn: 350292
A DBG_VALUE between a two-address instruction and a following COPY
would prevent rescheduleMIBelowKill optimization inside
TwoAddressInstructionPass.
Differential Revision: https://reviews.llvm.org/D55987
llvm-svn: 350289
There can be multiple local symbols with the same name (for e.g.
comdat sections), and thus the symbol name itself isn't enough
to disambiguate symbols.
Differential Revision: https://reviews.llvm.org/D56140
llvm-svn: 350288
When switched to the MI scheduler for P9, the hardware is modeled as out of order.
However, inside the MI Scheduler algorithm, we still use the in-order scheduling model
as the MicroOpBufferSize isn't set. The MI scheduler take it as the hw cannot buffer
the op. So, only when all the available instructions issued, the pending instruction
could be scheduled. That is not true for our P9 hw in fact.
This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is
picked from the P9 hw spec, and the perf test indicate that, its value won't hurt the cpu2017.
With this patch, there are 3 specs improved over 3% and 1 spec deg over 3%. The detail is as follows:
x264_r: +6.95%
cactuBSSN_r: +6.94%
lbm_r: +4.11%
xz_r: -3.85%
And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved.
Reviewer: Nemanjai
Differential Revision: https://reviews.llvm.org/D55810
llvm-svn: 350285
OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it
sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have
equivalent arguments (either identical or equivalent PHIs). It then assumes that
ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead.
Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs
so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue.
This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence.
rdar://problem/47005143
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D56235
llvm-svn: 350284
Summary: Add read[only|write] PIC relocation models to the C API and teach the TargetMachine API about it.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56187
llvm-svn: 350279
Summary:
This was previously ignored and an incorrect value generated.
Also fixed Disassembler's handling of block_type.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56092
llvm-svn: 350270
Summary:
Alias can make one (but not all) live, we still need to scan all others if this symbol is reachable
from somewhere else.
Reviewers: tejohnson, grimar
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D56117
llvm-svn: 350269
The caller to EraseInstruction had this conditional:
// ARC calls with null are no-ops. Delete them.
if (IsNullOrUndef(Arg))
but the assert inside EraseInstruction only allowed ConstantPointerNull and not
undef or bitcasts.
This adds support for both of these cases.
rdar://problem/47003805
llvm-svn: 350261
If an instruction has no demanded bits, remove it directly during BDCE,
instead of leaving it for something else to clean up.
Differential Revision: https://reviews.llvm.org/D56185
llvm-svn: 350257
INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag.
This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used.
I had to avoid selecting DEC is the result isn't used. This will become a SUB immediate which will turned into a CMP later by optimizeCompareInstr. This lead to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag.
This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes.
Differential Revision: https://reviews.llvm.org/D55975
llvm-svn: 350245
Sometimes it's useful to be able to output demangled names without
tag specifiers like "struct", "class", etc. This patch adds a
flag enabling this.
llvm-svn: 350241
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.
Improves the test case from PR38217. There may be additional opportunities after this.
Differential Revision: https://reviews.llvm.org/D56145
llvm-svn: 350239
By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.
By creating the extract with the correct scalar type when we create legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.
This fixes the regression on X86 in D56156.
Differential Revision: https://reviews.llvm.org/D56176
llvm-svn: 350236
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.
The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.
Differential Revision: https://reviews.llvm.org/D56156
llvm-svn: 350235
PPCPreEmitPeephole pass.
PPCPreEmitPeephole will convert a BC to B when the conditional branch is
based on a constant CR by CRSET or CRUNSET. This is added in
https://reviews.llvm.org/rL343100.
When the conditional branch is known to be always taken, all branches will
be removed and a new unconditional branch will be inserted. However, when
SeenUse is false the original patch will not remove the branches, but still
insert the new unconditional branch, update the successors and create
inconsistent IR. Compiling the synthetic testcase included can show the
problem we run into.
The patch simply removes the SeenUse condition when adding branches into
InstrsToErase set.
Differential Revision: https://reviews.llvm.org/D56041
llvm-svn: 350223
Peek through shift modulo masks while matching double shift patterns.
I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now.
Differential Revision: https://reviews.llvm.org/D56199
llvm-svn: 350222
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).
Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).
After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).
As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.
Patch by me with contributions from Michael Ferguson (mferguson@cray.com).
Differential Revision: https://reviews.llvm.org/D38662
llvm-svn: 350220
GlobalVariable
Summary:
Extend Module::getOrInsertGlobal to accept a callback for creating a new
GlobalVariable if necessary instead of calling the GV constructor
directly using default arguments. Additionally overload
getOrInsertGlobal for the previous default behavior.
Reviewers: chandlerc
Subscribers: hiraditya, llvm-commits, bollu
Differential Revision: https://reviews.llvm.org/D56130
llvm-svn: 350219
Common code used by the default resource strategy to select pipeline resources
has been moved to an helper function.
The new selection logic has been slightly rewritten to get rid of a redundant
zero check on the `ReadyMask` value. Before this patch, method select internally
called function `PowerOf2Floor` to compute the next ready pipeline resource.
However, `PowerOf2Floor` forces an implicit (redundant) zero check on the input
value. By construction, `ReadyMask` can never be zero. This patch replaces the
call to `PowerOf2Floor` with an equivalent block of code which avoids the
redundant zero check. This gives a minor 3-3.5% speedup on a release build.
No functional change intended.
llvm-svn: 350218
All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering.
llvm-svn: 350206
These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code.
Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll
llvm-svn: 350205
The final piece of IR-level analysis to allow this was committed with:
rL350188
Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.
The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal or better to the original code even if the
target does not have an actual rotate instruction.
llvm-svn: 350199
This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions.
This is inspired by the helper functions used by ARM and AArch64 for the same purpose.
The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO.
llvm-svn: 350198
Summary: Add accessors so the performance improvement from this setting is accessible to third parties.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56179
llvm-svn: 350196
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 350188
default
During the lowering of a switch that would result in the generation of a jump
table, a range check is performed before indexing into the jump table, for the
switch value being outside the jump table range and a conditional branch is
inserted to jump to the default block. In case the default block is
unreachable, this conditional jump can be omitted. This patch implements
omitting this conditional branch for unreachable defaults.
Review Reference: D52002
llvm-svn: 350186
MSan used to report false positives in the case the argument of
llvm.is.constant intrinsic was uninitialized.
In fact checking this argument is unnecessary, as the intrinsic is only
used at compile time, and its value doesn't depend on the value of the
argument.
llvm-svn: 350173
Summary:
For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo.
Then the verifier runs and it seems like we have a use of an undefined register (the register will
be reserved later, but the verifier doesn't know that).
So this patch call setUsesTOCBasePtr before emit the code for the pseudo, so verifier can know
X2 is a reserved register.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D56148
llvm-svn: 350165
This seems to be getting in the way more than its helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better.
Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it.
llvm-svn: 350159
This allows us to sign extend to v4i32 first. And then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh.
We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization.
llvm-svn: 350158
We have some unfortunate code in the back end that defines a bunch of register
sets for the Asm Parser. Every time another class is needed in the parser, we
have to add another one of those definitions with explicit lists of registers.
This NFC patch simply provides macros to use to condense that code a little bit.
Differential revision: https://reviews.llvm.org/D54433
llvm-svn: 350156
A recent patch has added custom legalization of vector conversions of
v2i16 -> v2f64. This just rounds it out for other types where the input vector
has an illegal (narrower) type than the result vector. Specifically, this will
handle the following conversions:
v2i8 -> v2f64
v4i8 -> v4f32
v4i16 -> v4f32
Differential revision: https://reviews.llvm.org/D54663
llvm-svn: 350155
The current CRBIT spill pseudo-op expansion creates a KILL instruction
that kills the CRBIT and defines the enclosing CR field. However, this
paints a false picture to the register allocator that all bits in the CR
field are killed so copies of other bits out of the field become dead and
removable.
This changes the expansion to preserve the KILL flag on the CRBIT as an
implicit use and to treat the CR field as an undef input.
Thanks to Hal Finkel for the review and Uli Weigand for implementation input.
Differential revision: https://reviews.llvm.org/D55996
llvm-svn: 350153
The following code requests 64-bit PC-relative relocations unsupported
by MIPS ABI. Now it triggers an assertion. It's better to show an error
message.
```
foo:
.quad bar - foo
```
llvm-svn: 350152
This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better.
llvm-svn: 350141
Previously we emitted a multiply and some masking that was supposed to matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly.
Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll
Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs.
I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization
llvm-svn: 350134
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.
This patch also moves to FeatureSB the old FeatureSpecRestrict.
Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman
Differential Revision: https://reviews.llvm.org/D55921
llvm-svn: 350126
This is the last one in a series of patches to support better code generation for bitfield insert.
BitPermutationSelector already support ISD::ZERO_EXTEND but not TRUNCATE.
This patch adds support for ISD:TRUNCATE in BitPermutationSelector.
For example of this test case,
struct s64b {
int a:4;
int b:16;
int c:24;
};
void bitfieldinsert64b(struct s64b *p, unsigned char v) {
p->b = v;
}
the selection DAG loos like:
t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64
t18: i32 = and t14, Constant:i32<-1048561>
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t22: i64 = AssertZext t4, ValueType:ch:i8
t23: i32 = truncate t22
t16: i32 = shl nuw nsw t23, Constant:i32<4>
t19: i32 = or t18, t16
t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64
By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18.
So the generated sequences with and without this patch are
without this patch
rlwinm 5, 5, 0, 28, 11 # corresponding to t18
rlwimi 5, 4, 4, 20, 27
with this patch
rlwimi 5, 4, 4, 12, 27
Differential Revision: https://reviews.llvm.org/D49076
llvm-svn: 350118
Deletion of dead blocks in arbitrary order may lead to failure
of assertion in `DeleteDeadBlock` that requires that we have
deleted all predecessors before we can delete the current block.
We should instead delete them in RPO order.
llvm-svn: 350116
If we are changing the MI operand from Reg to Imm, we need also handle its implicit use if have.
Differential Revision: https://reviews.llvm.org/D56078
llvm-svn: 350115
For atomic value operand which less than 4 bytes need to be masked.
And the related operation to calculate the newvalue can be done in 32 bit gprc.
So just use gprc for mask and value calculation.
Differential Revision: https://reviews.llvm.org/D56077
llvm-svn: 350113
Summary:
This will make migrating code easier and generally seems like a good collection
of API improvements.
Some of these APIs seem like more consistent / better naming of existing
ones. I've retained the old names for migration simplicit and am just
adding the new ones in this commit. I'll try to garbage collect these
once CallSite is gone.
Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D55638
llvm-svn: 350109
Create PMULDQ/PMULUDQ as long as the number of elements is a power of 2.
This seems to give some improvements in our ability to use SimplifyDemandedBits.
llvm-svn: 350084
Make each of the helper functions only return their comparison node and the condition code. Leave X86ISD::SETCC creation to the LowerSETCC function itself.
Looking into whether we can use this code directly in BRCOND and SELECT lowering instead of going through LowerSETCC which creates an X86ISD::SETCC node we need to look through.
llvm-svn: 350082
Only one of the 3 callers of LowerAndToBT need the SETCC node. Two of them have to look through it to find the operands they really need. Instead create it after the one call that needs it.
LowerAndToBT now returns both the BT node and the X86 specific condition code separately.
llvm-svn: 350081
Summary:
These instructions are currently unused in our backend, but for
completeness it is good to support them, so they can be used with
the assembler in hand-written code.
Tests are very basic, signature support missing much like other blocks.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55973
llvm-svn: 350079
Summary:
It does so using a simple nesting stack, and gives clear errors upon
violation. This is unique to wasm, since most CPUs do not have
any nested constructs.
Had to add an end of file check to the general assembler for this.
Note: if/else/end instructions are not currently supported in our
tablegen defs, so these tests will be enabled in a follow-up.
They already pass the nesting check.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55797
llvm-svn: 350078
Summary:
Existing LIR recognizes CTLZ where shifting input variable right until it is zero. (Shift-Until-Zero idiom)
This commit:
1. Augments Shift-Until-Zero idiom to recognize CTTZ where input variable is shifted left.
2. Prepare for BitScan idiom recognition.
Patch by Yuanfang Chen (tabloid.adroit)
Reviewers: craig.topper, evstupac
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55876
llvm-svn: 350074
Fixes crash reported after r347354 for frontends that don't always emit
'this' pointers for methods. Now we will silently produce debug info
that makes functions like this look like static methods, which seems
reasonable.
llvm-svn: 350073
The patch adds a possibility to make library calls on NVPTX.
An important thing about library functions - they must be defined within
the current module. This basically should guarantee that we produce a
valid PTX assembly (without calls to not defined functions). The one who
wants to use the libcalls is probably will have to link against
compiler-rt or any other implementation.
Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.
Also, there was an issue with a DAG during legalisation. When we expand
instruction into libcall, the inner call-chain isn't being "integrated"
into outer chain. Since the last "data-flow" (call retval load) node is
located in call-chain earlier than CALLSEQ_END node, the latter becomes
a leaf and therefore a dead node (and is being removed quite fast).
Proposed here solution relies on another data-flow pseudo nodes
(ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and
instruction selection phases - we remove the pseudo instructions before
register scheduling phase.
Patch by Denys Zariaiev!
Differential Revision: https://reviews.llvm.org/D34708
llvm-svn: 350069
Add widen scalar for type index 1 (i1 condition) for G_SELECT.
Select G_SELECT for pointer, s32(integer) and smaller low level
types on MIPS32.
Differential Revision: https://reviews.llvm.org/D56001
llvm-svn: 350063
Summary:
This patch is to fix the bug imported by rL341634.
In above submit , the the return type of ISD::ADDE is
14224: SDVTList VTs = DAG.getVTList(MVT::i64, MVT::i64),
but in fact, the second return type of ISD::ADDE should be
MVT::Glue not MVT::i64.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D55977
llvm-svn: 350061
This is an alternative to what I attempted in D56057.
GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.
I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.
Based on a patch that Simon Pilgrim sent me in email.
Fixes PR40142.
llvm-svn: 350059
This patch teaches LoopSimplifyCFG to remove dead exiting edges
from loops.
Differential Revision: https://reviews.llvm.org/D54025
Reviewed By: fedor.sergeev
llvm-svn: 350049
Both of these places reference memset-like loops. Memset is precise.
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.
llvm-svn: 350044
More migration so we can disable the implicit int -> LocationSize
conversion.
All of these are either scatter/gather'ed vector instructions, or direct
loads. Hence, they're all precise.
Perhaps if we see way more getTypeStoreSize calls, we can make a
getTypeStoreLocationSize (or similar) as a wrapper that applies this
::precise. Doesn't appear that it's a good idea to make getTypeStoreSize
return a LocationSize itself, however.
llvm-svn: 350042
Remove the TESTmr isel patterns and add another postprocessing combine for TESTrr+ANDrm->TESTmr. We already have a postprocessing combine for TESTrr+ANDrr->TESTrr. With this we can give ANDN a chance to match first. And clean it up during post processing if we ended up with just a regular AND.
This is another step towards my plan to gut EmitTest and do more flag handling during isel matching or by using optimizeCompare.
llvm-svn: 350038
It's dangerous to knowingly create an illegal vector type
no matter what stage of combining we're in.
This prevents the missed folding/scalarization seen in:
https://bugs.llvm.org/show_bug.cgi?id=40146
llvm-svn: 350034
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.
srcSize is derived from the size of an alloca, and we quit out if the
size of that is > the size of the thing we're copying to. Hence, we
should always copy everything over, so these sizes are precise.
Don't make srcSize itself a LocationSize, since optionality isn't
helpful, and we do some comparisons against other sizes elsewhere in
that function.
llvm-svn: 350019
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.
This one sadly isn't *super* small, but all of the changes here are
either to:
- libfuncs that are passed a constant size (memcpy, memset, ...)
- instructions that store/load a constant size
So they have to be precise
llvm-svn: 350017
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.
This tries to find literal loads/stores of the given type, so this has
to be precise.
llvm-svn: 350016
This is difficult/not possible to test in LLVM, but is visible as a
crash in LLD when parsing DWARF to generate gdb-index.
This function is called by llvm-dwarfdump when parsing high_pc for
non-verbose output (to print the actual high_pc rather than the low_pc
relative value), but in that case llvm-dwarfdump doesn't print section
names (if it did, it would hit this problem).
We could add some other features to llvm-dwarfdump to expose this, but
nothing really springs to my mind. I will add a test to lld, though.
llvm-svn: 350010
trunc (add X, C ) --> add (trunc X), C'
If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).
This change used to show regressions for x86, but those are gone after D55494.
This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic)
that does almost the same thing.
Differential Revision: https://reviews.llvm.org/D55866
llvm-svn: 350006
The missed load folding noticed in D55898 is visible independent of that change
either with an adjusted IR pattern to start or with AVX2/AVX512 (where the build
vector becomes a broadcast first; movddup is not produced until we get into isel
via tablegen patterns).
Differential Revision: https://reviews.llvm.org/D55936
llvm-svn: 350005
Currently the section name (& possibly number) is only printed on
addresses in ranges - but no reason it couldn't also be displayed on
other addresses (like low/high PC).
Refactor in that direction by pulling out the section lookup and name
ambiguity dumping logic into a reusable helper.
llvm-svn: 349995
NVPTXAsmPrinter::doInitialization() was creating an NVPTXSubtarget on
the stack. This object is huge, about 80kb. Also it's slow to create.
And it's all redundant; we have one in NVPTXTargetMachine anyway!
llvm-svn: 349982
Propagate the llvm::Error a little further up. This is NFC for
llvm-dwarfdump in this change, but allows ld.lld to emit more precise
error messages about which object and archive the erroneous DWARF is in.
llvm-svn: 349978
Summary:
The "single parameter" .file directive appears to be an ELF-only feature
that is intended to insert the main source filename into the string
table table.
I noticed that if you assemble an ELF .s file for COFF, typically it
will assert right away on a .file directive near the top of the file. My
first change was to make this emit a proper error in the asm parser so
that we don't assert so easily.
However, COFF actually does have some support for this directive, and if
you emit an object file, llvm-mc does not assert. When emitting a COFF
object, MC will take those file names and create "debug" symbol table
entries for them. I'm not familiar with these kinds of symbol table
entries, and I'm not aware of any users of them, but @compnerd added
them a while ago. They don't introduce absolute paths, and most main
source file paths are short enough that this extra entry shouldn't cause
any problems, so I enabled the flag in MCAsmInfoCOFF that indicates that
it's supported.
This has the side effect of adding an extra debug symbol to every object
produced by clang, which is a pretty big functional change. My question
is, should we keep the functionality or remove it in the name of symbol
table minimalism?
Reviewers: mstorsjo, compnerd
Subscribers: hiraditya, compnerd, llvm-commits
Differential Revision: https://reviews.llvm.org/D55900
llvm-svn: 349976
Summary:
Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling)
The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues:
- communicates overflow back to the user
- supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported.
Reviewers: davidxl, danielcdh, wmi, dblaikie
Reviewed By: dblaikie
Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D55681
llvm-svn: 349973
Originally committed in r349333, reverted in r349353.
GCC emitted these unconditionally on/before 4.4/March 2012
Clang emitted these unconditionally on/before 3.5/March 2014
This improves performance when parsing CUs (especially those using split
DWARF) that contain no code ranges (such as the mini CUs that may be
created by ThinLTO importing - though generally they should be/are
avoided, especially for Split DWARF because it produces a lot of very
small CUs, which don't scale well in a bunch of other ways too
(including size)).
The revert was due to a (Google internal) test that had some checked in old
object files missing DW_AT_ranges. That's since been fixed.
llvm-svn: 349968
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.
This was suggested as a cleanup in D55967.
Differential Revision: https://reviews.llvm.org/D56019
llvm-svn: 349964
This fixes the patterns that have or/and as a root. 'and' is handled differently since thy usually have a CMP wrapped around them.
I had to look for uses of the CF flag because all these nodes have non-standard CF flag behavior. A real or/xor would always clear CF. In practice we shouldn't be using the CF flag from these nodes as far as I know.
Differential Revision: https://reviews.llvm.org/D55813
llvm-svn: 349962
The BEXTR instruction documents the SF bit as undefined.
The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed.
Fixes PR40060
Differential Revision: https://reviews.llvm.org/D55807
llvm-svn: 349956
Summary:
Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing.
This is because the M0 field is of unsigned.
This patch achieves the similar goal as https://reviews.llvm.org/D55241, but keeps the optimization
if the base is known unsigned.
Reviewers:
arsemn
Differential Revision:
https://reviews.llvm.org/D55568
llvm-svn: 349951
Weak symbols are supposed to be supported in the ELF TextAPI
implementation, but the YAML handler didn't read or write the `Weak`
member of ELFSymbol. This change adds the YAML mapping and updates tests
to ensure correct behavior.
Differential Revision: https://reviews.llvm.org/D56020
llvm-svn: 349950
Summary:
BasicAA has special logic for unescaped allocas, which normally applies
equally well to dynamic and static allocas. However, llvm.stackrestore
has the power to end the lifetime of dynamic allocas, without referring
to them directly.
stackrestore is already marked with the most conservative memory
modification attributes, but because the alloca is not escaped, the
normal logic produces incorrect results. I think BasicAA needs a special
case here to teach it about the relationship between dynamic allocas and
stackrestore.
Fixes PR40118
Reviewers: gbiv, efriedma, george.burgess.iv
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D55969
llvm-svn: 349945
Currently, runtime unrolling does not support loops where multiple
exiting blocks exit to the latchExit. Added TODO and other code
clarifications for ConnectProlog code.
llvm-svn: 349944
This is admittedly a narrow fix for the problem:
https://bugs.llvm.org/show_bug.cgi?id=37502
...but as the XOP restriction shows, it's a maze to get this right.
In the motivating example, note that we have movddup before SSE4.1 and
again with AVX2. That's because insertps isn't available pre-SSE41 and
vbroadcast is (more generally) available with AVX2 (and the splat is
reduced to movddup via isel pattern).
Differential Revision: https://reviews.llvm.org/D55898
llvm-svn: 349937
This adds support for widening G_FCEIL in LegalizerHelper and
AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to
widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't
support full FP 16.
This also updates AArch64/f16-instructions.ll to show that we perform the
correct transformation.
llvm-svn: 349927
-print-after IR printing generally can not print the IR unit (Loop or SCC)
which has just been invalidated by the pass. However, when working in -print-module-scope
mode even if Loop was invalidated there is still a valid module that we can print.
Since we can not access invalidated IR unit from AfterPassInvalidated instrumentation
point we can remember the module to be printed *before* pass. This change introduces
BeforePass instrumentation that stores all the information required for module printing
into the stack and then after pass (in AfterPassInvalidated) just print whatever
has been placed on stack.
Reviewed By: philip.pfaffe
Differential Revision: https://reviews.llvm.org/D55278
llvm-svn: 349896
- When signing return addresses with -msign-return-address=<scope>{+<key>},
either the A key instructions or the B key instructions can be used. To
correctly authenticate the return address, the unwinder/debugger must know
which key was used to sign the return address.
- When and exception is thrown or a break point reached, it may be necessary to
unwind the stack. To accomplish this, the unwinder/debugger must be able to
first authenticate an the return address if it has been signed.
- To enable this, the augmentation string of CIEs has been extended to allow
inclusion of a 'B' character. Functions that are signed using the B key
variant of the instructions should have and FDE whose associated CIE has a 'B'
in the augmentation string.
- One must also be able to preserve these semantics when first stepping from a
high level language into assembly and then, as a second step, into an object
file. To achieve this, I have introduced a new assembly directive
'.cfi_b_key_frame ', that tells the assembler the current frame uses return
address signing with the B key.
- This ensures that the FDE is associated with a CIE that has 'B' in the
augmentation string.
Differential Revision: https://reviews.llvm.org/D51798
llvm-svn: 349895
It seems better to avoid using the callback if possible since
there are coverage assertions which are disabled if this is used.
Also fix missing tests. Only test the legal cases since it seems
legalization for build_vector is quite lacking.
llvm-svn: 349878
This verification is linear in the size of the function, so it can cause
a quadratic compile-time explosion in a function with many loops to
unroll.
Differential Revision: https://reviews.llvm.org/D54732
llvm-svn: 349871
This shortens the switches in X86ISelDAGToDAG.cpp to only need to check condition code instead of a list of opcodes.
This also fixes a bug where the memory forms of SETcc were missing from hasNoCarryFlagUses.
llvm-svn: 349868
This saves materializing the immediate. The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.
I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.
Differential Revision: https://reviews.llvm.org/D55630
llvm-svn: 349857
Summary:
This function is very similar to add_llvm_library(), so this patch merges it
into add_llvm_library() and replaces all calls to add_llvm_loadable_module(lib ...)
with add_llvm_library(lib MODULE ...)
Reviewers: philip.pfaffe, beanz, chandlerc
Reviewed By: philip.pfaffe
Subscribers: chapuni, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D51748
llvm-svn: 349839
If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end
up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback.
Add it to the switch, and add a test verifying that this happens.
llvm-svn: 349822
When deciding lazily whether a CU would be split or non-split I
accidentally dropped some handling for the line tables comp_dir (by
doing it lazily it was too late to be handled properly by the MC line
table code).
Move that bit of the code back to the non-lazy place.
llvm-svn: 349819
We have to treat constructs like this as if they were "symbolic", to use
the correct codepath to resolve them. This mostly only affects movz
etc. because the other uses of classifySymbolRef conservatively treat
everything that isn't a constant as if it were a symbol.
Differential Revision: https://reviews.llvm.org/D55906
llvm-svn: 349800
This requires a bit more code than other fixups, to distingush between
abs_g0/abs_g1/etc. Actually, I think some of the other fixups are
missing some checks, but I won't try to address that here.
I haven't seen any real-world code that uses a construct like this, but
it clearly should work, and we're considering using it in the
implementation of localescape/localrecover on Windows (see
https://reviews.llvm.org/D53540). I've verified that binutils produces
the same code as llvm-mc for the testcase.
This currently doesn't include support for the *_s variants (that
requires a bit more work to set the opcode).
Differential Revision: https://reviews.llvm.org/D55896
llvm-svn: 349799
If we found unsafe dependences other than 'unknown', we already know at
compile time that they are unsafe and the runtime checks should always
fail. So we can avoid generating them in those cases.
This should have no negative impact on performance as the runtime checks
that would be created previously should always fail. As a sanity check,
I measured the test-suite, spec2k and spec2k6 and there were no regressions.
Reviewers: Ayal, anemet, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D55798
llvm-svn: 349794
Summary:
Add PLATFORM constants for iOS, tvOS, and watchOS simulators, as well
as human readable names for these constants, to the Mach-O file format
header files.
rdar://46854119
Reviewers: ab, davide
Reviewed By: ab, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55905
llvm-svn: 349779
Build llvm with assertion on, and then build bcc against this llvm.
Run any bcc tool with debug=8 (turning on -g for clang compilation),
you will get the following assertion errors,
/home/yhs/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:888:
void llvm::RuntimeDyldELF::resolveBPFRelocation(const llvm::SectionEntry&, uint64_t,
uint64_t, uint32_t, int64_t): Assertion `Value <= (4294967295U)' failed.
The .BTF.ext ELF section uses Fixup's to get the instruction
offsets. The data width of the Fixup is 4 bytes since we only need
the insn offset within the section.
This caused the above error though since R_BPF_64_32 expects
4-byte value and the Runtime Dyld tried to resolve the actual
insn address which is 8 bytes.
Actually the offset within the section is all what we need.
Therefore, there is no need to perform any kind of relocation
for .BTF.ext section and such relocation will actually cause
incorrect result.
This patch changed BPFELFObjectWriter::getRelocType() such that
for Fixup Kind FK_Data_4, if the relocation Target is a temporary
symbol, let us skip the relocation (ELF::R_BPF_NONE).
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 349778
Emit static locals within the correct lexical scope so variables with the same
name will not confuse the debugger into getting the wrong value.
Differential Revision: https://reviews.llvm.org/D55336
llvm-svn: 349777
This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations.
AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK.
Differential Revision: https://reviews.llvm.org/D55747
llvm-svn: 349765
This is a update to D43157 to correctly handle fixup_riscv_pcrel_lo12.
Notable changes:
Rebased onto trunk
Handle and test S-type
Test case pcrel-hilo.s is merged into relocations.s
D43157 description:
VK_RISCV_PCREL_LO has to be handled specially. The MCExpr inside is
actually the location of an auipc instruction with a VK_RISCV_PCREL_HI fixup
pointing to the real target.
Differential Revision: https://reviews.llvm.org/D54029
Patch by Chih-Mao Chen and Michael Spencer.
llvm-svn: 349764
There are several vector instructions which may or may not set the
condition code register, depending on the value of an argument.
For codegen, we use two versions of the instruction, one that sets
CC and one that doesn't, which hard-code appropriate values of that
argument. But we also have a "generic" version of the instruction
that is used for the assembler/disassembler. These generic versions
should always be considered to clobber CC just to be safe.
llvm-svn: 349761
This patch fixes two deficiencies in current code that recognizes
the VLLEZ idiom:
- For the floating-point versions, we have ISel patterns that match
on a bitconvert as the top node. In more complex cases, that
bitconvert may already have been merged into something else.
Fix the patterns to match the inner nodes instead.
- For the 64-bit integer versions, depending on the surrounding code,
we may get either a DAG tree based on JOIN_DWORDS or one based on
INSERT_VECTOR_ELT. Use a PatFrags to simply match both variants.
llvm-svn: 349749
Current code in SystemZDAGToDAGISel::tryGather refuses to perform
any transformation if the Load SDNode has more than one use. This
(erronously) counts uses of the chain result, which prevents the
optimization in many cases unnecessarily. Fixed by this patch.
llvm-svn: 349748
We already have special code (DAG combine support for FP_ROUND)
to recognize cases where we an use a vector version of VLEDB to
perform two floating-point truncates in parallel, but equivalent
support for VLEDB (vector floating-point extends) has been
missing so far. This patch adds corresponding DAG combine
support for FP_EXTEND.
llvm-svn: 349746
These tools were assuming ABI version is 0,
that is not always true.
Patch teaches them to work with that field.
Differential revision: https://reviews.llvm.org/D55884
llvm-svn: 349737
LLVM treats void* pointers passed to assembly routines as pointers to
sized types.
We used to emit calls to __msan_instrument_asm_load() for every such
void*, which sometimes led to false positives.
A less error-prone (and truly "conservative") approach is to unpoison
only assembly output arguments.
llvm-svn: 349734
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.
Reviewers: spatel, gchatelet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55263
llvm-svn: 349731
Summary:
PowerPC has scalar selects (isel) and vector mask selects (xxsel). But PowerPC
does not have vector CR selects, PowerPC does not support scalar condition
selects on vectors.
In addition to implementing this hook, isSelectSupported() should return false
when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects
are converted into branch sequences.
Reviewed By: steven.zhang, hfinkel
Differential Revision: https://reviews.llvm.org/D55754
llvm-svn: 349727
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.
This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).
This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.
The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.
Differential Revision: https://reviews.llvm.org/D52116
llvm-svn: 349725
Summary:
This is a code size savings and is also important to get runnable code
while engines do not support v128.const.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55910
llvm-svn: 349724
Summary:
Gates v128.const, f32x4.sqrt, f32x4.div, i8x16.extract_lane_u, and
i16x8.extract_lane_u on the --wasm-enable-unimplemented-simd flag,
since these ops are not implemented yet in V8.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55904
llvm-svn: 349720
This code pattern is an unfortunate side effect of the way some types get split
at call lowering. Ideally we'd either not generate it at all or combine it away
in the legalizer artifact combiner.
Until then, add selection support anyway which is a significant proportion of
our current fallbacks on CTMark.
rdar://46491420
llvm-svn: 349712
Summary:
On non-Windows these are already removed by ShouldInstrumentGlobal.
On Window we will wait until we get actual issues with that.
Reviewers: pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D55899
llvm-svn: 349707
Summary:
ICF prevented by removing unnamed_addr and local_unnamed_addr for all sanitized
globals.
Also in general unnamed_addr is not valid here as address now is important for
ODR violation detector and redzone poisoning.
Before the patch ICF on globals caused:
1. false ODR reports when we register global on the same address more than once
2. globals buffer overflow if we fold variables of smaller type inside of large
type. Then the smaller one will poison redzone which overlaps with the larger one.
Reviewers: eugenis, pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D55857
llvm-svn: 349706
Creating the IR builder, then modifying the CFG, leads to an IRBuilder
where the BB and insertion point are inconsistent, so new instructions
have the wrong parent.
Modified an existing test because the test wasn't covering anything
useful (the "invoke" was not actually an invoke by the time we hit the
code in question).
Differential Revision: https://reviews.llvm.org/D55729
llvm-svn: 349693
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits or bitwise or/and/xor in combination with known
bits information.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 349674
This is to address post-commit feedback from Paul Robinson on r348954.
The original commit misinterprets count and upper bound as the same thing (I thought I saw GCC producing an upper bound the same as Clang's count, but GCC correctly produces an upper bound that's one less than the count (in C, that is, where arrays are zero indexed)).
I want to preserve the C-like output for the common case, so in the absence of a lower bound the count (or one greater than the upper bound) is rendered between []. In the trickier cases, where a lower bound is specified, a half-open range is used (eg: lower bound 1, count 2 would be "[1, 3)" and an unknown parts use a '?' (eg: "[1, ?)" or "[?, 7)" or "[?, ? + 3)").
Reviewers: aprantl, probinson, JDevlieghere
Differential Revision: https://reviews.llvm.org/D55721
llvm-svn: 349670
Summary:
The LTO/ThinLTO driver currently creates invalid bitcode by setting
symbols marked dllimport as dso_local. The compiler often has access
to the definition (often dllexport) and the declaration (often
dllimport) of an object at link-time, leading to a conflicting
declaration. This patch resolves the inconsistency by removing the
dllimport attribute.
Reviewers: tejohnson, pcc, rnk, echristo
Reviewed By: rnk
Subscribers: dmikulin, wristow, mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D55627
llvm-svn: 349667
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.
It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.
llvm-svn: 349664
The (cmp (and X, Y) 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM like BLSR or BLSI.
This patch moves removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but its probably not perfect.
Differential Revision: https://reviews.llvm.org/D55870
llvm-svn: 349661
Fixes https://bugs.llvm.org/show_bug.cgi?id=38743
The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in lockingStoresDispSizeMap.
But it currently looks only at the previous one, which will miss some cases that result in assert.
This patch refine the function to check all previous layouts until find the uncontained one. So all redundant stores will be removed.
Patch by Pengfei Wang
Differential Revision: https://reviews.llvm.org/D55642
llvm-svn: 349660
Summary:
Import libraries as created by llvm-dlltool always use the same archive
member name for every object file (namely, the DLL library name). Ensure
that long names are not repeatedly stored in the string table.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D55860
llvm-svn: 349637
Now that we use the generic ISD opcodes, we can use the generic intrinsics directly as well. This fixes the poor fast-isel codegen by not expanding to an easily broken IR code sequence.
I'm intending to deal with the signed saturation equivalents as well.
Clang counterpart: https://reviews.llvm.org/D55879
Differential Revision: https://reviews.llvm.org/D55855
llvm-svn: 349630
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.
This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.
I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.
Differential Revision: https://reviews.llvm.org/D55822
llvm-svn: 349629
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.
This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.
Differential Revision: https://reviews.llvm.org/D55822
llvm-svn: 349628
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero.
SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.
Thanks to @dmgreen for catching this.
Differential Revision: https://reviews.llvm.org/D55883
llvm-svn: 349625
This relands r330742:
"""
Let TableGen write output only if it changed, instead of doing so in cmake.
Removes one subprocess and one temp file from the build for each tablegen
invocation.
No intended behavior change.
"""
In particular, if you see rebuilds after this change that you didn't see
before this change, that's unintended and it's fine to revert this change
again (but let me know).
r330742 got reverted because some people reported that llvm-tblgen ran on every
build after it. This could happen if the depfile output got deleted without
deleting the main .inc output. To fix, make TableGen always write the depfile,
but keep writing the main .inc output only if it has changed. This matches what
we did in cmake before.
Differential Revision: https://reviews.llvm.org/D55842
llvm-svn: 349624
Summary:
Using HI here makes no logical sense, since the dword is only
32 bits to begin with.
Current Mesa master does not look at the relocation type at all,
so this change is fine. Future Mesa will rely on this, however.
Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1
Reviewers: arsenm, rampitec, mareko
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D55369
llvm-svn: 349620
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.
This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.
I've updated SelectionDAG::simplifyShift to demonstrate its use.
Differential Revision: https://reviews.llvm.org/D55819
llvm-svn: 349616
Summary:
Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged.
This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds.
Irreducible loop test has been extended based on a CTS failure detected for GFX9.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D55602
llvm-svn: 349611
All we have to do is mark it as legal.
This allows us to select a lot of new patterns handled by TableGen. This
patch adds tests for them and splits up the existing test file for
binary operators into 2 files, one for arithmetic ops and one for
logical ones.
llvm-svn: 349610
This is an initial implementation of no-op passthrough copying of COFF
with objcopy.
Differential Revision: https://reviews.llvm.org/D54939
llvm-svn: 349605
For type v4i32/v8ii16/v16i8, do following transforms:
(vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b)
(vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b)
Differential Revision: https://reviews.llvm.org/D55812
llvm-svn: 349599
This patch moved the following files in lib/CodeGen/AsmPrinter/
AsmPrinterHandler.h
DbgEntityHistoryCalculator.h
DebugHandlerBase.h
to include/llvm/CodeGen directory.
Such a change will enable Target to extend DebugHandlerBase
and emit Target specific debug info sections.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D55755
llvm-svn: 349564
Clang uses weak linkage for objc runtime functions when they are not available on the platform.
The intrinsic has this linkage so we just need to pass that on to the runtime call.
llvm-svn: 349559
For performance reasons, clang set nonlazybind on these functions. Now that we
are using intrinsics instead of runtime calls, we should set this attribute when
creating the runtime functions.
llvm-svn: 349558
This patch adds a VectorizationSafetyStatus enum, which will be extended
in a follow up patch to distinguish between 'safe with runtime checks'
and 'known unsafe' dependences.
Reviewers: anemet, anna, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D54892
llvm-svn: 349556
Summary:
unnamed_addr is still useful for detecting of ODR violations on vtables
Still unnamed_addr with lld and --icf=safe or --icf=all can trigger false
reports which can be avoided with --icf=none or by using private aliases
with -fsanitize-address-use-odr-indicator
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: kubamracek, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D55799
llvm-svn: 349555
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's. Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.
llvm-svn: 349552
Looks like there are valid reasons why we need to allow bitcasts in llvm.asan.globals, see discussion at https://github.com/apple/swift-llvm/pull/133. Let's look through bitcasts when iterating over entries in the llvm.asan.globals list.
Differential Revision: https://reviews.llvm.org/D55794
llvm-svn: 349544