A block ending in an unconditional branch can have two successors if one
is a landing pad. In practice, I think this only has an effect on
Windows because landing pads are never empty for Itanium unwinding.
(Alternatively, I could add a check to
AArch64InstrInfo::canInsertSelect, but this seems more obvious.)
Differential Revision: https://reviews.llvm.org/D56468
llvm-svn: 351142
Otherwise, with D56544, the intrinsic will be expanded to an integer
csel, which is probably not what the user expected. This matches the
general convention of using "v1" types to represent scalar integer
operations in vector registers.
While I'm here, also add some error checking so we don't generate
illegal ABS nodes.
Differential Revision: https://reviews.llvm.org/D56616
llvm-svn: 351141
This feature enables the fusion of some arithmetic and logic instructions
together.
Differential revision: https://reviews.llvm.org/D56572
llvm-svn: 351139
official Git repository.
Remove the directions for using git-svn, and demote the prominence of
the svn instructions.
Also, fix a few other issues while I'm in there:
* Mention LLVM_ENABLE_PROJECTS more.
* Getting started doesn't need to mention test-suite, but should
mention clang and the other projects.
* Remove mentions of "configure", since that's long gone.
I've also adjusted a few other mentions of svn to point to github, but
have not done so comprehensively.
Differential Revision: https://reviews.llvm.org/D56654
llvm-svn: 351130
Summary:
Normal behavior for GNU ar is to flatten thin archives when adding them to another thin archive, i.e. add the members directly instead of nesting the archive.
Some refactoring done as part of this patch to ease things:
- Consolidate `addMember`/`addLibMember` methods
- Rename `addMember` to `addChildMember` to make it more visibly different at the call site that an archive child is passed instead of a regular member
- Pass in a separate vector and splice it back into position instead of passing a vector + optional Pos (which makes expanding libs tricky)
This fixes PR37530 as raised by https://github.com/ClangBuiltLinux/linux/issues/279.
Reviewers: mstorsjo, pcc, ruiu
Reviewed By: mstorsjo
Subscribers: llvm-commits, tpimh, nickdesaulniers
Differential Revision: https://reviews.llvm.org/D56508
llvm-svn: 351120
Summary:
Use appendToUsed instead of include to ensure that
SanitizerCoverage's constructors are not stripped.
Also, use isOSBinFormatCOFF() to determine if target
binary format is COFF.
Reviewers: pcc
Reviewed By: pcc
Subscribers: hiraditya
Differential Revision: https://reviews.llvm.org/D56369
llvm-svn: 351118
If we have PSHUFB and we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements.
llvm-svn: 351103
add (extractelt (X, 0), extractelt (X, 1)) --> extractelt (hadd X, X), 0
This is the integer sibling to D56011.
There's an additional restriction to only to do this transform in the
case where we don't have extra extracts from the source vector. Without
that, we can fail to match larger horizontal patterns that are more
beneficial than this minimal case. An improvement to the more general
h-op lowering may allow us to remove the restriction here in a follow-up.
llvm-svn: 351093
With this patch, shifts are lowered to optimal number of instructions
necessary to shift types larger than the general purpose register size.
This resolves PR/32293.
Thanks to Kyle Butt for reporting the issue!
Differential Revision: https://reviews.llvm.org/D56320
llvm-svn: 351059
Following PR39807, the way in which SimplifyCFG hoists common code on
branch paths was fixed in r347782. However this left extra code hanging
around HoistThenElseCodeToIf that wasn't necessary and needlessly
complicated matters -- we no longer need to look up through the 'if'
basic block to find a location for hoisted 'select' insts, we can instead
use the location chosen by applyMergedLocation.
This patch deletes that extra logic, and updates a regression test to
reflect the new logic (selects get the merged location, not a previous
insts location).
Differential Revision: https://reviews.llvm.org/D55272
llvm-svn: 351058
Make it possible for TableGen to produce code for selecting MOVi32imm.
This allows reasonably recent ARM targets to select a lot more constants
than before.
We achieve this by adding GISelPredicateCode to arm_i32imm. It's
impossible to use the exact same code for both DAGISel and GlobalISel,
since one uses "Subtarget->" and the other "STI." to refer to the
subtarget. Moreover, in GlobalISel we don't have ready access to the
MachineFunction, so we need to add a bit of code for obtaining it from
the instruction that we're selecting. This is also the reason why it
needs to remain a PatLeaf instead of the more specific IntImmLeaf.
llvm-svn: 351056
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.
This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.
This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).
There's an additional fix now to avoid a dmask=0
For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.
Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.
The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:
%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1
This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work-around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.
Differential revision: https://reviews.llvm.org/D48826
Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
Work around for ppcle compiler bug
Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
llvm-svn: 351054
Part of the effort to refactoring frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf" to represent three kinds of frame
pointer usage: (all) frames use frame pointer, (non-leaf) frames use
frame pointer, (none) frame use frame pointer. This CL makes the idea
explicit by using only one enum function attribute "frame-pointer"
Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.
"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".
tests are mostly updated with
// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"
// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"
Patch by Yuanfang Chen (tabloid.adroit)!
Differential Revision: https://reviews.llvm.org/D56351
llvm-svn: 351049
Introduce GlobalISel pre legalizer pass for MIPS.
It will be used to cope with instructions that require
combining before legalization.
Differential Revision: https://reviews.llvm.org/D56269
llvm-svn: 351046
Support arbitrary suffix when matching FileCheck executable name in
defines.txt to successfully match FileCheck.EXE on Microsoft Windows.
llvm-svn: 351042
Summary:
While the backend code of FileCheck relies on definition of variable
from the command-line to have an equal sign '=' and a variable name
before that, the frontend does not actually enforce it. This leads to
FileCheck crashing when invoked with invalid syntax for the -D option.
This patch adds the missing validation in the frontend. It also makes
the -D option an AlwaysPrefix option to be able to detect -D=FOO as
being a define without variable and -D as missing its value.
Copyright:
- Linaro (changes in version 2 of revision D55940)
- GraphCore (changes in later versions)
Reviewers: jdenny
Subscribers: JonChesterfield, hiraditya, kristina, probinson,
llvm-commits
Differential Revision: https://reviews.llvm.org/D55940
llvm-svn: 351039
This pattern:
t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>
...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.
In the affected tests here, it looks like it just makes RA wiggle. But we might
as well squash this to prevent it interfering with other pattern-matching.
Differential Revision:
https://reviews.llvm.org/D56604
llvm-svn: 351008
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350998
The 128-bit input produces 64-bits of output and fills the upper 64-bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do.
This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.
Fixes another instruction for PR34877.
llvm-svn: 350994
As discussed on llvm-dev
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>, we have
to be careful when trying to select the *w RV64M instructions. i32 is not a
legal type for RV64 in the RISC-V backend, so operations have been promoted by
the time they reach instruction selection. Information about whether the
operation was originally a 32-bit operations has been lost, and it's easy to
write incorrect patterns.
Similarly to the variable 32-bit shifts, a DAG combine on ANY_EXTEND will
produce a SIGN_EXTEND if this is likely to result in sdiv/udiv/urem being
selected (and so save instructions to sext/zext the input operands).
Differential Revision: https://reviews.llvm.org/D53230
llvm-svn: 350993
This restores support for selecting the SLLW/SRLW/SRAW instructions, which was
removed in rL348067 as the previous patterns made some unsafe assumptions.
Also see the related llvm-dev discussion
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>
Ultimately I didn't introduce a custom SelectionDAG node, but instead added a
DAG combine that inserts an AssertZext i5 on the shift amount for an i32
variable-length shift and also added an ANY_EXTEND DAG-combine which will
instead produce a SIGN_EXTEND for an i32 variable-length shift, increasing the
opportunity to safely select SLLW/SRLW/SRAW.
There are obviously different ways of addressing this (a number discussed in
the llvm-dev thread), so I'd welcome further feedback and comments.
Note that there are now some cases in
test/CodeGen/RISCV/rv64i-exhaustive-w-insts.ll where sraw/srlw/sllw is
selected even though sra/srl/sll could be used without any extra instructions.
Given both are semantically equivalent, there doesn't seem a good reason to
prefer one vs the other. Given that would require more logic to still select
sra/srl/sll in those cases, I've left it preferring the *w variants.
Differential Revision: https://reviews.llvm.org/D56264
llvm-svn: 350992
We still use i8 for the load/store type. So we need to convert to/from i16 to around the mask type.
By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.
llvm-svn: 350989
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.
This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.
This fixes some of PR34877
llvm-svn: 350985
This fixes https://bugs.llvm.org/show_bug.cgi?id=40110.
This implements handling of undef operands for integer intrinsics in
ConstantFolding, in particular for the bitcounting intrinsics (ctpop,
cttz, ctlz), the with.overflow intrinsics, the saturating math
intrinsics and the funnel shift intrinsics.
The undef behavior follows what InstSimplify does for the general cas
e of non-constant operands. For the bitcount intrinsics (where
InstSimplify doesn't do undef handling -- there cannot be a combination
of an undef + non-constant operand) I'm using a 0 result if the intrinsic
is defined for zero and undef otherwise.
Differential Revision: https://reviews.llvm.org/D55950
llvm-svn: 350971
Teach x86 assembly operand parsing to distinguish between assembler
variable assigned to named registers and those assigned to immediate
values.
Reviewers: rnk, nickdesaulniers, void
Subscribers: hiraditya, jyknight, llvm-commits
Differential Revision: https://reviews.llvm.org/D56287
llvm-svn: 350966
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.
Fix PR40273.
Reviewers: efriedma, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56566
llvm-svn: 350951
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).
The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.
This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.
Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.
Reviewers: pcc
Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D53890
llvm-svn: 350948
Summary:
As pointed out in D53667, our use of hyphens in flags can be inconsistent, mixing `-` with `--`. This change makes all long style flags use `--`.
Automatically changed via:
```
find test/tools/llvm-objcopy/ELF -type f | xargs sed -i 's/ -\([a-zA-Z]\{3\}\)/ --\1/g'
```
Two false positives were manually fixed/reverted.
Reviewers: jhenderson, espindola, alexshap
Reviewed By: jhenderson
Subscribers: emaste, javed.absar, arichardson, fedor.sergeev, jakehehrlich, llvm-commits
Differential Revision: https://reviews.llvm.org/D56513
llvm-svn: 350944
MergeFunc only deletes unused duplicate functions if they have local
linkage, but it should be safe to relax this to any "discardable if
unused" linkage type.
Differential Revision: https://reviews.llvm.org/D56574
llvm-svn: 350939
Currently when a select has a constant value in one branch and the select feeds
a conditional branch (via a compare/ phi and compare) we unfold the select
statement. This results in threading the conditional branch later on. Similar
opportunity exists when a select (with a constant in one branch) feeds a
switch (via a phi node). The patch unfolds select under this condition.
A testcase is provided.
llvm-svn: 350931
Previously, we limited this transform to cases where the
extraction into the build vector happens from vectors of
the same type as the build vector, but that's not required.
There's a slight potential regression seen in the AVX512
result for phadd -- we're using the 256-bit flavor of the
instruction now even though the 128-bit subset is sufficient.
The same problem could already be seen in the AVX2 result.
Follow-up patches will attempt to narrow that back down.
llvm-svn: 350928
Previously, this was broken - by setting PointerToSymbolTable to zero
but still actually writing the string table length, the object file
header was corrupted.
Differential Revision: https://reviews.llvm.org/D56584
llvm-svn: 350926
We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions.
Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.
This fixes some of the regressions from D350800.
llvm-svn: 350918
Summary:
We now use __stack_pointer global and global.get/global.set instruction.
This fixes the checking routine for stack_pointer writes accordingly.
This also fixes the existing __stack_pointer test in reg-stackify.ll:
That test used to pass not because of __stack_pointer clashes but
because the function `stackpointer_callee` was not marked as `readnone`,
so it was assumed to possibly write to memory arbitraily, and
`global.set` instruction was marked as `mayStore` in the .td definition,
so they were identified as intervening writes. After we added `readnone`
to its attribute, this test fails without this patch.
Reviewers: dschuff, sunfish
Subscribers: jgravelle-google, sbc100, llvm-commits
Differential Revision: https://reviews.llvm.org/D56094
llvm-svn: 350906
* Teach AsmParser to recognize @rn in distination operand as 0(rn).
* Do not allow Disassembler decoding instructions that have size more
than a number of input bytes.
* Fix UB in MSP430MCCodeEmitter.
Patch by Kristina Bessonova!
Differential Revision: https://reviews.llvm.org/D56547
llvm-svn: 350903
Summary:
This is a third attempt, but this time we have vetted it on Windows
first. The previous errors were due to an uninitialized class member.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D56560
llvm-svn: 350901
Summary:
Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.
Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which
relied on promotion, in order to test all sinking tests.
Reviewers: sanjoy, davide, gberry, george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D40375
llvm-svn: 350879
This extends to combineVSelectToShrunkBlend to be able to resimplify SHRUNKBLENDS that have already been created.
This should help some of the regressions from D56387
Differential Revision: https://reviews.llvm.org/D56421
llvm-svn: 350875
`FILECHECK_OPTS` into environment for FileCheck tests.
Summary:
This fixes the following FileCheck tests:
* FileCheck/dump-input-enable.txt
* FileCheck/match-full-lines.txt
when `FILECHECK_DUMP_INPUT_ON_FAILURE` is set in the environment.
By default llvm-lit propagates `FILECHECK_DUMP_INPUT_ON_FAILURE` and
`FILECHECK_OPTS` from llvm-lit's environment into the test environment.
Unfortunately this can break FileCheck's tests because they expect that
these environment variables not to be set.
rdar://problem/47176262
Reviewers: jdenny, probinson, george.karpenkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56541
llvm-svn: 350850
In an assert build, the Error gets destroyed and we get "Program aborted
due to an unhandled Error:".
In release, we get an empty message.
llvm-svn: 350848
This is https://bugs.llvm.org/show_bug.cgi?id=26892,
GNU objdump hides the special symbol entry:
SYMBOL TABLE:
000000000000a7e0 l F .text 00000000000003f9 bi_copymodules
while llvm-objdump does not:
SYMBOL TABLE:
0000000000000000 *UND* 00000000
000000000000a7e0 l F .text 000003f9 bi_copymodules
Patch makes the behavior of the llvm-objdump to be consistent with the GNU objdump.
Differential revision: https://reviews.llvm.org/D56076
llvm-svn: 350840
This commit fixes the dwordx3/southern-islands failures that were found
in bugzilla https://bugs.llvm.org/show_bug.cgi?id=40129, by not
generating the dwordx3 variants of load/store instructions that were
added to the ISA after southern islands.
Differential Revision: https://reviews.llvm.org/D56434
llvm-svn: 350838
This further improves compatibility with GNU as, allowing input such as the
following to be assembled:
.equ CONST, 0x123456
li a0, CONST
addi a0, a0, %lo(CONST)
.equ CONST, 1
slli a0, a0, CONST
Note that we don't have perfect compatibility with gas, as it will avoid
emitting a relocation in this case:
addi a0, a0, %lo(CONST2)
.equ CONST2, 0x123456
Thanks to Shiva Chen for suggesting a better way to approach this during review.
Differential Revision: https://reviews.llvm.org/D52298
llvm-svn: 350831
When we use the partial-matching function on a 128-bit chunk, we must
account for the possibility that we've matched undef halves of the
original source vectors, so the outputs may need to be reset.
This should allow closing PR40243:
https://bugs.llvm.org/show_bug.cgi?id=40243
llvm-svn: 350830
This is a partial fix for:
https://bugs.llvm.org/show_bug.cgi?id=40243
...as seen in the integer test, we still need to correct the result when using the
existing (old) horizontal op matching function because it does not model the way
x86 256-bit horizontal ops return results (each 128-bit half is its own horizontal-op).
A potential follow-up change for that is discussed in the bug report - see also D56490.
This generally duplicates a lot of the existing matching code, but we can't just remove
that without introducing regressions, so the existing code is renamed and used less often.
Follow-ups may try to reduce that overlap.
Differential Revision: https://reviews.llvm.org/D56450
llvm-svn: 350826
Summary:
This patch changes the legalization action for some half-precision floating-
point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops
are not supported in hardware for half-precision vectors, but promotion is
not always possible (for v8f16 operands). Changing the action to Expand fixes
an assertion failure in the legalizer when the frontend produces such ops.
In addition, a quick microbenchmark shows that, in the v4f16 case,
expanding introduces fewer spills and is therefore slightly faster than
promoting.
Reviewers: t.p.northover, SjoerdMeijer
Reviewed By: SjoerdMeijer
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D56296
llvm-svn: 350825
This is https://bugs.llvm.org/show_bug.cgi?id=37151,
GNU objdump spec says that "Normally the disassembly output will skip blocks of zeroes.",
but currently, llvm-objdump prints them.
The patch implements the -z/--disassemble-zeroes option and switches the default to always
skip blocks of zeroes.
Differential revision: https://reviews.llvm.org/D56083
llvm-svn: 350823
See https://bugs.llvm.org/show_bug.cgi?id=40070.
GNU addr2line accepts input addresses both on the command-line and via
stdin. llvm-symbolizer previously only supported the latter. This
change adds support for the former. As with addr2line, the new
behaviour is to only look for addresses on stdin if no positional
arguments were provided to llvm-symbolizer.
Reviewed by: ruiu
Differential Revision: https://reviews.llvm.org/D56272
llvm-svn: 350821
Add t2TEQrr to the map of instructions with can be reduced down into
a T1 instruction. This is a special case because TEQ just sets the
CPSR and doesn't write to a GPR, which is not the case for EOR. So,
we need to ensure that the EOR can write to the first operand.
Differential Revision: https://reviews.llvm.org/D56255
llvm-svn: 350801
Summary:
This pass replaces GR8/GR16/GR32/GR64 with their equivalent sized mask register classes. But VK32/VK64 aren't legal without AVX512BW. Apparently this mostly appears to work if the register coalescer is able to remove the VK32/VK64 register class reference. Or if we don't ever spill it. But there's no guarantee of that.
Another Intel employee managed to trigger a crash due to this with ISPC. Unfortunately, I've lost the test case he sent me at the time. I'm trying to get him to reproduce it for me. I'd like to get this in before 8.0 branches since its a little scary.
The regressions here are unfortunate, but I think we can make some improvements to DAG combine, load folding, etc. to fix them. Just not sure if we can get that done for 8.0.
Fixes PR39741
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56460
llvm-svn: 350800
Summary:
D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc.
This patch adds the necessary enums and MCExpr needed for lowering these.
Reviewers: rnk, mstorsjo, efriedma
Reviewed By: efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D56037
llvm-svn: 350798
This is a second attempt at r350778, which was reverted in
r350789. The only change is that the unimplemented-simd128 feature has
been renamed simd128-unimplemented, since naming it
unimplemented-simd128 somehow made the simd128 feature flag enable the
unimplemented-simd128 feature on Windows.
llvm-svn: 350791
Summary:
This replaces the old ad-hoc -wasm-enable-unimplemented-simd
flag. Also makes the new unimplemented-simd128 feature imply the
simd128 feature.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton
Differential Revision: https://reviews.llvm.org/D56501
llvm-svn: 350778
The C standard says "The memchr function locates the first
occurrence of c (converted to an unsigned char)[...]". The expansion
was missing the conversion to unsigned char.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39041 .
Differential Revision: https://reviews.llvm.org/D55947
llvm-svn: 350775
If the caller's return type does not have a zeroext attribute but the
callee does a tail call zeroext, we won't consider the tail call during
CodeGenPrepare because the attributes don't match.
However, if the result of the tail call has no uses, it makes sense to
drop the sext/zext attributes.
Differential Revision: https://reviews.llvm.org/D56486
llvm-svn: 350753
Summary:
For some reason the backend assumed that the condition mask would be
the first argument to the LLVM intrinsic, but everywhere else the
condition mask is the third argument.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56412
llvm-svn: 350746
NVPTX format requires that no labels/label arithmetics is used in the
debug info sections. To avoid possible problems with the adding/modifying the debug info functionality, made these tests more strict.
llvm-svn: 350731
This is an initial implementation for Speculative Load Hardening for
AArch64. It builds on top of the recently introduced
AArch64SpeculationHardening pass.
This doesn't implement (yet) some of the optimizations implemented for
the X86SpeculativeLoadHardening pass. I thought introducing the
optimizations incrementally in follow-up patches should make this easier
to review.
Differential Revision: https://reviews.llvm.org/D55929
llvm-svn: 350729
When GNU objdump dumps the input with -d it prints the symbol addresses,
for example:
0000000000000031 <foo>:
31: 00 00 add %al,(%rax)
...
llvm-objdump currently does not do that.
Patch changes the behavior to match the GNU objdump.
That is useful for implementing -z/--disassemble-zeroes (D56083),
it allows omitting first zero bytes and keep the information
about the symbol address in the output.
Differential revision: https://reviews.llvm.org/D56123
llvm-svn: 350726
Fixed issue with identity values and other cases, f32/f16 identity values to be added later. fma/mac instructions is disabled for now.
Test is fully reworked, added comments. Other fixes:
1. dpp move with uses and old reg initializer should be in the same BB.
2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Othervise the old register value is checked for identity.
3. Added add, subrev, and, or instructions to the old folding function.
4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.
Differential revision: https://reviews.llvm.org/D55444
llvm-svn: 350721
Follow up patch of rL350385, for adding predres
command line option. This patch renames the
feature as to keep it aligned with the option
passed by/to clang
Differential Revision: https://reviews.llvm.org/D56484
llvm-svn: 350702
Summary:
This fixes PR39710. In that case we emitted a location list looking like
this:
.Ldebug_loc0:
.quad .Lfunc_begin0-.Lfunc_begin0
.quad .Lfunc_begin0-.Lfunc_begin0
.short 1 # Loc expr size
.byte 85 # DW_OP_reg5
.quad .Lfunc_begin0-.Lfunc_begin0
.quad .Lfunc_end0-.Lfunc_begin0
.short 1 # Loc expr size
.byte 85 # super-register DW_OP_reg5
.quad 0
.quad 0
As seen, the first entry's beginning and ending addresses evalute to 0,
which meant that the entry inadvertently became an "end of list" entry,
resulting in the location list ending sooner than expected.
To fix this, omit all entries with empty ranges. Location list entries
with empty ranges do not have any effect, as specified by DWARF, so we
might as well drop them:
"A location list entry (but not a base address selection or end of list
entry) whose beginning and ending addresses are equal has no effect
because the size of the range covered by such an entry is zero."
Reviewers: davide, aprantl, dblaikie
Reviewed By: aprantl
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D55919
llvm-svn: 350698
Bad machine code: Illegal virtual register for instruction
function: TestULE
basic block: %bb.0 entry (0x1000a39b158)
instruction: %2:crrc = FCMPUD %1:vsfrc, %3:f8rc
operand 1: %1:vsfrc
Fix assert about missing match between fcmp instruction and register class.
We should use vsx related cmp instruction xvcmpudp instead of fcmpu when vsx is opened.
add -verifymachineinstrs option into related test cases to enable the verify pass.
Differential Revision: https://reviews.llvm.org/D55686
llvm-svn: 350685
Currently it's possible for following
check on V.WriteLanes (which is not really meaningful
during SubRangeJoin) to pass for one half of the pair,
and then fall through to to one of the impossible
or unresolved states. This then fails as inconsistent
on the other half.
During the main range join, the check between V.WriteLanes
and OtherV.ValidLanes must have passed, meaning this
should be a CR_Replace.
Fixes most of the testcases in bugs 39542 and 39602
llvm-svn: 350678
We can't go back and recover the lanes if it turns
out the implicit_def really can't be erased.
Assume all lanes are valid if an unresolved conflict
is encountered. There aren't any tests where this
seems to matter either way, but this seems like a
safer option.
Fixes bug 39602
llvm-svn: 350676
This patch improves llvm-profdata show command:
(1) add -value-cutoff=<N> option: Show only those functions whose max count
values are greater or equal to N.
(2) add -list-below-cutoff option: Only output names of functions whose max
count value are below the cutoff.
(3) formats value-profile counts and prints out percentage.
Differential Revision: https://reviews.llvm.org/D56342
llvm-svn: 350673
This is matching the equivalent of the DAG expansion,
so it should never end up with worse perf than the
original code even if the target doesn't have a rotate
instruction.
llvm-svn: 350672
Summary:
StoreResults pass does not optimize store instructions anymore because
store instructions don't return results values anymore. Now this pass is
used solely for memory intrinsics, so update the pass name accordingly
and fix outdated pass descriptions as well.
This patch does not change any meaningful behavior, but not marked as
NFC because it changes a comment check line in a test case.
Reviewers: dschuff
Subscribers: mgorny, sbc100, jgravelle-google, sunfiish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56093
llvm-svn: 350669
Starting in C++17, MSVC introduced a new mangling for function
parameters that are themselves noexcept functions. This patch
makes llvm-undname properly demangle them.
Patch by Zachary Henkel
Differential Revision: https://reviews.llvm.org/D55769
llvm-svn: 350656
A straightforward port of tsan to the new PM, following the same path
as D55647.
Differential Revision: https://reviews.llvm.org/D56433
llvm-svn: 350647
Summary:
This fixes the IDom for exit blocks and all blocks reachable from the exit blocks, when runtime unrolling under multiexit/exiting case.
We initially had a restrictive check that the IDom is only updated when
it is the header of the loop.
However, we also need to update the IDom to the correct one when the
IDom is any block within the original loop. See added test cases (which
fail dom tree verification without the patch).
Reviewers: reames, mzolotukhin, mkazantsev, hfinkel
Reviewed by: brzycki, kuhar
Subscribers: zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D56284
llvm-svn: 350640
Commit f1db33c5c1a9 ("[BPF] Disable relocation for .BTF.ext section")
assigned relocation type R_BPF_NONE if the fixup type
is FK_Data_4 and the symbol is temporary.
The reason is we use FK_Data_4 as a fixup type
for insn offsets in .BTF.ext section.
Just checking whether the symbol is temporary is not enough.
For example, .debug_info may reference some strings whose
fixup is FK_Data_4 with a temporary symbol as well.
To truely reflect the case for .BTF.ext section,
this patch further checks that the section associateed with the symbol
must be SHF_ALLOC and SHF_EXECINSTR, i.e., in the text section.
This fixed the above-mentioned problem.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 350637
Commit rL347861 introduced an unintentional change in the behaviour when
compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly
disabling GlobalISel resulted in using FastISel but an updated condition
in the commit changed it to using SelectionDAG. The patch fixes this
condition and slightly better organizes the code that chooses the
instruction selector.
Fixes PR40131.
Differential Revision: https://reviews.llvm.org/D56266
llvm-svn: 350626
The new-pm version of DA is untested. Testing requires a printer, so
add that and use it in the existing DA tests.
Differential Revision: https://reviews.llvm.org/D56386
llvm-svn: 350624
For stack frames on the size of a register in x86, a code size optimization
emits "push rax/eax" instead of "sub" for stack allocation. For example:
foo:
.cfi_startproc
BB#0:
pushq %rax
Ltmp0:
.cfi_def_cfa_offset 16
...
.cfi_endproc
However, we are falling back to DWARF in this case because we cannot
encode %rax as a saved register.
This requirement is wrong, since we don't care about the contents of
%rax, it is the equivalent of a sub.
In order to specify that we care about the contents of %rax, we would
need a .cfi_offset %rax, <offset>.
It's also overzealous in the case where there are pushes for callee saved
registers followed by a "push rax/eax" instead of "sub", in which case we should
also be able to encode the callee saved regs and everything else using compact
unwind.
Patch authored by Bruno Cardoso Lopes.
Differential Revision: https://reviews.llvm.org/D13793
llvm-svn: 350623
We have code to split vector splats (of zero and non-zero) for performance
reasons, but it ignores the fact that a store might be truncating.
Actually, truncating stores are formed for vNi8 and vNi16 types. Since the
truncation is from a legal type, the size of the store is always <= 64-bits and
so they don't actually benefit from being split up anyway, so this patch just
disables that transformation.
llvm-svn: 350620
Using a PatLeaf for sext_16_node allowed matching smulbb and smlabb
instructions once the operands had been sign extended. But we also
need to use sext_inreg operands along with sext_16_node to catch a
few more cases that enable use to remove the unnecessary sxth.
Differential Revision: https://reviews.llvm.org/D55992
llvm-svn: 350613
I'm not entirely sure this is the correct thing
to do with the global isel philosophy, but I think
this is necessary to handle how differently SGPRs
are used normally vs. from a condition.
For example, it makes sense to allow a copy
from a VGPR to an SGPR, but it makes no sense
to allow a copy from VGPRs to SGPRs used as
select mask.
This avoids regbankselecting strange code with
a truncate feeding directly into a condition field.
Now a copy is forced from sgpr(s1) to vcc, which is
more sensible to handle.
Some of these issues could probably avoided with making enough
operations resulting in i1 illegal. I think we can't avoid
this register bank for legality.
For example, an i1 and where one source is from a truncate, and
one source is a compare needs some kind of copy inserted to
make sure both are in condition registers.
llvm-svn: 350611
If a copy was needed to handle the condition of brcond, it was being
inserted before the defining instruction. Add tests for iterator edge
cases.
I find the existing code here suspect for the case where it's looking
for terminators that modify the register. It's going to insert a copy
in the middle of the terminators, which isn't allowed (it might be
necessary to have a COPY_terminator if anybody actually needs this).
Also legalize brcond for AMDGPU.
llvm-svn: 350595
merging a src register in ToBeUpdated set.
This is to fix PR40061 related with https://reviews.llvm.org/rL339035.
In https://reviews.llvm.org/rL339035, live interval of source pseudo register
in rematerialized copy may be saved in ToBeUpdated set and its update may be
postponed.
In PR40061, %t2 = %t1 is rematerialized and %t1 is added into toBeUpdated set
to postpone its live interval update. After the rematerialization, the live
interval of %t1 is larger than necessary. Then %t1 is merged into %t3 and %t1
gets removed. After the merge, %t3 contains live interval larger than necessary.
Because %t3 is not in toBeUpdated set, its live interval is not updated after
register coalescing and it will break some assumption in regalloc.
The patch requires the live interval of destination register in a merge to be
updated if the source register is in ToBeUpdated.
Differential revision: https://reviews.llvm.org/D55867
llvm-svn: 350586
The unobufscation support for BCSymbolMaps was the last piece of code
that hasn't been upstreamed yet. This patch contains a reworked version
of the existing code and relevant tests.
Differential revision: https://reviews.llvm.org/D56346
llvm-svn: 350580
In LTO or Thin-lto mode (though linker plugin), the module
names are of temp file names which are different for
different compilations. Using SourceFileName avoids the issue.
This should not change any functionality for current PGO as
all the current callers of getPGOFuncName() is before LTO.
llvm-svn: 350579
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.
Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.
This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.
Differential Revision: https://reviews.llvm.org/D56087
llvm-svn: 350560
This patch adds the sign/zero extension done by
vgetlane to ARM computeKnownBitsForTargetNode.
Differential revision: https://reviews.llvm.org/D56098
llvm-svn: 350553
Although llvm-elfabi will attempt to read input files without needing the format to be manually specified, doing so has the potential to introduce extraneous errors that can hinder debugging (since multiple readers may fail in attempts to read the file). This change allows the input file format to be manually specified to force elfabi to use a single reader. This makes it easier to test and debug errors specific to a given reader.
llvm-svn: 350545
Summary:
The -O flag is currently being mostly ignored; it's only checked whether or not the output format is "binary". This adds support for a few formats (e.g. elf64-x86-64), so that when specified, the output can change between 32/64 bit and sizes/alignments are updated accordingly.
This fixes PR39135
Reviewers: jakehehrlich, jhenderson, alexshap, espindola
Reviewed By: jhenderson
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D53667
llvm-svn: 350541
Summary:
If a divergent branch instruction is marked as divergent by propagation
rule 2 in DivergencePropagator::exploreSyncDependency() and its condition
is uniform, that branch would incorrectly be assumed to be uniform.
Reviewers: arsenm, tstellar
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D56331
llvm-svn: 350532
Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56361
llvm-svn: 350498
This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both.
llvm-svn: 350480
Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink
source node to demanded bits only even if there are other uses.
Differential Revision: https://reviews.llvm.org/D56289
llvm-svn: 350475
Change part of the tests to use vectors (I'm using scalar for ugt
and vector for ult), add multiuse variations, rename %lz to %tz
for the cttz tests.
llvm-svn: 350471
The cttz/ctlz intrinsics have a parameter specifying whether the
result is undefined for zero. cttz(x, false) can be relaxed to
cttz(x, true) if x is known non-zero, and in fact such an optimization
is already performed. However, this currently doesn't work if x is
non-zero as a result of a select rather than an explicit branch.
This patch adds handling for this case, thus allowing
x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y.
Differential Revision: https://reviews.llvm.org/D55786
llvm-svn: 350463
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.
I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.
Differential Revision: https://reviews.llvm.org/D56247
llvm-svn: 350435
This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits.
My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common.
Differential Revision: https://reviews.llvm.org/D56283
llvm-svn: 350432
The problem is similar to D55986 but for threads: a process with the
interceptor hwasan library loaded might have some threads started by
instrumented libraries and some by uninstrumented libraries, and we
need to be able to run instrumented code on the latter.
The solution is to perform per-thread initialization lazily. If a
function needs to access shadow memory or add itself to the per-thread
ring buffer its prologue checks to see whether the value in the
sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter
and reloads from the TLS slot. The runtime does the same thing if it
needs to access this data structure.
This change means that the code generator needs to know whether we
are targeting the interceptor runtime, since we don't want to pay
the cost of lazy initialization when targeting a platform with native
hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform}
has been introduced for selecting the runtime ABI to target. The
default ABI is set to interceptor since it's assumed that it will
be more common that users will be compiling application code than
platform code.
Because we can no longer assume that the TLS slot is initialized,
the pthread_create interceptor is no longer necessary, so it has
been removed.
Ideally, lazy initialization should only cost one instruction in the
hot path, but at present the call may cause us to spill arguments
to the stack, which means more instructions in the hot path (or
theoretically in the cold path if the spills are moved with shrink
wrapping). With an appropriately chosen calling convention for
the per-thread initialization function (TODO) the hot path should
always need just one instruction and the cold path should need two
instructions with no spilling required.
Differential Revision: https://reviews.llvm.org/D56038
llvm-svn: 350429
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.
Fix by adding a pass that canonicalize aliases, which ensures each
alias is a direct alias of the base object.
Reviewers: pcc, davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54507
llvm-svn: 350423
The 1st try for this was at rL350369, but it caused IR-level diffs because
our cost models differentiate custom vs. legal/promote lowering. So that was
reverted at rL350373. The cost models were fixed independently at rL350403,
so this is effectively the same patch as last time.
Original commit message:
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.
We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.
We can extend this to integer ops in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D56011
llvm-svn: 350421
Lifetime markers which reference inputs to the extraction region are not
safe to extract. Example ('rhs' will be extracted):
```
entry:
+------------+
| x = alloca |
| y = alloca |
+------------+
/ \
lhs: rhs:
+-------------------+ +-------------------+
| lifetime_start(x) | | lifetime_start(x) |
| use(x) | | lifetime_start(y) |
| lifetime_end(x) | | use(x, y) |
| lifetime_start(y) | | lifetime_end(y) |
| use(y) | | lifetime_end(x) |
| lifetime_end(y) | +-------------------+
+-------------------+
```
Prior to extraction, the stack coloring pass sees that the slots for 'x'
and 'y' are in-use at the same time. After extraction, the coloring pass
infers that 'x' and 'y' are *not* in-use concurrently, because markers
from 'rhs' are no longer available to help decide otherwise.
This leads to a miscompile, because the stack slots actually are in-use
concurrently in the extracted function.
Fix this by moving lifetime start/end markers for memory regions defined
in the calling function around the call to the extracted function.
Fixes llvm.org/PR39671 (rdar://45939472).
Differential Revision: https://reviews.llvm.org/D55967
llvm-svn: 350420
Similar to rL350199 - there are no known analysis/codegen holes for
funnel shift intrinsics now, so we can canonicalize the 6+ regular
instructions to funnel shift to improve vectorization, inlining,
unrolling, etc.
llvm-svn: 350419
In some cases the order that we hoist instructions in means that when rehoisting
(which uses the same order as hoisting) we can rehoist to a block A, then a
block B, then block A again. This currently causes an assertion failure as it
expects that when changing the hoist point it only ever moves to a block that
dominates the hoist point being moved from.
Fix this by moving the re-hoist point when it doesn't dominate the dominator of
hoisted instruction, or in other words when it wouldn't dominate the uses of
the instruction being rehoisted.
Differential Revision: https://reviews.llvm.org/D55266
llvm-svn: 350408
Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4
Add the other costs so that we're not relying on the default "is legal/custom" cost logic.
llvm-svn: 350403
GetPointerBaseWithConstantOffset include this code, where ByteOffset
and GEPOffset are both of type llvm::APInt :
ByteOffset += GEPOffset.getSExtValue();
The problem with this line is that getSExtValue() returns an int64_t, but
the += matches an overload for uint64_t. The problem is that the resulting
APInt is no longer considered to be signed. That in turn causes assertion
failures later on if the relevant pointer type is > 64 bits in width and
the GEPOffset was negative.
Changing it to
ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());
resolves the issue and explicitly performs the sign-extending
or truncation. Additionally, instead of asserting later if the result
is > 64 bits, it breaks out of the loop in that case.
See also
https://reviews.llvm.org/D24729https://reviews.llvm.org/D24772
This commit must be merged after D38662 in order for the test to pass.
Patch by Michael Ferguson <mpfergu@gmail.com>.
Reviewers: reames, sanjoy, hfinkel
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D38501
llvm-svn: 350395
Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register.
Differential Revision: https://reviews.llvm.org/D56246
llvm-svn: 350374
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.
We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.
We can extend this to integer ops in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D56011
llvm-svn: 350369
Summary:
Irreducible control flow is not that rare, e.g. it happens in malloc and
3 other places in the libc portions linked in to a hello world program.
This patch improves how we handle that code: it emits a br_table to
dispatch to only the minimal necessary number of blocks. This reduces
the size of malloc by 33%, and makes it comparable in size to asm2wasm's
malloc output.
Added some tests, and verified this passes the emscripten-wasm tests run
on the waterfall (binaryen2, wasmobj2, other).
Reviewers: aheejin, sunfish
Subscribers: mgrang, jgravelle-google, sbc100, dschuff, llvm-commits
Differential Revision: https://reviews.llvm.org/D55467
Patch by Alon Zakai (kripken)
llvm-svn: 350367
Summary:
The previously introduced new operand type for br_table didn't have
a disassembler implementation, causing an assert.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56227
llvm-svn: 350366
Summary:
Instead of asserting on certain kinds of malformed instructions, it
now still print, but instead adds an annotation indicating the
problem, and/or indicates invalid_type etc.
We're using the InstPrinter from many contexts that can't always
guarantee values are within range (e.g. the disassembler), where having
output is more valueable than asserting.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56223
llvm-svn: 350365
This tests a case where we need to be able to compute sign bits for two insert_subvectors that is a liveout of a basic block. The result is then used as a boolean vector in another basic block.
llvm-svn: 350359
These are similar patterns, but when you throw AVX512 onto the pile,
the number of variations explodes. For FP, we really don't care about
AVX1 vs. AVX2 for FP ops. There may be some superficial shuffle diffs,
but that's not what we're testing for here, so I removed those RUNs.
Separating by type also lets us specify 'sse3' for the FP file vs. 'ssse3'
for the integer file...because x86.
llvm-svn: 350357
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:
// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)
We want to have this in the DAG too because as we can see in some of the test diffs (reductions),
the pattern may not be visible in IR.
Given that this is already an IR canonicalization, any backend that would prefer a vector op over
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.
Differential Revision: https://reviews.llvm.org/D55722
llvm-svn: 350354
Detailed description: SIFoldOperands::foldInstOperand iterates over the
operand uses calling the function that changes def-use iteratorson the
way. As a result loop exits immediately when def-use iterator is
changed. Hence, the operand is folded to the very first use instruction
only. This makes VGPR live along the whole basic block and increases
register pressure significantly. The performance drop observed in SHOC
DeviceMemory test is caused by this bug.
Proposed fix: collect uses to separate container for further processing
in another loop.
Testing: make check-llvm
SHOC performance test.
Reviewers: rampitec, ronlieb
Differential Revision: https://reviews.llvm.org/D56161
llvm-svn: 350350
It was added in r257236 but then the one use was removed in r309517. Since no
test should call %host_cc, remove the pattern.
Differential Revision: https://reviews.llvm.org/D56200
llvm-svn: 350348
Follow up for D53051
This patch introduces the tool associated with the ELF implementation of
TextAPI (previously llvm-tapi, renamed for better distinction). This
tool will house a number of features related to enalysis and
manipulation of shared object's exposed interfaces. The first major
feature for this tool is support for producing binary stubs that are
useful for compile-time linking of shared objects. This patch introduces
beginnings of support for reading binary ELF objects to work towards
that goal.
Added:
- elfabi tool.
- support for reading architecture from a binary ELF file into an
ELFStub.
- Support for writing .tbe files.
Differential Revision: https://reviews.llvm.org/D55352
llvm-svn: 350341
Summary:
Fix EntrySize, Size, and Align before doing layout calculation.
As a side cleanup, this removes a dependence on sizeof(Elf_Sym) within BinaryReader, so we can untemplatize that.
This unblocks a cleaner implementation of handling the -O<format> flag. See D53667 for a previous attempt. Actual implementation of the -O<format> flag will come in an upcoming commit, this is largely a NFC (although not _totally_ one, because alignment on binary input was actually wrong before).
Reviewers: jakehehrlich, jhenderson, alexshap, espindola
Reviewed By: jhenderson
Subscribers: emaste, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D56211
llvm-svn: 350336
Use range instead of xrange whenever possible. The extra list creation in Python2
is generally not a performance bottleneck.
Differential Revision: https://reviews.llvm.org/D56253
llvm-svn: 350309
Make sure all print statements are compatible with Python 2 and Python3 using
the `from __future__ import print_function` statement.
Differential Revision: https://reviews.llvm.org/D56249
llvm-svn: 350307
Summary:
Keeping msan a function pass requires replacing the module level initialization:
That means, don't define a ctor function which calls __msan_init, instead just
declare the init function at the first access, and add that to the global ctors
list.
Changes:
- Pull the actual sanitizer and the wrapper pass apart.
- Add a newpm msan pass. The function pass inserts calls to runtime
library functions, for which it inserts declarations as necessary.
- Update tests.
Caveats:
- There is one test that I dropped, because it specifically tested the
definition of the ctor.
Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka
Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji
Differential Revision: https://reviews.llvm.org/D55647
llvm-svn: 350305
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.
This patch also renames FeatureSpecRestrict to FeatureSB.
Reviewed By: olista01, LukeCheeseman
Differential Revision: https://reviews.llvm.org/D55990
llvm-svn: 350299
Summary:
The commit rL348922 introduced a means to set Metadata
section kind for a global variable, if its explicit section
name was prefixed with ".AMDGPU.metadata.".
This patch changes that prefix to ".AMDGPU.comment.",
as "metadata" in the section name might lead to
ambiguity with metadata used by AMD PAL runtime.
Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668
Reviewers: kzhuravl
Reviewed By: kzhuravl
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D56197
llvm-svn: 350292
A DBG_VALUE between a two-address instruction and a following COPY
would prevent rescheduleMIBelowKill optimization inside
TwoAddressInstructionPass.
Differential Revision: https://reviews.llvm.org/D55987
llvm-svn: 350289
There can be multiple local symbols with the same name (for e.g.
comdat sections), and thus the symbol name itself isn't enough
to disambiguate symbols.
Differential Revision: https://reviews.llvm.org/D56140
llvm-svn: 350288
The test cases are constructed to avoid folding the AND into a masked compare operation.
Currently we emit a KAND and a KORTEST for these cases.
llvm-svn: 350287
When switched to the MI scheduler for P9, the hardware is modeled as out of order.
However, inside the MI Scheduler algorithm, we still use the in-order scheduling model
as the MicroOpBufferSize isn't set. The MI scheduler take it as the hw cannot buffer
the op. So, only when all the available instructions issued, the pending instruction
could be scheduled. That is not true for our P9 hw in fact.
This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is
picked from the P9 hw spec, and the perf test indicate that, its value won't hurt the cpu2017.
With this patch, there are 3 specs improved over 3% and 1 spec deg over 3%. The detail is as follows:
x264_r: +6.95%
cactuBSSN_r: +6.94%
lbm_r: +4.11%
xz_r: -3.85%
And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved.
Reviewer: Nemanjai
Differential Revision: https://reviews.llvm.org/D55810
llvm-svn: 350285
OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it
sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have
equivalent arguments (either identical or equivalent PHIs). It then assumes that
ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead.
Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs
so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue.
This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence.
rdar://problem/47005143
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D56235
llvm-svn: 350284
Summary:
Sometimes it's useful to emit assembly after LTO stage to modify it manually. Emitting precodegen bitcode file (via save-temps plugin option) and then feeding it to llc doesn't always give the same binary as original.
This patch is simpler alternative to https://reviews.llvm.org/D24020.
Patch by Denis Bakhvalov.
Reviewers: mehdi_amini, tejohnson
Reviewed By: tejohnson
Subscribers: MaskRay, inglorion, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D56114
llvm-svn: 350276
Summary:
This was previously ignored and an incorrect value generated.
Also fixed Disassembler's handling of block_type.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56092
llvm-svn: 350270
Summary:
Alias can make one (but not all) live, we still need to scan all others if this symbol is reachable
from somewhere else.
Reviewers: tejohnson, grimar
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D56117
llvm-svn: 350269
The caller to EraseInstruction had this conditional:
// ARC calls with null are no-ops. Delete them.
if (IsNullOrUndef(Arg))
but the assert inside EraseInstruction only allowed ConstantPointerNull and not
undef or bitcasts.
This adds support for both of these cases.
rdar://problem/47003805
llvm-svn: 350261
If an instruction has no demanded bits, remove it directly during BDCE,
instead of leaving it for something else to clean up.
Differential Revision: https://reviews.llvm.org/D56185
llvm-svn: 350257
INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag.
This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used.
I had to avoid selecting DEC is the result isn't used. This will become a SUB immediate which will turned into a CMP later by optimizeCompareInstr. This lead to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag.
This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes.
Differential Revision: https://reviews.llvm.org/D55975
llvm-svn: 350245
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.
Improves the test case from PR38217. There may be additional opportunities after this.
Differential Revision: https://reviews.llvm.org/D56145
llvm-svn: 350239
Adding MC regressions tests to cover the XOP isa set.
This patch is part of a larger task to cover MC encoding of all X86 isa sets started in revision: https://reviews.llvm.org/D39952
Differential Revision: https://reviews.llvm.org/D41392
llvm-svn: 350237
By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.
By creating the extract with the correct scalar type when we create legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.
This fixes the regression on X86 in D56156.
Differential Revision: https://reviews.llvm.org/D56176
llvm-svn: 350236
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.
The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.
Differential Revision: https://reviews.llvm.org/D56156
llvm-svn: 350235
PPCPreEmitPeephole pass.
PPCPreEmitPeephole will convert a BC to B when the conditional branch is
based on a constant CR by CRSET or CRUNSET. This is added in
https://reviews.llvm.org/rL343100.
When the conditional branch is known to be always taken, all branches will
be removed and a new unconditional branch will be inserted. However, when
SeenUse is false the original patch will not remove the branches, but still
insert the new unconditional branch, update the successors and create
inconsistent IR. Compiling the synthetic testcase included can show the
problem we run into.
The patch simply removes the SeenUse condition when adding branches into
InstrsToErase set.
Differential Revision: https://reviews.llvm.org/D56041
llvm-svn: 350223
Peek through shift modulo masks while matching double shift patterns.
I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now.
Differential Revision: https://reviews.llvm.org/D56199
llvm-svn: 350222
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).
Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).
After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).
As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.
Patch by me with contributions from Michael Ferguson (mferguson@cray.com).
Differential Revision: https://reviews.llvm.org/D38662
llvm-svn: 350220
This was added at a time when i1 was a legal type with avx512f and there was a bug. i1 is no longer considered a legal type with avx512f so there should be no codegen difference.
llvm-svn: 350203
The final piece of IR-level analysis to allow this was committed with:
rL350188
Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.
The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal or better to the original code even if the
target does not have an actual rotate instruction.
llvm-svn: 350199
This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions.
This is inspired by the helper functions used by ARM and AArch64 for the same purpose.
The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO.
llvm-svn: 350198
This was added at a time when i1 was a legal type with avx512f and there was a bug. i1 is no longer considered a legal type with avx512f so there should be no codegen difference.
llvm-svn: 350195
When SADDO/SSUBO is used as a part of a condition, the X86 backend has to lower the instruction twice. One for the flags use and then once for the data use. These two selections should be kept in sync so they end up with one node providing the data and the flags. This doesn't seem to be happening for INC/DEC.
llvm-svn: 350194
To make it more obvious which part of the transformation is carried
out by BDCE. Also drop the CHECK-IO lines which only run -instsimplify
as they don't really seem meaningful if the main check doesn't run
-instsimplify either.
llvm-svn: 350189
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 350188
default
During the lowering of a switch that would result in the generation of a jump
table, a range check is performed before indexing into the jump table, for the
switch value being outside the jump table range and a conditional branch is
inserted to jump to the default block. In case the default block is
unreachable, this conditional jump can be omitted. This patch implements
omitting this conditional branch for unreachable defaults.
Review Reference: D52002
llvm-svn: 350186
Propagate HOME environment variable to unittests. This is necessary
to fix test failures resulting from pw_home pointing to a non-existing
directory while being overriden with HOME. Apparently Gentoo users
hit this sometimes when they override build directory for Portage.
Original bug report: https://bugs.gentoo.org/674088
Differential Revision: https://reviews.llvm.org/D56162
llvm-svn: 350175
MSan used to report false positives in the case the argument of
llvm.is.constant intrinsic was uninitialized.
In fact checking this argument is unnecessary, as the intrinsic is only
used at compile time, and its value doesn't depend on the value of the
argument.
llvm-svn: 350173
Summary:
For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo.
Then the verifier runs and it seems like we have a use of an undefined register (the register will
be reserved later, but the verifier doesn't know that).
So this patch call setUsesTOCBasePtr before emit the code for the pseudo, so verifier can know
X2 is a reserved register.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D56148
llvm-svn: 350165
This seems to be getting in the way more than its helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better.
Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it.
llvm-svn: 350159
This allows us to sign extend to v4i32 first. And then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh.
We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization.
llvm-svn: 350158
A recent patch has added custom legalization of vector conversions of
v2i16 -> v2f64. This just rounds it out for other types where the input vector
has an illegal (narrower) type than the result vector. Specifically, this will
handle the following conversions:
v2i8 -> v2f64
v4i8 -> v4f32
v4i16 -> v4f32
Differential revision: https://reviews.llvm.org/D54663
llvm-svn: 350155
The current CRBIT spill pseudo-op expansion creates a KILL instruction
that kills the CRBIT and defines the enclosing CR field. However, this
paints a false picture to the register allocator that all bits in the CR
field are killed so copies of other bits out of the field become dead and
removable.
This changes the expansion to preserve the KILL flag on the CRBIT as an
implicit use and to treat the CR field as an undef input.
Thanks to Hal Finkel for the review and Uli Weigand for implementation input.
Differential revision: https://reviews.llvm.org/D55996
llvm-svn: 350153
The following code requests 64-bit PC-relative relocations unsupported
by MIPS ABI. Now it triggers an assertion. It's better to show an error
message.
```
foo:
.quad bar - foo
```
llvm-svn: 350152
This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better.
llvm-svn: 350141
Previously we emitted a multiply and some masking that was supposed to matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly.
Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll
Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs.
I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization
llvm-svn: 350134
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.
This patch also moves to FeatureSB the old FeatureSpecRestrict.
Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman
Differential Revision: https://reviews.llvm.org/D55921
llvm-svn: 350126
This is the last one in a series of patches to support better code generation for bitfield insert.
BitPermutationSelector already support ISD::ZERO_EXTEND but not TRUNCATE.
This patch adds support for ISD:TRUNCATE in BitPermutationSelector.
For example of this test case,
struct s64b {
int a:4;
int b:16;
int c:24;
};
void bitfieldinsert64b(struct s64b *p, unsigned char v) {
p->b = v;
}
the selection DAG loos like:
t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64
t18: i32 = and t14, Constant:i32<-1048561>
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t22: i64 = AssertZext t4, ValueType:ch:i8
t23: i32 = truncate t22
t16: i32 = shl nuw nsw t23, Constant:i32<4>
t19: i32 = or t18, t16
t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64
By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18.
So the generated sequences with and without this patch are
without this patch
rlwinm 5, 5, 0, 28, 11 # corresponding to t18
rlwimi 5, 4, 4, 20, 27
with this patch
rlwimi 5, 4, 4, 12, 27
Differential Revision: https://reviews.llvm.org/D49076
llvm-svn: 350118
If we are changing the MI operand from Reg to Imm, we need also handle its implicit use if have.
Differential Revision: https://reviews.llvm.org/D56078
llvm-svn: 350115
For atomic value operand which less than 4 bytes need to be masked.
And the related operation to calculate the newvalue can be done in 32 bit gprc.
So just use gprc for mask and value calculation.
Differential Revision: https://reviews.llvm.org/D56077
llvm-svn: 350113
Create PMULDQ/PMULUDQ as long as the number of elements is a power of 2.
This seems to give some improvements in our ability to use SimplifyDemandedBits.
llvm-svn: 350084
Summary:
These instructions are currently unused in our backend, but for
completeness it is good to support them, so they can be used with
the assembler in hand-written code.
Tests are very basic, signature support missing much like other blocks.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55973
llvm-svn: 350079
Summary:
It does so using a simple nesting stack, and gives clear errors upon
violation. This is unique to wasm, since most CPUs do not have
any nested constructs.
Had to add an end of file check to the general assembler for this.
Note: if/else/end instructions are not currently supported in our
tablegen defs, so these tests will be enabled in a follow-up.
They already pass the nesting check.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55797
llvm-svn: 350078
Summary:
Existing LIR recognizes CTLZ where shifting input variable right until it is zero. (Shift-Until-Zero idiom)
This commit:
1. Augments Shift-Until-Zero idiom to recognize CTTZ where input variable is shifted left.
2. Prepare for BitScan idiom recognition.
Patch by Yuanfang Chen (tabloid.adroit)
Reviewers: craig.topper, evstupac
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55876
llvm-svn: 350074
Fixes crash reported after r347354 for frontends that don't always emit
'this' pointers for methods. Now we will silently produce debug info
that makes functions like this look like static methods, which seems
reasonable.
llvm-svn: 350073
The patch adds a possibility to make library calls on NVPTX.
An important thing about library functions - they must be defined within
the current module. This basically should guarantee that we produce a
valid PTX assembly (without calls to not defined functions). The one who
wants to use the libcalls is probably will have to link against
compiler-rt or any other implementation.
Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.
Also, there was an issue with a DAG during legalisation. When we expand
instruction into libcall, the inner call-chain isn't being "integrated"
into outer chain. Since the last "data-flow" (call retval load) node is
located in call-chain earlier than CALLSEQ_END node, the latter becomes
a leaf and therefore a dead node (and is being removed quite fast).
Proposed here solution relies on another data-flow pseudo nodes
(ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and
instruction selection phases - we remove the pseudo instructions before
register scheduling phase.
Patch by Denys Zariaiev!
Differential Revision: https://reviews.llvm.org/D34708
llvm-svn: 350069
Add widen scalar for type index 1 (i1 condition) for G_SELECT.
Select G_SELECT for pointer, s32(integer) and smaller low level
types on MIPS32.
Differential Revision: https://reviews.llvm.org/D56001
llvm-svn: 350063
Summary:
This patch is to fix the bug imported by rL341634.
In above submit , the the return type of ISD::ADDE is
14224: SDVTList VTs = DAG.getVTList(MVT::i64, MVT::i64),
but in fact, the second return type of ISD::ADDE should be
MVT::Glue not MVT::i64.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D55977
llvm-svn: 350061
This is an alternative to what I attempted in D56057.
GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.
I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.
Based on a patch that Simon Pilgrim sent me in email.
Fixes PR40142.
llvm-svn: 350059
This patch teaches LoopSimplifyCFG to remove dead exiting edges
from loops.
Differential Revision: https://reviews.llvm.org/D54025
Reviewed By: fedor.sergeev
llvm-svn: 350049
It's dangerous to knowingly create an illegal vector type
no matter what stage of combining we're in.
This prevents the missed folding/scalarization seen in:
https://bugs.llvm.org/show_bug.cgi?id=40146
llvm-svn: 350034
trunc (add X, C ) --> add (trunc X), C'
If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).
This change used to show regressions for x86, but those are gone after D55494.
This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic)
that does almost the same thing.
Differential Revision: https://reviews.llvm.org/D55866
llvm-svn: 350006
The missed load folding noticed in D55898 is visible independent of that change
either with an adjusted IR pattern to start or with AVX2/AVX512 (where the build
vector becomes a broadcast first; movddup is not produced until we get into isel
via tablegen patterns).
Differential Revision: https://reviews.llvm.org/D55936
llvm-svn: 350005
@bextr64_32_b1 is extracted from hotpath of real-world code
(RawSpeed BitStream<>::peekBitsNoFill()) after `clang -O3`.
@bextr64_32_b2/@bextr64_32_b0 is the same pattern,
but with trunc done last, showing how i think it can be handled:
https://rise4fun.com/Alive/K4Bhttps://rise4fun.com/Alive/qC9
It is possible that middle-end should do some of this, too.
https://bugs.llvm.org/show_bug.cgi?id=36419
llvm-svn: 349998
Summary:
The "single parameter" .file directive appears to be an ELF-only feature
that is intended to insert the main source filename into the string
table table.
I noticed that if you assemble an ELF .s file for COFF, typically it
will assert right away on a .file directive near the top of the file. My
first change was to make this emit a proper error in the asm parser so
that we don't assert so easily.
However, COFF actually does have some support for this directive, and if
you emit an object file, llvm-mc does not assert. When emitting a COFF
object, MC will take those file names and create "debug" symbol table
entries for them. I'm not familiar with these kinds of symbol table
entries, and I'm not aware of any users of them, but @compnerd added
them a while ago. They don't introduce absolute paths, and most main
source file paths are short enough that this extra entry shouldn't cause
any problems, so I enabled the flag in MCAsmInfoCOFF that indicates that
it's supported.
This has the side effect of adding an extra debug symbol to every object
produced by clang, which is a pretty big functional change. My question
is, should we keep the functionality or remove it in the name of symbol
table minimalism?
Reviewers: mstorsjo, compnerd
Subscribers: hiraditya, compnerd, llvm-commits
Differential Revision: https://reviews.llvm.org/D55900
llvm-svn: 349976
Originally committed in r349333, reverted in r349353.
GCC emitted these unconditionally on/before 4.4/March 2012
Clang emitted these unconditionally on/before 3.5/March 2014
This improves performance when parsing CUs (especially those using split
DWARF) that contain no code ranges (such as the mini CUs that may be
created by ThinLTO importing - though generally they should be/are
avoided, especially for Split DWARF because it produces a lot of very
small CUs, which don't scale well in a bunch of other ways too
(including size)).
The revert was due to a (Google internal) test that had some checked in old
object files missing DW_AT_ranges. That's since been fixed.
llvm-svn: 349968
This fixes the patterns that have or/and as a root. 'and' is handled differently since thy usually have a CMP wrapped around them.
I had to look for uses of the CF flag because all these nodes have non-standard CF flag behavior. A real or/xor would always clear CF. In practice we shouldn't be using the CF flag from these nodes as far as I know.
Differential Revision: https://reviews.llvm.org/D55813
llvm-svn: 349962
The BEXTR instruction documents the SF bit as undefined.
The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed.
Fixes PR40060
Differential Revision: https://reviews.llvm.org/D55807
llvm-svn: 349956
Summary:
Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing.
This is because the M0 field is of unsigned.
This patch achieves the similar goal as https://reviews.llvm.org/D55241, but keeps the optimization
if the base is known unsigned.
Reviewers:
arsemn
Differential Revision:
https://reviews.llvm.org/D55568
llvm-svn: 349951
Summary:
BasicAA has special logic for unescaped allocas, which normally applies
equally well to dynamic and static allocas. However, llvm.stackrestore
has the power to end the lifetime of dynamic allocas, without referring
to them directly.
stackrestore is already marked with the most conservative memory
modification attributes, but because the alloca is not escaped, the
normal logic produces incorrect results. I think BasicAA needs a special
case here to teach it about the relationship between dynamic allocas and
stackrestore.
Fixes PR40118
Reviewers: gbiv, efriedma, george.burgess.iv
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D55969
llvm-svn: 349945
This is admittedly a narrow fix for the problem:
https://bugs.llvm.org/show_bug.cgi?id=37502
...but as the XOP restriction shows, it's a maze to get this right.
In the motivating example, note that we have movddup before SSE4.1 and
again with AVX2. That's because insertps isn't available pre-SSE41 and
vbroadcast is (more generally) available with AVX2 (and the splat is
reduced to movddup via isel pattern).
Differential Revision: https://reviews.llvm.org/D55898
llvm-svn: 349937
This adds support for widening G_FCEIL in LegalizerHelper and
AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to
widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't
support full FP 16.
This also updates AArch64/f16-instructions.ll to show that we perform the
correct transformation.
llvm-svn: 349927
This adds an AVX512 run as suggested in D55936.
The test didn't really belong with other build vector tests
because that's not the pattern here. I don't see much value
in adding 64-bit RUNs because they wouldn't exercise the
isel patterns that we're aiming to expose.
llvm-svn: 349920
-print-after IR printing generally can not print the IR unit (Loop or SCC)
which has just been invalidated by the pass. However, when working in -print-module-scope
mode even if Loop was invalidated there is still a valid module that we can print.
Since we can not access invalidated IR unit from AfterPassInvalidated instrumentation
point we can remember the module to be printed *before* pass. This change introduces
BeforePass instrumentation that stores all the information required for module printing
into the stack and then after pass (in AfterPassInvalidated) just print whatever
has been placed on stack.
Reviewed By: philip.pfaffe
Differential Revision: https://reviews.llvm.org/D55278
llvm-svn: 349896
- When signing return addresses with -msign-return-address=<scope>{+<key>},
either the A key instructions or the B key instructions can be used. To
correctly authenticate the return address, the unwinder/debugger must know
which key was used to sign the return address.
- When and exception is thrown or a break point reached, it may be necessary to
unwind the stack. To accomplish this, the unwinder/debugger must be able to
first authenticate an the return address if it has been signed.
- To enable this, the augmentation string of CIEs has been extended to allow
inclusion of a 'B' character. Functions that are signed using the B key
variant of the instructions should have and FDE whose associated CIE has a 'B'
in the augmentation string.
- One must also be able to preserve these semantics when first stepping from a
high level language into assembly and then, as a second step, into an object
file. To achieve this, I have introduced a new assembly directive
'.cfi_b_key_frame ', that tells the assembler the current frame uses return
address signing with the B key.
- This ensures that the FDE is associated with a CIE that has 'B' in the
augmentation string.
Differential Revision: https://reviews.llvm.org/D51798
llvm-svn: 349895
It seems better to avoid using the callback if possible since
there are coverage assertions which are disabled if this is used.
Also fix missing tests. Only test the legal cases since it seems
legalization for build_vector is quite lacking.
llvm-svn: 349878
This saves materializing the immediate. The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.
I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.
Differential Revision: https://reviews.llvm.org/D55630
llvm-svn: 349857
If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end
up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback.
Add it to the switch, and add a test verifying that this happens.
llvm-svn: 349822
When deciding lazily whether a CU would be split or non-split I
accidentally dropped some handling for the line tables comp_dir (by
doing it lazily it was too late to be handled properly by the MC line
table code).
Move that bit of the code back to the non-lazy place.
llvm-svn: 349819
We have to treat constructs like this as if they were "symbolic", to use
the correct codepath to resolve them. This mostly only affects movz
etc. because the other uses of classifySymbolRef conservatively treat
everything that isn't a constant as if it were a symbol.
Differential Revision: https://reviews.llvm.org/D55906
llvm-svn: 349800
This requires a bit more code than other fixups, to distingush between
abs_g0/abs_g1/etc. Actually, I think some of the other fixups are
missing some checks, but I won't try to address that here.
I haven't seen any real-world code that uses a construct like this, but
it clearly should work, and we're considering using it in the
implementation of localescape/localrecover on Windows (see
https://reviews.llvm.org/D53540). I've verified that binutils produces
the same code as llvm-mc for the testcase.
This currently doesn't include support for the *_s variants (that
requires a bit more work to set the opcode).
Differential Revision: https://reviews.llvm.org/D55896
llvm-svn: 349799
If we found unsafe dependences other than 'unknown', we already know at
compile time that they are unsafe and the runtime checks should always
fail. So we can avoid generating them in those cases.
This should have no negative impact on performance as the runtime checks
that would be created previously should always fail. As a sanity check,
I measured the test-suite, spec2k and spec2k6 and there were no regressions.
Reviewers: Ayal, anemet, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D55798
llvm-svn: 349794
Emit static locals within the correct lexical scope so variables with the same
name will not confuse the debugger into getting the wrong value.
Differential Revision: https://reviews.llvm.org/D55336
llvm-svn: 349777
This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations.
AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK.
Differential Revision: https://reviews.llvm.org/D55747
llvm-svn: 349765
This is a update to D43157 to correctly handle fixup_riscv_pcrel_lo12.
Notable changes:
Rebased onto trunk
Handle and test S-type
Test case pcrel-hilo.s is merged into relocations.s
D43157 description:
VK_RISCV_PCREL_LO has to be handled specially. The MCExpr inside is
actually the location of an auipc instruction with a VK_RISCV_PCREL_HI fixup
pointing to the real target.
Differential Revision: https://reviews.llvm.org/D54029
Patch by Chih-Mao Chen and Michael Spencer.
llvm-svn: 349764
As discussed on D55894, this replaces the existing PADDS/PSUBUS intrinsics with the the sadd/ssub.sat generic intrinsics and moves the tests out of the x86 subfolder.
PR40110 has been raised to fix the regression with constant folding vectors containing undef elements.
llvm-svn: 349759
This patch fixes two deficiencies in current code that recognizes
the VLLEZ idiom:
- For the floating-point versions, we have ISel patterns that match
on a bitconvert as the top node. In more complex cases, that
bitconvert may already have been merged into something else.
Fix the patterns to match the inner nodes instead.
- For the 64-bit integer versions, depending on the surrounding code,
we may get either a DAG tree based on JOIN_DWORDS or one based on
INSERT_VECTOR_ELT. Use a PatFrags to simply match both variants.
llvm-svn: 349749
Current code in SystemZDAGToDAGISel::tryGather refuses to perform
any transformation if the Load SDNode has more than one use. This
(erronously) counts uses of the chain result, which prevents the
optimization in many cases unnecessarily. Fixed by this patch.
llvm-svn: 349748
We already have special code (DAG combine support for FP_ROUND)
to recognize cases where we an use a vector version of VLEDB to
perform two floating-point truncates in parallel, but equivalent
support for VLEDB (vector floating-point extends) has been
missing so far. This patch adds corresponding DAG combine
support for FP_EXTEND.
llvm-svn: 349746
Those intrinsics will be autoupgraded soon to @llvm.sadd.sat generics (D55894), so to keep a x86-specific case I'm replacing it with @llvm.x86.sse2.pmulhu.w
llvm-svn: 349739
These tools were assuming ABI version is 0,
that is not always true.
Patch teaches them to work with that field.
Differential revision: https://reviews.llvm.org/D55884
llvm-svn: 349737
LLVM treats void* pointers passed to assembly routines as pointers to
sized types.
We used to emit calls to __msan_instrument_asm_load() for every such
void*, which sometimes led to false positives.
A less error-prone (and truly "conservative") approach is to unpoison
only assembly output arguments.
llvm-svn: 349734
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.
Reviewers: spatel, gchatelet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55263
llvm-svn: 349731
Summary:
PowerPC has scalar selects (isel) and vector mask selects (xxsel). But PowerPC
does not have vector CR selects, PowerPC does not support scalar condition
selects on vectors.
In addition to implementing this hook, isSelectSupported() should return false
when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects
are converted into branch sequences.
Reviewed By: steven.zhang, hfinkel
Differential Revision: https://reviews.llvm.org/D55754
llvm-svn: 349727
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.
This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).
This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.
The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.
Differential Revision: https://reviews.llvm.org/D52116
llvm-svn: 349725
Summary:
This is a code size savings and is also important to get runnable code
while engines do not support v128.const.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55910
llvm-svn: 349724
Summary:
Gates v128.const, f32x4.sqrt, f32x4.div, i8x16.extract_lane_u, and
i16x8.extract_lane_u on the --wasm-enable-unimplemented-simd flag,
since these ops are not implemented yet in V8.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55904
llvm-svn: 349720
This code pattern is an unfortunate side effect of the way some types get split
at call lowering. Ideally we'd either not generate it at all or combine it away
in the legalizer artifact combiner.
Until then, add selection support anyway which is a significant proportion of
our current fallbacks on CTMark.
rdar://46491420
llvm-svn: 349712
Summary:
On non-Windows these are already removed by ShouldInstrumentGlobal.
On Window we will wait until we get actual issues with that.
Reviewers: pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D55899
llvm-svn: 349707
Summary:
ICF prevented by removing unnamed_addr and local_unnamed_addr for all sanitized
globals.
Also in general unnamed_addr is not valid here as address now is important for
ODR violation detector and redzone poisoning.
Before the patch ICF on globals caused:
1. false ODR reports when we register global on the same address more than once
2. globals buffer overflow if we fold variables of smaller type inside of large
type. Then the smaller one will poison redzone which overlaps with the larger one.
Reviewers: eugenis, pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D55857
llvm-svn: 349706
Creating the IR builder, then modifying the CFG, leads to an IRBuilder
where the BB and insertion point are inconsistent, so new instructions
have the wrong parent.
Modified an existing test because the test wasn't covering anything
useful (the "invoke" was not actually an invoke by the time we hit the
code in question).
Differential Revision: https://reviews.llvm.org/D55729
llvm-svn: 349693
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits or bitwise or/and/xor in combination with known
bits information.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 349674
This is to address post-commit feedback from Paul Robinson on r348954.
The original commit misinterprets count and upper bound as the same thing (I thought I saw GCC producing an upper bound the same as Clang's count, but GCC correctly produces an upper bound that's one less than the count (in C, that is, where arrays are zero indexed)).
I want to preserve the C-like output for the common case, so in the absence of a lower bound the count (or one greater than the upper bound) is rendered between []. In the trickier cases, where a lower bound is specified, a half-open range is used (eg: lower bound 1, count 2 would be "[1, 3)" and an unknown parts use a '?' (eg: "[1, ?)" or "[?, 7)" or "[?, ? + 3)").
Reviewers: aprantl, probinson, JDevlieghere
Differential Revision: https://reviews.llvm.org/D55721
llvm-svn: 349670
Summary:
The LTO/ThinLTO driver currently creates invalid bitcode by setting
symbols marked dllimport as dso_local. The compiler often has access
to the definition (often dllexport) and the declaration (often
dllimport) of an object at link-time, leading to a conflicting
declaration. This patch resolves the inconsistency by removing the
dllimport attribute.
Reviewers: tejohnson, pcc, rnk, echristo
Reviewed By: rnk
Subscribers: dmikulin, wristow, mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D55627
llvm-svn: 349667
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.
It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.
llvm-svn: 349664
The (cmp (and X, Y) 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM like BLSR or BLSI.
This patch moves removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but its probably not perfect.
Differential Revision: https://reviews.llvm.org/D55870
llvm-svn: 349661
Fixes https://bugs.llvm.org/show_bug.cgi?id=38743
The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in lockingStoresDispSizeMap.
But it currently looks only at the previous one, which will miss some cases that result in assert.
This patch refine the function to check all previous layouts until find the uncontained one. So all redundant stores will be removed.
Patch by Pengfei Wang
Differential Revision: https://reviews.llvm.org/D55642
llvm-svn: 349660
Summary:
Import libraries as created by llvm-dlltool always use the same archive
member name for every object file (namely, the DLL library name). Ensure
that long names are not repeatedly stored in the string table.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D55860
llvm-svn: 349637
Now that we use the generic ISD opcodes, we can use the generic intrinsics directly as well. This fixes the poor fast-isel codegen by not expanding to an easily broken IR code sequence.
I'm intending to deal with the signed saturation equivalents as well.
Clang counterpart: https://reviews.llvm.org/D55879
Differential Revision: https://reviews.llvm.org/D55855
llvm-svn: 349630
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.
This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.
I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.
Differential Revision: https://reviews.llvm.org/D55822
llvm-svn: 349629
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero.
SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.
Thanks to @dmgreen for catching this.
Differential Revision: https://reviews.llvm.org/D55883
llvm-svn: 349625
These are due to be upgraded soon, but good to replace them with generic llvm sadd_sat/ssub_sat intrinsics now.
The avx512 masked cases need doing as well but require a bit of tidyup first.
llvm-svn: 349621
Summary:
Using HI here makes no logical sense, since the dword is only
32 bits to begin with.
Current Mesa master does not look at the relocation type at all,
so this change is fine. Future Mesa will rely on this, however.
Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1
Reviewers: arsenm, rampitec, mareko
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D55369
llvm-svn: 349620
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.
This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.
I've updated SelectionDAG::simplifyShift to demonstrate its use.
Differential Revision: https://reviews.llvm.org/D55819
llvm-svn: 349616
These are already being autoupgraded, currently to an IR sequence, but best to replace them with generic llvm uadd_sat/usub_sat intrinsics (which D55855 will be doing shortly anyhow).
The avx512 masked cases need doing as well but require a bit of tidyup first.
llvm-svn: 349615
Summary:
Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged.
This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds.
Irreducible loop test has been extended based on a CTS failure detected for GFX9.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D55602
llvm-svn: 349611
All we have to do is mark it as legal.
This allows us to select a lot of new patterns handled by TableGen. This
patch adds tests for them and splits up the existing test file for
binary operators into 2 files, one for arithmetic ops and one for
logical ones.
llvm-svn: 349610
This is an initial implementation of no-op passthrough copying of COFF
with objcopy.
Differential Revision: https://reviews.llvm.org/D54939
llvm-svn: 349605
Summary:
In addition to reducing the functions in an LLVM module, bugpoint now
reduces the function attributes associated with each of the remaining
functions.
To test this, add a -bugpoint-crashfuncattr test pass, which crashes if
a function in the module has a "bugpoint-crash" attribute. A test case
demonstrates that the IR is reduced to just that one attribute.
Reviewers: MatzeB, silvas, davide, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D55216
llvm-svn: 349601
For type v4i32/v8ii16/v16i8, do following transforms:
(vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b)
(vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b)
Differential Revision: https://reviews.llvm.org/D55812
llvm-svn: 349599
There's a mismatch internally about how we are handling these patterns.
We count loads as cheapToScalarize(), but then we don't actually
scalarize them, so that can leave extra instructions compared to where
we started when scalarizing other ops. If it's cheapToScalarize, then
we should be scalarizing.
llvm-svn: 349560
Clang uses weak linkage for objc runtime functions when they are not available on the platform.
The intrinsic has this linkage so we just need to pass that on to the runtime call.
llvm-svn: 349559
For performance reasons, clang set nonlazybind on these functions. Now that we
are using intrinsics instead of runtime calls, we should set this attribute when
creating the runtime functions.
llvm-svn: 349558
This patch adds a VectorizationSafetyStatus enum, which will be extended
in a follow up patch to distinguish between 'safe with runtime checks'
and 'known unsafe' dependences.
Reviewers: anemet, anna, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D54892
llvm-svn: 349556
Summary:
unnamed_addr is still useful for detecting of ODR violations on vtables
Still unnamed_addr with lld and --icf=safe or --icf=all can trigger false
reports which can be avoided with --icf=none or by using private aliases
with -fsanitize-address-use-odr-indicator
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: kubamracek, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D55799
llvm-svn: 349555
The first test claims to show that the vectorizer will
generate a vector load/loop, but then this file runs
other passes which might scalarize that op. I'm removing
instcombine from the RUN line here to break that dependency.
Also, I'm generating full checks to make it clear exactly
what the vectorizer has done.
llvm-svn: 349554
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's. Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.
llvm-svn: 349552
Looks like there are valid reasons why we need to allow bitcasts in llvm.asan.globals, see discussion at https://github.com/apple/swift-llvm/pull/133. Let's look through bitcasts when iterating over entries in the llvm.asan.globals list.
Differential Revision: https://reviews.llvm.org/D55794
llvm-svn: 349544
We're moving ARC optimisation and ARC emission in clang away from runtime methods
and towards intrinsics. This is the part which actually uses the intrinsics in the ARC
optimizer when both analyzing the existing calls and emitting new ones.
Differential Revision: https://reviews.llvm.org/D55348
Reviewers: ahatanak
llvm-svn: 349534
We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there.
This addresses one issue from PR40090.
llvm-svn: 349531
Checking whether a number has a certain number of trailing / leading
zeros means checking whether it is of the form XXXX1000 / 0001XXXX,
which can be done with an and+icmp.
Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next
step, this can be extended to non-equality predicates.
Differential Revision: https://reviews.llvm.org/D55745
llvm-svn: 349530
Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and
AMDGPU::V_ADDC_U32_e64. Therefore, we don't any additional operand size-check-assert.
Author: FarhanaAleen
llvm-svn: 349529
Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic
ISD opcodes SADDSAT and SSUBSAT. This also improves scodegen for
@llvm.sadd.sat() and @llvm.ssub.sat() intrinsics.
This is a followup to D55787 and part of PR40056.
Differential Revision: https://reviews.llvm.org/D55833
llvm-svn: 349520
InstCombine seems to canonicalize or PSUB patter into a max with the cosntant and an add with an inverse of the constant.
This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling.
Fixes some of PR40053
Differential Revision: https://reviews.llvm.org/D55780
llvm-svn: 349519
Summary: This the initial code change to facilitate managing FMF flags from Instructions to MI wrt Intrinsics in Global Isel. Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE.
Reviewers: aditya_nandakumar, bogner
Reviewed By: bogner
Subscribers: rovka, kristof.beyls, javed.absar
Differential Revision: https://reviews.llvm.org/D55668
llvm-svn: 349514
When using clang with `-fno-unroll-loops` (implicitly added with `-O1`),
the LoopUnrollPass is not not added to the (legacy) pass pipeline. This
also means that it will not process any loop metadata such as
llvm.loop.unroll.enable (which is generated by #pragma unroll or
WarnMissedTransformationsPass emits a warning that a forced
transformation has not been applied (see
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html).
Such explicit transformations should take precedence over disabling
heuristics.
This patch unconditionally adds LoopUnrollPass to the optimizing
pipeline (that is, it is still not added with `-O0`), but passes a flag
indicating whether automatic unrolling is dis-/enabled. This is the same
approach as LoopVectorize uses.
The new pass manager's pipeline builder has no option to disable
unrolling, hence the problem does not apply.
Differential Revision: https://reviews.llvm.org/D55716
llvm-svn: 349509
Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM
and use integer type of correct size when creating arguments for
CLI.lowerCall.
Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64
on MIPS32.
Differential Revision: https://reviews.llvm.org/D55651
llvm-svn: 349499
Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes
UADDSAT and USUBSAT. As a side-effect, this also makes codegen for
the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable.
This only replaces use in the X86 backend, and does not move any of
the ADDUS/SUBUS X86 specific combines into generic codegen.
Differential Revision: https://reviews.llvm.org/D55787
llvm-svn: 349481
This is https://bugs.llvm.org/show_bug.cgi?id=39992,
If we have the following code (test.cpp):
thread_local int tdata = 24;
and build an .o file with debug information:
clang --target=x86_64-pc-linux -c bar.cpp -g
Then object produced may have R_X86_64_DTPOFF64/R_X86_64_DTPOFF32 relocations.
(clang emits R_X86_64_DTPOFF64 and gcc emits R_X86_64_DTPOFF32 for the code above for me)
Currently, llvm-dwarfdump fails to compute this TLS relocation when dumping
object and reports an
error:
failed to compute relocation: R_X86_64_DTPOFF64, Invalid data was encountered while parsing the file
This relocation represents the offset in the TLS block and resolved by the linker,
but this info is unavailable at the
point when the object file is dumped by this tool.
The patch adds the simple evaluation for such relocations to avoid emitting errors.
Resulting behavior seems to be equal to GNU dwarfdump.
Differential revision: https://reviews.llvm.org/D55762
llvm-svn: 349476
Add narrowScalar for G_AND and G_XOR.
Legalize G_AND G_OR and G_XOR for types other then s32
with clampScalar on MIPS32.
Differential Revision: https://reviews.llvm.org/D55362
llvm-svn: 349475
- Reapply changes intially introduced in r343089
- The archtecture info is no longer loaded whenever a DWARFContext is created
- The runtimes libraries (santiziers) make use of the dwarf context classes but
do not intialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
string printing later on
Differential Revision: https://reviews.llvm.org/D55774
llvm-svn: 349472
This modifies the IPO pass so that it respects any explicit function
address space specified in the data layout.
In targets with nonzero program address spaces, all functions should, by
default, be placed into the default program address space.
This is required for Harvard architectures like AVR. Without this, the
functions will be marked as residing in data space, and thus not be
callable.
This has no effect to any in-tree official backends, as none use an
explicit program address space in their data layouts.
Patch by Tim Neumann.
llvm-svn: 349469
For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well.
llvm-svn: 349466
When splitting up an alloca's uses we were dropping any explicit
alignment tags, which means they default to the ABI-required default
alignment and this can cause miscompiles if the real value was smaller.
Also refactor the TBAA metadata into a parent class since it's shared by
both children anyway.
llvm-svn: 349465
This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (its guaranteed to stay the same).
Test change is merely a rescheduling.
llvm-svn: 349459
The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating against SpectreV1-style vulnarabilities.
At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask of data loaded
in registers.
- using intrinsics to mask of data in registers as indicated by the
programmer (see https://lwn.net/Articles/759423/).
For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
relative abundance of registers, one register is reserved (X16) to be
the taint register. X16 is expected to not clash with other register
reservation mechanisms with very high probability because:
. The AArch64 ABI doesn't guarantee X16 to be retained across any call.
. The only way to request X16 to be used as a programmer is through
inline assembly. In the rare case a function explicitly demands to
use X16/W16, this pass falls back to hardening against speculation
by inserting a DSB SYS/ISB barrier pair which will prevent control
flow speculation.
- It is easy to insert mask operations at this late stage as we have
mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
and contains all-zeros when miss-speculation is detected. Therefore, when
masking, an AND instruction (which only changes the register to be masked,
no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional
select instruction (CSEL) to evaluate the flags that were also used to
make conditional branch direction decisions. Speculation of the CSEL
instruction can be limited with a CSDB instruction - so the combination of
CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
aren't speculated. When conditional branch direction gets miss-speculated,
the semantics of the inserted CSEL instruction is such that the taint
register will contain all zero bits.
One key requirement for this to work is that the conditional branch is
followed by an execution of the CSEL instruction, where the CSEL
instruction needs to use the same flags status as the conditional branch.
This means that the conditional branches must not be implemented as one
of the AArch64 conditional branches that do not use the flags as input
(CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
selectors to not produce these instructions when speculation hardening
is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
the taint register X16 to be encoded in the SP register as value 0.
Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
x86-slh-lfence option. This may be more useful for the intrinsics-based
approach than for the SLH approach to masking.
Note that this pass already inserts the full speculation barriers if the
function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
could be done for some indirect branches, such as switch jump tables.
Differential Revision: https://reviews.llvm.org/D54896
llvm-svn: 349456
The default still is dwarf, but SEH exceptions can now be enabled
optionally for the MinGW target.
Differential Revision: https://reviews.llvm.org/D55748
llvm-svn: 349451
Power9 VABSDU* instructions can be exploited for some special vselect sequences.
Check in the orignal test case here, later the exploitation patch will update this
and reviewers can check the differences easily.
llvm-svn: 349446
Improve the current vec_abs support on P9, generate ISD::ABS node for vector types,
combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns,
do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity.
Differential Revision: https://reviews.llvm.org/D54783
llvm-svn: 349437