This is the Demorgan'd 'not' of the pattern handled in:
D63660 / rL364153
This is another intermediate IR step towards solving PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314
We can test if a value is not a power-of-2 using ctpop(X) > 1,
so combining that with an is-zero check of the input is the
same as testing if not exactly 1 bit is set:
(X == 0) || (ctpop(X) u> 1) --> ctpop(X) != 1
llvm-svn: 364246
Inference of nowrap flags in CVP has been disabled, because it
triggered a bug in LFTR (https://bugs.llvm.org/show_bug.cgi?id=31181).
This issue has been fixed in D60935, so we should be able to reenable
nowrap flag inference now.
Differential Revision: https://reviews.llvm.org/D62776
llvm-svn: 364228
This command prints a description of the referenced function's stack frame.
For each formal parameter and local variable, the tool prints:
- function name
- variable name
- file/line of declaration
- FP-relative variable location (if available)
- size in bytes
- HWASAN tag offset
This information will be used by the HWASAN runtime to identify local
variables in UAR reports.
Differential Revision: https://reviews.llvm.org/D63468
llvm-svn: 364225
Widen vector result type for ctlz_zero_undef and cttz_zero_undef the same as
ctlz and cttz.
Differential Revision: https://reviews.llvm.org/D63463
llvm-svn: 364221
There are quite some old testcases with powerpc- triple,
we should add this triple support so that we can update them with script.
Differential Revision: https://reviews.llvm.org/D63723
llvm-svn: 364213
Scalar extends to s64 can use S_BFE_{I64|U64}, but vector extends need
to extend to the 32-bit half, and then to 64.
I'm not sure what the line should be between what RegBankSelect
handles, and what instruction select does, but for now I'm erring on
the side of RegBankSelect for future post-RBS combines.
llvm-svn: 364212
Summary:
The LLVM disassembler assumes that the unused src0 operand of v_nop is
zero. Other tools can put another value in that field, which is still
valid. This commit fixes the LLVM disassembler to recognize such an
encoding as v_nop, in the same way as we already do for s_getpc.
Differential Revision: https://reviews.llvm.org/D63724
Change-Id: Iaf0363eae26ff92fc4ebc716216476adbff37a6f
llvm-svn: 364208
This patch does a few things to start cleaning up the isFNEG function.
-Remove the Op0/Op1 peekThroughBitcast calls that seem unnecessary. getTargetConstantBitsFromNode has its own peekThroughBitcast inside. And we have a separate peekThroughBitcast on the return value.
-Add a check of the scalar size after the first peekThroughBitcast to ensure we haven't changed the element size and just did something like f32->i32 or f64->i64.
-Remove an unnecessary check that Op1's type is floating point after the peekThroughBitcast. We're just going to look for a bit pattern from a constant. We don't care about its type.
-Add VT checks on several places that consume the return value of isFNEG. Due to the peekThroughBitcasts inside, the type of the return value isn't guaranteed. So its not safe to use it to build other nodes without ensuring the type matches the type being used to build the node. We might be able to replace these checks with bitcasts instead, but I don't have a test case so a bail out check seemed better for now.
Differential Revision: https://reviews.llvm.org/D63683
llvm-svn: 364206
Prefer the more exact intrinsic to remove a use of the input value
and possibly make further transforms easier (we will still need
to match patterns with funnel-shift of wider types as pieces of
bswap, especially if we want to canonicalize to funnel-shift with
constant shift amount). Discussed in D46760.
llvm-svn: 364187
This needs different handling if the source is known to be a valid
condition or not. Handle turning it into shifts or a select during
regbankselect.
llvm-svn: 364186
This matters for byval uses outside of the entry block, which appear
as copies.
Previously, the only folding done was during selection, which could
not see the underlying frame index. For any uses outside the entry
block, the frame index was materialized in the entry block relative to
the global scratch wave offset.
This may produce worse code in cases where the offset ends up not
fitting in the MUBUF offset field. A better heuristic would be helpfu
for extreme frames.
llvm-svn: 364185
This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). Its better to let the shift by zero to occur and perform any cleanup afterward.
Fixes OSS Fuzz #15429
llvm-svn: 364179
Summary:
Handle smul.fix.sat in the scalarizer. This is done by
adding smul.fix.sat to the set of "isTriviallyVectorizable"
intrinsics.
The addition of smul.fix.sat in isTriviallyVectorizable and
hasVectorInstrinsicScalarOpd can also be seen as a preparation
to be able to use hasVectorInstrinsicScalarOpd in ConstantFolding.
Reviewers: rengolin, RKSimon, dblaikie
Reviewed By: rengolin
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63704
llvm-svn: 364177
This adds the family of loads and stores with names like VLD20.8 and
VST42.32, which load and store parts of multiple q-registers in such a
way that executing both VLD20 and VLD21, or all four of VLD40..VLD43,
will distribute 2 or 4 vectors' worth of memory data across the lanes
of the same number of registers but in a transposed order.
In addition to the Tablegen descriptions of the instructions
themselves, this patch also adds encode and decode support for the
QQPR and QQQQPR register classes (representing the range of loaded or
stored vector registers), and tweaks to the parsing system for lists
of vector registers to make it return the right format in this case
(since, unlike NEON, MVE regards q-registers as primitive, and not
just an alias for two d-registers).
llvm-svn: 364172
We do not need the elf-groups.x86_64. In one of the tests, it was
used for no solid reason, and for the second test case we can use
YAML input with SHT_GROUP sections.
The patch performs a cleanup of one of the test cases, removes another
one completely (since during the review was found out it actually
duplicates one of the existent tests) and removes the precompiled binary.
Differential revision: https://reviews.llvm.org/D63647
llvm-svn: 364167
Ideally we'd be able to represent this truncate as a any_extend to
v16i32 and a truncate, but SelectionDAG doens't know how to not
fold those together.
We have isel patterns to use a vpmovzxwd+vpdmovdb for the truncate,
but we aren't able to simultaneously fold the load and the store
from the isel pattern. By pulling the truncate into the store we
can successfully hide it from the DAG combiner. Then we can isel
pattern match the truncstore and load+any_extend separately.
llvm-svn: 364163
In rL364135, I taught IndVars to fold exiting branches in loops with a zero backedge taken count (i.e. loops that only run one iteration). This extends that to eliminate the dead comparison left around.
llvm-svn: 364155
This is another intermediate IR step towards solving PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314
We can test if a value is power-of-2-or-0 using ctpop(X) < 2,
so combining that with a non-zero check of the input is the
same as testing if exactly 1 bit is set:
(X != 0) && (ctpop(X) u< 2) --> ctpop(X) == 1
Differential Revision: https://reviews.llvm.org/D63660
llvm-svn: 364153
This turned out to be surprisingly effective. I was originally doing this just for completeness sake, but it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than it's isKnownPredicate reasoning.
Once this is in, I'm thinking about trying to build on the same infrastructure to eliminate provably untaken checks. There may be something generally interesting here.
Differential Revision: https://reviews.llvm.org/D63618
llvm-svn: 364135
This is useful for allowing code to efficiently take an address
that can be later mapped onto debug info. Currently the hwasan
pass achieves this by taking the address of the current function:
http://llvm-cs.pcc.me.uk/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp#921
but this costs two instructions (plus a GOT entry in PIC code) per function
with stack variables. This will allow the cost to be reduced to a single
instruction.
Differential Revision: https://reviews.llvm.org/D63471
llvm-svn: 364126
-d code.
Summary:
Move it into `main` function so the checking is effective for all actions
user may do with llvm-objdump; notably, -r and -s in addition to existing -d.
Match GNU behavior.
Reviewers: jhenderson, grimar, MaskRay, rupprecht
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63631
llvm-svn: 364118
To help produce better diagnostics for stack use-after-return, we'd like
to be able to determine the addresses of each HWASANified function's local
variables given a small amount of information recorded on entry to the
function. Currently we require all HWASANified functions to use frame pointers
and record (PC, FP) on function entry. This works better than recording SP
because FP cannot change during the function, unlike SP which can change
e.g. due to dynamic alloca.
However, most variables currently end up using SP-relative locations in their
debug info. This prevents us from recomputing the address of most variables
because the distance between SP and FP isn't recorded in the debug info. To
address this, make the AArch64 backend prefer FP-relative debug locations
when producing debug info for HWASANified functions.
Differential Revision: https://reviews.llvm.org/D63300
llvm-svn: 364117
On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is
mapped to llvm.debugtrap in Clang. Instruction brk #F000 is the defined break
point instruction on ARM64 which is recognized by Windows debugger and
exception handling code, so llvm.debugtrap should map to it instead of
redirecting to llvm.trap (brk #1) as the default implementation.
Differential Revision: https://reviews.llvm.org/D63635
llvm-svn: 364115
The VM layout on iOS is not stable between releases. On 64-bit iOS and
its derivatives we use a dynamic shadow offset that enables ASan to
search for a valid location for the shadow heap on process launch rather
than hardcode it.
This commit extends that approach for 32-bit iOS plus derivatives and
their simulators.
rdar://50645192
rdar://51200372
rdar://51767702
Reviewed By: delcypher
Differential Revision: https://reviews.llvm.org/D63586
llvm-svn: 364105
128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg.
This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insertsubvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns.
Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining.
I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload.
Differential Revision: https://reviews.llvm.org/D63512
llvm-svn: 364095
With this we can now fully code generate jump tables, which is important for code size.
Differential Revision: https://reviews.llvm.org/D63223
llvm-svn: 364086
This change makes use of the newly refactored SwitchLoweringUtils code from
SelectionDAG to in order to generate jump tables and range checks where appropriate.
Much of this code is ported from SDAG with some modifications. We generate
G_JUMP_TABLE and G_BRJT instructions when JT opportunities are found. This means
that targets which previously relied on the naive one MBB per case stmt
translation will now start falling back until they add support for the new opcodes.
For range checks, we don't generate any previously unused operations. This
just recognizes contiguous ranges of case values and generates a single block per
range. Single case value blocks are just a special case of ranges so we get that
support almost for free.
There are still some optimizations missing that I haven't ported over, and
bit-tests are also unimplemented. This patch series is already complex enough.
Actual arm64 support for selection of jump tables is coming in a later patch.
Differential Revision: https://reviews.llvm.org/D63169
llvm-svn: 364085
This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).
Committed on behalf of @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D60897
llvm-svn: 364084
We already use vmovq for v2i64/v2f64 vzmovl. But we were using a
blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or
movsd+xorpd under optsize.
I think the blend with 0 or movss/d is only needed for
vXi32 where we don't have an instruction that can move 32
bits from one xmm to another while zeroing upper bits.
movq is no worse than blendpd on any known CPUs.
llvm-svn: 364079
We sometimes get poor code size because constants of types < 32b are legalized
as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the
localizer can no longer sink them (although it's possible to extend it to do so).
On AArch64 however s8 and s16 constants can be selected in the same way as s32
constants, with a mov pseudo into a W register. If we make s8 and s16 constants
legal then we can avoid unnecessary truncates, they can be CSE'd, and the
localizer can sink them as normal.
There is a caveat: if the user of a smaller constant has to widen the sources,
we end up with an anyext of the smaller typed G_CONSTANT. This can cause
regressions because of the additional extend and missed pattern matching. To
remedy this, there's a new artifact combiner to generate the wider G_CONSTANT
if it's legal for the target.
Differential Revision: https://reviews.llvm.org/D63587
llvm-svn: 364075
We do not have to spread using the precompiled binaries in the tests,
when we can use YAML. This patch removes the dynrel.elf binary and adds
a few comments to the test cases.
Differential revision: https://reviews.llvm.org/D63641
llvm-svn: 364052
Summary:
The motivation for this was to propagate fast-math flags like nnan and
ninf on vector floating point operations to the corresponding scalar
operations to take advantage of follow-on optimizations. But I think
the same argument applies to all of our IR flags: if they apply to the
vector operation then they also apply to all the individual scalar
operations, and they might enable follow-on optimizations.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63593
llvm-svn: 364051
There are some test that are splitted into main part + input yaml for no visible reason.
This patch inines the yaml part for the 3 test cases I found.
Differential revision: https://reviews.llvm.org/D63644
llvm-svn: 364049
Summary:
LLVM Allows Targets to provide information that guides optimisations
made to LLVM IR. This is done with callbacks on a TargetTransformInfo object.
This patch adds a TargetTransformInfo class for RISC-V. This will allow us to
implement RISC-V specific callbacks as they become necessary.
This commit also adds the getIntImmCost callbacks, and tests them with a simple
constant hoisting test. Our immediate costs are on the conservative side, for
the moment, but we prevent hoisting in most circumstances anyway.
Previous review was on D63007
Reviewers: asb, luismarques
Reviewed By: asb
Subscribers: ributzka, MaskRay, llvm-commits, Jim, benna, psnobl, jocewei, PkmX, rkruppe, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, apazos, simoncook, johnrusso, rbar, hiraditya, mgorny
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63433
llvm-svn: 364046
This patch teaches the bottleneck analysis how to identify and print the most
expensive sequence of instructions according to the simulation. Fixes PR37494.
The goal is to help users identify the sequence of instruction which is most
critical for performance.
A dependency graph is internally used by the bottleneck analysis to describe
data dependencies and processor resource interferences between instructions.
There is one node in the graph for every instruction in the input assembly
sequence. The number of nodes in the graph is independent from the number of
iterations simulated by the tool. It means that a single node of the graph
represents all the possible instances of a same instruction contributed by the
simulated iterations.
Edges are dynamically "discovered" by the bottleneck analysis by observing
instruction state transitions and "backend pressure increase" events generated
by the Execute stage. Information from the events is used to identify critical
dependencies, and materialize edges in the graph. A dependency edge is uniquely
identified by a pair of node identifiers plus an instance of struct
DependencyEdge::Dependency (which provides more details about the actual
dependency kind).
The bottleneck analysis internally ranks dependency edges based on their impact
on the runtime (see field DependencyEdge::Dependency::Cost). To this end, each
edge of the graph has an associated cost. By default, the cost of an edge is a
function of its latency (in cycles). In practice, the cost of an edge is also a
function of the number of cycles where the dependency has been seen as
'contributing to backend pressure increases'. The idea is that the higher the
cost of an edge, the higher is the impact of the dependency on performance. To
put it in another way, the cost of an edge is a measure of criticality for
performance.
Note how a same edge may be found in multiple iteration of the simulated loop.
The logic that adds new edges to the graph checks if an equivalent dependency
already exists (duplicate edges are not allowed). If an equivalent dependency
edge is found, field DependencyEdge::Frequency of that edge is incremented by
one, and the new cost is cumulatively added to the existing edge cost.
At the end of simulation, costs are propagated to nodes through the edges of the
graph. The goal is to identify a critical sequence from a node of the root-set
(composed by node of the graph with no predecessors) to a 'sink node' with no
successors. Note that the graph is intentionally kept acyclic to minimize the
complexity of the critical sequence computation algorithm (complexity is
currently linear in the number of nodes in the graph).
The critical path is finally computed as a sequence of dependency edges. For
edges describing processor resource interferences, the view also prints a
so-called "interference probability" value (by dividing field
DependencyEdge::Frequency by the total number of iterations).
Examples of critical sequence computations can be found in tests added/modified
by this patch.
On output streams that support colored output, instructions from the critical
sequence are rendered with a different color.
Strictly speaking the analysis conducted by the bottleneck analysis view is not
a critical path analysis. The cost of an edge doesn't only depend on the
dependency latency. More importantly, the cost of a same edge may be computed
differently by different iterations.
The number of dependencies is discovered dynamically based on the events
generated by the simulator. However, their number is not fixed. This is
especially true for edges that model processor resource interferences; an
interference may not occur in every iteration. For that reason, it makes sense
to also print out a "probability of interference".
By construction, the accuracy of this analysis (as always) is strongly dependent
on the simulation (and therefore the quality of the information available in the
scheduling model).
That being said, the critical sequence effectively identifies a performance
criticality. Instructions from that sequence are expected to have a very big
impact on performance. So, users can take advantage of this information to focus
their attention on specific interactions between instructions.
In my experience, it works quite well in practice, and produces useful
output (in a reasonable amount time).
Differential Revision: https://reviews.llvm.org/D63543
llvm-svn: 364045
These instructions let you load half a vector register at once from
two general-purpose registers, or vice versa.
The assembly syntax for these instructions mentions the vector
register name twice. For the move _into_ a vector register, the MC
operand list also has to mention the register name twice (once as the
output, and once as an input to represent where the unchanged half of
the output register comes from). So we can conveniently assign one of
the two asm operands to be the output $Qd, and the other $QdSrc, which
avoids confusing the auto-generated AsmMatcher too much. For the move
_from_ a vector register, there's no way to get round the fact that
both instances of that register name have to be inputs, so we need a
custom AsmMatchConverter to avoid generating two separate output MC
operands. (And even that wouldn't have worked if it hadn't been for
D60695.)
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62679
llvm-svn: 364041
This adds the `MVE_qDest_rSrc` superclass and all its instances, plus
a few other instructions that also take a scalar input register or two.
I've also belatedly added custom diagnostic messages to the operand
classes for odd- and even-numbered GPRs, which required matching
changes in two of the existing MVE assembly test files.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62678
llvm-svn: 364040
llvm-mc or clang with -g normally produces debug info describing the
assembler source itself; however, if that source already contains some
.file/.loc directives, we should instead emit the debug info described
by those directives. For certain assembler sources seen in the wild
(particularly in the Chrome build) this was causing a crash due to
incorrect assumptions about legal sequences of assembler source text.
Fixes PR38994.
Differential Revision: https://reviews.llvm.org/D63573
llvm-svn: 364039
The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0)), but its getting better.
llvm-svn: 364038
Summary:
This adds the `MVE_qDest_qSrc` superclass and all instructions that
inherit from it. It's not the complete class of _everything_ with a
q-register as both destination and source; it's a subset of them that
all have similar encodings (but it would have been hopelessly unwieldy
to call it anything like MVE_111x11100).
This category includes add/sub with carry; long multiplies; halving
multiplies; multiply and accumulate, and some more complex
instructions.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62677
llvm-svn: 364037
Many LLVM-based tools already support response files (i.e. files
containing a list of options, specified with '@'). This change simply
updates the documentation and help text for some of these tools to
include it. I haven't attempted to fix all tools, just a selection that
I am interested in.
I've taken the opportunity to add some tests for --help behaviour, where
they were missing. We could expand these tests, but I don't think that's
within scope of this patch.
This fixes https://bugs.llvm.org/show_bug.cgi?id=42233 and
https://bugs.llvm.org/show_bug.cgi?id=42236.
Reviewed by: grimar, MaskRay, jkorous
Differential Revision: https://reviews.llvm.org/D63597
llvm-svn: 364036
--help and -h are automatically supported by the command-line parser,
unless overridden by the tool. The behaviour of the PrintHelpMessage
being used for -h prior to this patch is subtly different to that
provided by --help automatically (it omits certain elements of help text
and options, such as --help-list), so overriding the default is not
desirable, without good reason. This patch removes the explicit
specification of -h and its behaviour, so that the default behaviour is
used.
Reviewed by: hintonda
Differential Revision: https://reviews.llvm.org/D63565
llvm-svn: 364029
Summary:
These take a pair of vector register to compare, and a comparison type
(written in the form of an Arm condition suffix); they output a vector
of booleans in the VPR register, where predication can conveniently
use them.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62676
llvm-svn: 364027
G_INTTOPTR can prevent the localizer from moving G_CONSTANTs, but since it's
essentially a side effect free cast instruction we can remat both instructions.
This patch changes the localizer to enable localization of the chains by
iterating over the entry block instructions in reverse order. That way, uses will
localized first, and then the defs are free to be localized as well.
This also changes the previous SmallPtrSet of localized instructions to use a
SetVector instead. We're dealing with pointers and need deterministic iteration
order.
Overall, this change improves ARM64 -O0 CTMark code size by around 0.7% geomean.
Differential Revision: https://reviews.llvm.org/D63630
llvm-svn: 364001
Also, add a FIXME for the unsafe transform on a unary FNeg. A unary FNeg can only be transformed to a FMul by -1.0 when the nnan flag is present. The unary FNeg project is a WIP, so the unsafe transformation is acceptable until that work is complete.
The bogus assert with introduced in D63445.
llvm-svn: 363998
As discussed in PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314
Improving the canonicalization for these patterns:
rL363956
...means we should adjust/enhance the related simplification.
https://rise4fun.com/Alive/w1cp
Name: isPow2 or zero
%x = and i32 %xx, 2048
%a = add i32 %x, -1
%r = and i32 %a, %x
=>
%r = i32 0
llvm-svn: 363997
Every called function could possibly need this to calculate the
absolute address of stack objectst, and this avoids inserting a copy
around every call site in the kernel. It's also somewhat cleaner to
keep this in a callee saved SGPR.
llvm-svn: 363990
Currently, many profiling tests on Solaris FAIL like
Command Output (stderr):
--
Undefined first referenced
symbol in file
__llvm_profile_register_names_function /tmp/lit_tmp_Nqu4eh/infinite_loop-9dc638.o
__llvm_profile_register_function /tmp/lit_tmp_Nqu4eh/infinite_loop-9dc638.o
Solaris 11.4 ld supports the non-standard GNU ld extension of adding
__start_SECNAME and __stop_SECNAME labels to sections whose names are valid
as C identifiers. Given that we already use Solaris 11.4-only features
like ld -z gnu-version-script-compat and fully working .preinit_array
support in compiler-rt, we don't need to worry about older versions of
Solaris ld.
The patch documents that support (although the comment in
lib/Transforms/Instrumentation/InstrProfiling.cpp
(needsRuntimeRegistrationOfSectionRange) is quite cryptic what it's
actually about), and adapts the affected testcase not to expect the
alternativeq __llvm_profile_register_functions and __llvm_profile_init.
It fixes all affected tests.
Tested on amd64-pc-solaris2.11.
Differential Revision: https://reviews.llvm.org/D41111
llvm-svn: 363984
Summary:
The getClobberingMemoryAccess API checks for clobbering accesses in a loop by walking the backedge. This may check if a memory access is being
clobbered by the loop in a previous iteration, depending how smart AA got over the course of the updates in MemorySSA (it does not occur when built from scratch).
If no clobbering access is found inside the loop, it will optimize to an access outside the loop. This however does not mean that access is safe to sink.
Given:
```
for i
load a[i]
store a[i]
```
The access corresponding to the load can be optimized to outside the loop, and the load can be hoisted. But it is incorrect to sink it.
In order to sink the load, we'd need to check no Def clobbers the Use in the same iteration. With this patch we currently restrict sinking to either
Defs not existing in the loop, or Defs preceding the load in the same block. An easy extension is to ensure the load (Use) post-dominates all Defs.
Caught by PR42294.
This issue also shed light on the converse problem: hoisting stores in this same scenario would be illegal. With this patch we restrict
hoisting of stores to the case when their corresponding Defs are dominating all Uses in the loop.
Reviewers: george.burgess.iv
Subscribers: jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63582
llvm-svn: 363982
I added a canonicalization to create this general pattern in:
rL363956
But as noted in PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314#c11
...we have a (potentially expensive) simplification for the version
of the code that we just canonicalized away from, so we should
add/adjust that code to match.
llvm-svn: 363981
This fixes CodeGen/available-externally-suppress.c when the new pass manager is
turned on by default. available_externally was not emitted during -O2 -flto
runs when it should still be retained for link time inlining purposes. This can
be fixed by checking that we aren't LTOPrelinking when adding the
EliminateAvailableExternallyPass.
Differential Revision: https://reviews.llvm.org/D63580
llvm-svn: 363971
Summary:
BLSI sets the C flag is the input is not zero. So if its followed
by a TEST of the input where only the Z flag is consumed, we can
replace it with the opposite check of the C flag.
We should be able to do the same for BLSMSK and BLSR, but the
naive test case for those is being optimized to a subo by
CodeGenPrepare.
Reviewers: spatel, RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63589
llvm-svn: 363957
The form that compares against 0 is better because:
1. It removes a use of the input value.
2. It's the more standard form for this pattern: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS).
This is a root cause for PR42314, but probably doesn't completely answer the codegen request:
https://bugs.llvm.org/show_bug.cgi?id=42314
Alive proof:
https://rise4fun.com/Alive/9kG
Name: is power-of-2
%neg = sub i32 0, %x
%a = and i32 %neg, %x
%r = icmp eq i32 %a, %x
=>
%dec = add i32 %x, -1
%a2 = and i32 %dec, %x
%r = icmp eq i32 %a2, 0
Name: is not power-of-2
%neg = sub i32 0, %x
%a = and i32 %neg, %x
%r = icmp ne i32 %a, %x
=>
%dec = add i32 %x, -1
%a2 = and i32 %dec, %x
%r = icmp ne i32 %a2, 0
llvm-svn: 363956
Thought of this case while working on something else. We appear to get it right in all of the variations I tried, but that's by accident. So, add a test which would catch the potential bug.
llvm-svn: 363953
The attribute can specify elimination for leaf or non-leaf, so it
should always be considered. I copied this bug from AArch64, which
probably should also be fixed.
llvm-svn: 363949
This includes integer arithmetic of various kinds (add/sub/multiply,
saturating and not), and the immediate forms of VMOV and VMVN that
load an immediate into all lanes of a vector.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62674
llvm-svn: 363936
Introducing VCC defs during SIFixSGPRCopies is generally
problematic. Avoid it by starting with the VOP3 form with the general
condition register. This is the easiest to fix instance, but doesn't
solve any specific problems I'm looking at.
llvm-svn: 363904
The ARMDisassembler changes allow changing between ARM and Thumb mode
based on the MCSubtargetInfo, rather than the Target, which simplifies
the other changes a bit.
I'm not really happy with adding more target-specific logic to
tools/llvm-objdump/, but there isn't any easy way around it: the logic
in question specifically applies to disassembling an object file, and
that code simply isn't located in lib/Target, at least at the moment.
Differential Revision: https://reviews.llvm.org/D60927
llvm-svn: 363903
Summary:
Stop referring to "numeric expression", using simply the term
"expression" instead. Likewise for numeric operation since operations
are only used in numeric expressions.
Reviewers: jhenderson, jdenny, probinson, arichardson
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63500
llvm-svn: 363901
Summary:
Make use of Error and Expected to bubble up diagnostics and force
checking of errors in the callers.
Reviewers: jhenderson, jdenny, probinson, arichardson
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63125
llvm-svn: 363900
Teach IndVarSimply's LinearFunctionTestReplace transform to handle multiple exit loops. LFTR does two key things 1) it rewrites (all) exit tests in terms of a common IV potentially eliminating one in the process and 2) it moves any offset/indexing/f(i) style logic out of the loop.
This turns out to actually be pretty easy to implement. SCEV already has all the information we need to know what the backedge taken count is for each individual exit. (We use that when computing the BE taken count for the loop as a whole.) We basically just need to iterate through the exiting blocks and apply the existing logic with the exit specific BE taken count. (The previously landed NFC makes this super obvious.)
I chose to go ahead and apply this to all loop exits instead of only latch exits as originally proposed. After reviewing other passes, the only case I could find where LFTR form was harmful was LoopPredication. I've fixed the latch case, and guards aren't LFTRed anyways. We'll have some more work to do on the way towards widenable_conditions, but that's easily deferred.
I do want to note that I added one bit after the review. When running tests, I saw a new failure (no idea why didn't see previously) which pointed out LFTR can rewrite a constant condition back to a loop varying one. This was theoretically possible with a single exit, but the zero case covered it in practice. With multiple exits, we saw this happening in practice for the eliminate-comparison.ll test case because we'd compute a ExitCount for one of the exits which was guaranteed to never actually be reached. Since LFTR ran after simplifyAndExtend, we'd immediately turn around and undo the simplication work we'd just done. The solution seemed obvious, so I didn't bother with another round of review.
Differential Revision: https://reviews.llvm.org/D62625
llvm-svn: 363883
Summary:
This is unfortunately needed for correctness, if we are to extend the tolerance of the update API to the way simple loop unswitch is doing cloning.
In simple loop unswitch (as opposed to loop unswitch), not all blocks are cloned. This can create unreachable cloned blocks (no predecessor), which are later cleaned up.
In MemorySSA, the APIs for supporting these kind of updates (clone + update exit blocks), make certain assumption on the integrity of the CFG. When cloning, if something was not cloned, it's values in MemorySSA default to LiveOnEntry. When updating exit blocks, it is safe to assume that we can first insert phis in the blocks merging two clones, then add additional phis in the IDF of the blocks that received phis. This no longer holds true if one of the clones being merged comes from an unreachable block. We'd conservatively need to add all phis before filling in their incoming definitions. In practice this restriction can be relaxed if we clean up trivial phis after the first round of insertion.
Reviewers: george.burgess.iv
Subscribers: jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63354
llvm-svn: 363880
The def instruction for the vreg may not match, because it may be
folding through a reg_sequence. The assert was overly conservative and
not necessary. It's not actually important if DefMI really defined the
register, because the fold that will be done cares about the def of
the value that will be folded.
For some reason copies aren't making it through the reg_sequence,
although they should.
llvm-svn: 363876
Turns out that we can save an instruction by folding the right shift into
the compare.
Differential Revision: https://reviews.llvm.org/D63568
llvm-svn: 363874
This reapplies r363678, using the correct chain for the CopyToReg for
v0. glueCopyToM0 counterintuitively changes the operands of the
original node.
llvm-svn: 363870
This is an exception to the rule that we should prefer xmm ops to ymm ops.
As shown in PR42305:
https://bugs.llvm.org/show_bug.cgi?id=42305
...the store folding opportunity with vextractf128 may result in better
perf by reducing the instruction count.
Differential Revision: https://reviews.llvm.org/D63517
llvm-svn: 363853
We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension.
llvm-svn: 363841
This includes all the obvious bitwise operations (AND, OR, BIC, ORN,
MVN) in register-to-register forms, and the immediate forms of
AND/OR/BIC/ORN; byte-order reverse instructions; and the VMOVs that
access a single lane of a vector.
Some of those VMOVs (specifically, the ones that access a 32-bit lane)
share an encoding with existing instructions that were disassembled as
accessing half of a d-register (e.g. `vmov.32 r0, d1[0]`), but in
8.1-M they're now written as accessing a quarter of a q-register (e.g.
`vmov.32 r0, q0[2]`). The older syntax is still accepted by the
assembler.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62673
llvm-svn: 363838
The original line was there from when this test was added, but it is
checking for a switch that doesn't exist, so really has no purpose, at
least any more.
llvm-svn: 363833
The check against the output of `od` in the affected tests expect a
specific input offset format. They also expect a specific offset value,
not consistent with the EXAMPLE section for `od` in POSIX.1-2017
Chapter 4, while using the `-j` option. In particular, the example shows
that the input offset begins at 0 following the bytes skipped.
This patch adjusts the matching of the input offset to be more generic.
In order to avoid false matches, it restricts the number of bytes to be
formatted.
llvm-svn: 363829
greater-than-sign is not a BRE special character.
POSIX.1-2017 XBD Section 9.3.2 indicates that the interpretation of `\>`
is undefined. This patch replaces the pattern.
llvm-svn: 363828
Summary:
This patch teaches ConstantFolding to constant fold
both scalar and vector variants of llvm.smul.fix and
llvm.smul.fix.sat.
As described in the LangRef rounding is unspecified for
these instrinsics. If the result cannot be represented
exactly the default behavior in ConstantFolding is to
round down towards negative infinity. If a target has a
preferred rounding that is different some kind of target
hook would be needed (same strategy as used by the
SelectionDAG legalizer).
Reviewers: nikic, leonardchan, RKSimon
Reviewed By: leonardchan
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63385
llvm-svn: 363811
Vector load/store instructions support an optional alignment field
that the compiler can use to provide known alignment info to the
hardware. If the field is used (and the information is correct),
the hardware may be able (on some models) to perform faster memory
accesses than otherwise.
This patch adds support for alignment hints in the assembler and
disassembler, and fills in known alignment during codegen.
llvm-svn: 363806
The test only checks the existence of the `Types CU list` line.
Unfortunately I can't make a better test because
{gcc,clang} -fuse-ld={lld,gold} --gdb-index do not give me a non-empty types CU list.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D63537
llvm-svn: 363800
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases.
This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths.
llvm-svn: 363792
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024
The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.
In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.
I have set up a separate review D61933 for a fix which is required for this patch.
Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse
Reviewed By: hfinkel, jmorse
Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D60831
> llvm-svn: 363046
llvm-svn: 363786
Summary:
The test case does an (out of bounds) load from a global constant with
type <3 x float>. InstSimplify tried to turn this into an integer load
of the whole alloc size of the vector, which is 128 bits due to
alignment padding, and then bitcast this to <3 x vector> which failed
an assertion due to the type size mismatch.
The fix is to do an integer load of the normal size of the vector, with
no alignment padding.
Reviewers: tpr, arsenm, majnemer, dstuttard
Reviewed By: arsenm
Subscribers: hfinkel, wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63375
llvm-svn: 363784
This patch allows immediates (and CSR alias immediates) which start with
a tilde token or an exclaim (!) token to be parsed as intended.
Differential Revision: https://reviews.llvm.org/D57320
llvm-svn: 363783
Since the parser attempts to parse an operand as a register with
parentheses before parsing it as an immediate, immediates in
parentheses should not be parsed by parseRegister. However in the case
where the immediate does not start with an identifier, the LParen is not
unlexed and so the RParen causes an unexpected token error.
This patch adds the missing UnLex, and modifies the existing UnLex to
not use a buffered token, as it should always be unlexing an LParen.
Differential Revision: https://reviews.llvm.org/D57319
llvm-svn: 363782
Summary:
llvm.x86.sse.stmxcsr only writes to memory.
llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE.
Reviewers: craig.topper, RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62896
llvm-svn: 363773
This patch adds lowering for global TLS addresses for the TLS models of
InitialExec, GlobalDynamic, LocalExec and LocalDynamic.
LocalExec support required using a 4-operand add instruction, which uses
the fourth operand to express a relocation on the symbol. The necessary
fixup is emitted when the instruction is emitted.
Differential Revision: https://reviews.llvm.org/D55305
llvm-svn: 363771
r363757 renamed ExpandISelPseudo to FinalizeISel, so the RUN line in
select-optimize-multiple.mir needed updating to refer to finalize-isel.
llvm-svn: 363762
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.
Patch by Matthias Braun
llvm-svn: 363757
Summary:
Converting the result *.{all,any}_true to a bool at the source level
generates LLVM IR that compares the result to 0. This check is
redundant since these instructions already return either 0 or 1 and
therefore conform to the BooleanContents setting for WebAssembly. This
CL adds patterns to detect and remove such redundant operations on the
result of Boolean reductions.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63529
llvm-svn: 363756
Summary:
Historically llvm-objdump prints the path to a dylib as well as the
dylib's compatibility version and current version number. This change
extends this information by adding the kind of dylib load: weak,
reexport, etc.
rdar://51383512
Reviewers: pete, lhames
Reviewed By: pete
Subscribers: rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62866
llvm-svn: 363746
[SROA] Enhance SROA to handle `addrspacecast`ed allocas
- Fix typo in original change
- Add additional handling to ensure all return pointers are properly
casted.
Summary:
- After `addrspacecast` is allowed to be eliminated in SROA, the
adjusting of storage pointer (from `alloca) needs to handle the
potential different address spaces between the storage pointer (from
alloca) and the pointer being used.
Reviewers: arsenm
Subscribers: wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63501
llvm-svn: 363743
Summary:
When identifing instructions that can be folded into a MOVCC instruction,
checking for a predicate operand is not enough, also need to check for
thumb2 function, with restrict-IT, is the machine instruction eligible for
ARMv8 IT or not.
Notes in ARMv8-A Architecture Reference Manual, section "Partial deprecation of IT"
https://usermanual.wiki/Pdf/ARM20Architecture20Reference20ManualARMv8.1667877052.pdf
"ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to
instructions other than a single subsequent 16-bit instruction from a restricted set
are deprecated, as are explicit references to the PC within that single 16-bit
instruction. This permits the non-deprecated forms of IT and subsequent instructions
to be treated as a single 32-bit conditional instruction."
Reviewers: efriedma, lebedev.ri, t.p.northover, jmolloy, aemerson, compnerd, stoklund, ostannard
Reviewed By: ostannard
Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63474
llvm-svn: 363739
Summary:
DAGCombine will normally turn a `(shl (add x, c1), c2)` into `(add (shl x, c2), c1 << c2)`, where `c1` and `c2` are constants. This can be prevented by a callback in TargetLowering.
On RISC-V, materialising the constant `c1 << c2` can be more expensive than materialising `c1`, because materialising the former may take more instructions, and may use a register, where materialising the latter would not.
This patch implements the hook in RISCVTargetLowering to prevent this transform, in the cases where:
- `c1` fits into the immediate field in an `addi` instruction.
- `c1` takes fewer instructions to materialise than `c1 << c2`.
In future, DAGCombine could do the check to see whether `c1` fits into an add immediate, which might simplify more targets hooks than just RISC-V.
Reviewers: asb, luismarques, efriedma
Reviewed By: asb
Subscribers: xbolva00, lebedev.ri, craig.topper, lewis-revill, Jim, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62857
llvm-svn: 363736
This patch teaches the Verifier how to detect broken !llvm.loop
attachments as discussed in https://reviews.llvm.org/D60831. This
allows LLVM to warn and strip out the broken debug info before
attempting an LTO compilation with input generated by LLVM predating
https://reviews.llvm.org/rL361149.
rdar://problem/51631158
Differential Revision: https://reviews.llvm.org/D63499
[Re-applies r363725 without changes after fixing a broken testcase.]
llvm-svn: 363731
This patch teaches the Verifier how to detect broken !llvm.loop
attachments as discussed in https://reviews.llvm.org/D60831. This
allows LLVM to warn and strip out the broken debug info before
attempting an LTO compilation with input generated by LLVM predating
https://reviews.llvm.org/rL361149.
rdar://problem/51631158
Differential Revision: https://reviews.llvm.org/D63499
llvm-svn: 363725
Summary:
- After `addrspacecast` is allowed to be eliminated in SROA, the
adjusting of storage pointer (from `alloca) needs to handle the
potential different address spaces between the storage pointer (from
alloca) and the pointer being used.
Reviewers: arsenm
Subscribers: wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63501
llvm-svn: 363711
This includes saturating and non-saturating shifts, both with
immediate shift count and with the shift counts given by another
vector register; VSHLC (in which the bits shifted out of each active
vector lane are shifted in to the next active lane); and also VMOVL,
which is enough like an immediate shift that it didn't fit too badly
in this category.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62672
llvm-svn: 363696
Summary:
These form a small family of their own, to go with the floating-point
VMINNM/VMAXNM instructions added in a previous commit.
They introduce the first of many special cases in the mnemonic
recognition code, because VMIN with the E suffix used by the VPT
predication system needs to avoid being interpreted as the nonexistent
instruction 'VMI' with an ordinary 'NE' condition suffix.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62671
llvm-svn: 363695
Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here.
llvm-svn: 363693
Summary:
Their names began with a mishmash of `MVE_`, `t2` and no prefix at
all. Now they all start with `MVE_`, which seems like a reasonable
choice on the grounds that (a) NEON is the thing they're most at risk
of being confused with, and (b) MVE implies Thumb-2, so a prefix
indicating MVE is strictly more specific than one indicating Thumb-2.
Reviewers: ostannard, SjoerdMeijer, dmgreen
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63492
llvm-svn: 363690
This patch adds support for generating calls through the procedure
linkage table where required for a given ExternalSymbol or GlobalAddress
callee.
Differential Revision: https://reviews.llvm.org/D55304
llvm-svn: 363686
1) `-x foo` currently dumps one `foo`. This change makes it dump all `foo`.
2) `-x foo -x foo` currently dumps `foo` twice. This change makes it dump `foo` once.
In addition, if foo has section index 9, `-x foo -x 9` dumps `foo` once.
3) Give a warning instead of an error if `foo` does not exist.
The new behaviors match GNU readelf.
Also, print a new line as a separator between two section dumps.
GNU readelf uses two lines, but one seems good enough.
Reviewed By: grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D63475
llvm-svn: 363683
This adds vector splitting for vaarg instructions during type legalization
Committed on behalf of @luke (Luke Lau)
Differential Revision: https://reviews.llvm.org/D60762
llvm-svn: 363671
Recommit of D32530 with a few small changes:
- Stopped recursively walking through aggregates in
the verifier, so that we don't impose too much
overhead on large modules under LTO (see PR42210).
- Changed tests to match; the errors are slightly
different since they only report the array or
struct that actually contains a scalable vector,
rather than all aggregates which contain one in
a nested member.
- Corrected an older comment
Reviewers: thakis, rengolin, sdesmalen
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D63321
llvm-svn: 363658
The isel patterns for these use a bitcast and load/store, but
DAG combine should have canonicalized those away.
For the purposes of the memory folding table these opcodes can be
replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes.
llvm-svn: 363644
Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt.
Use the new versions in patterns that previously used a COPY_TO_REGCLASS
to VR128. These patterns expect the upper bits to be zero. The
current set up appears to work, but I'm not sure we should be
enforcing upper bits being zero through a COPY_TO_REGCLASS.
I wanted to flip the arrangement and use a COPY_TO_REGCLASS to
FR32/FR64 for the patterns that need an f32/f64 result, but that
complicated fastisel and globalisel.
I've been doing some experiments with reducing some isel patterns
and ended up in a situation where I had a
(SUBREG_TO_REG (COPY_TO_RECLASS (VMOVSSrm), VR128)) and our
post-isel peephole was unable to avoid using an instruction for
the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128
instruction removes the COPY_TO_REGCLASS that was breaking this.
llvm-svn: 363643
Summary: Implements bug [[ https://bugs.llvm.org/show_bug.cgi?id=42204 | 42204 ]]. llvm-strip now warns when the same input file is used more than once, and errors when stdin is used more than once.
Reviewers: jhenderson, rupprecht, espindola, alexshap
Reviewed By: jhenderson, rupprecht
Subscribers: emaste, arichardson, jakehehrlich, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63122
llvm-svn: 363638
This was ignoring the flag on fneg, and using the source instruction's
flags. Also fixes tests missing from r358702.
Note the expansion itself isn't correct without nnan, but that should
be fixed separately.
llvm-svn: 363637
This saves roughly 32 bytes of instructions per function with stack objects
and causes us to preserve enough information that we can recover the original
tags of all stack variables.
Now that stack tags are deterministic, we no longer need to pass
-hwasan-generate-tags-with-calls during check-hwasan. This also means that
the new stack tag generation mechanism is exercised by check-hwasan.
Differential Revision: https://reviews.llvm.org/D63360
llvm-svn: 363636
The goal is to improve hwasan's error reporting for stack use-after-return by
recording enough information to allow the specific variable that was accessed
to be identified based on the pointer's tag. Currently we record the PC and
lower bits of SP for each stack frame we create (which will eventually be
enough to derive the base tag used by the stack frame) but that's not enough
to determine the specific tag for each variable, which is the stack frame's
base tag XOR a value (the "tag offset") that is unique for each variable in
a function.
In IR, the tag offset is most naturally represented as part of a location
expression on the llvm.dbg.declare instruction. However, the presence of the
tag offset in the variable's actual location expression is likely to confuse
debuggers which won't know about tag offsets, and moreover the tag offset
is not required for a debugger to determine the location of the variable on
the stack, so at the DWARF level it is represented as an attribute so that
it will be ignored by debuggers that don't know about it.
Differential Revision: https://reviews.llvm.org/D63119
llvm-svn: 363635
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.
The second phase is a new intra-block localization phase which attempts to
re-sink the already localized instructions further right before one of the
multiple uses.
One additional change is to also localize G_GLOBAL_VALUE as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced attempt to prevent code size regressions when localizing these.
Overall, these changes improve CTMark code size on arm64 by 1.2%.
Full code size results:
Program baseline new diff
------------------------------------------------------------------------------
test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6%
test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6%
test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4%
test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2%
test-suite :: CTMark/lencod/lencod.test 1340592 1324200 -1.2%
test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9%
test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5%
test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0%
test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0%
Geomean difference -1.2%
Differential Revision: https://reviews.llvm.org/D63303
llvm-svn: 363632
Summary: This case is related to D63405 in that we need to be propagating FMF on negates.
Reviewers: volkan, spatel, arsenm
Reviewed By: arsenm
Subscribers: wdng, javed.absar
Differential Revision: https://reviews.llvm.org/D63458
llvm-svn: 363631
This patch really contains two pieces:
Teach SCEV how to fold a phi in the header of a loop to the value on the backedge when a) the backedge is known to execute at least once, and b) the value is safe to use globally within the scope dominated by the original phi.
Teach IndVarSimplify's rewriteLoopExitValues to allow loop invariant expressions which already exist (and thus don't need new computation inserted) even in loops where we can't optimize away other uses.
Differential Revision: https://reviews.llvm.org/D63224
llvm-svn: 363619
Summary:
Change the way we deal with iterator invalidation in the extload combines as it
was still possible to neglect to visit a use. Even worse, it happened in the
in-tree test cases and the checks weren't good enough to detect it.
We now take a cheap copy of the use list before iterating over it. This
prevents iterator invalidation from occurring and has the nice side effect
of making the existing schedule-for-erase/schedule-for-insert mechanism
moot.
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, javed.absar, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61813
llvm-svn: 363616
AMDGPUPropagateAttributes will not work on function bitcatsts,
so move AMDGPUFixFunctionBitcasts before it.
Differential Revision: https://reviews.llvm.org/D63455
llvm-svn: 363614
Recommit r363289 with a bug fix for crash identified in pr42279. Issue was that a loop exit test does not have to be an icmp, leading to a null dereference crash when new logic was exercised for that case. Test case previously committed in r363601.
Original commit comment follows:
This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV.
The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program.
As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well.
(Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.)
Differential Revision: https://reviews.llvm.org/D62939
llvm-svn: 363613
Summary:
The purpose of the padding is to guard against stale code being
fetched into the instruction cache by the lowest level prefetching.
We're generating relocatable ELF here, and so the padding should
arguably be added by the linker. This is in fact what Mesa does.
This also fixes multi-part shaders for Mesa.
Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82
Reviewers: arsenm, rampitec, t-tye
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63427
llvm-svn: 363602
Summary:
This is related to the changes to the groupstaticsize intrinsic in
D61494 which would otherwise make the related tests in these files
fail or much less useful.
Note that for some reason, SOPK generation is less effective in the
amdhsa OS, which is why I chose PAL. I haven't investigated this
deeper.
Change-Id: I6bb99569338f7a433c28b4c9eb1e3e036b00d166
Reviewers: arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63392
llvm-svn: 363600
Summary:
Update compare normalization in SimpleValue hashing to break ties (when
the same value is being compared to itself) by switching to the swapped
predicate if it has a lower numerical value. This brings the hashing in
line with isEqual, which already recognizes the self-compares with
swapped predicates as equal.
Fixes PR 42280.
Reviewers: spatel, efriedma, nikic, fhahn, uabelho
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63349
llvm-svn: 363598
Summary:
LoopRotate doesn't create a faithful clone of an instruction, it may
simplify it beforehand. Hence the clone of an instruction that has a
MemoryDef associated may not be a definition, but a use or not a memory
alternig instruction.
Don't rely on the template when the clone may be simplified.
Reviewers: george.burgess.iv
Subscribers: jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63355
llvm-svn: 363597
Basically porting over the behaviour in AArch64ISelLowering to GISel. See
emitComparison for reference.
When we have something like this:
```
lhs = G_SUB 0, y
...
G_ICMP lhs, rhs
```
We can fold away the G_SUB and produce a cmn instead, given that we produce
the same value in NZCV.
Add a test showing that the transformation works, and also showing that we
don't perform the transformation when it's unsafe.
Also factor out the CSet emission into emitCSetForICMP.
Differential Revision: https://reviews.llvm.org/D63163
llvm-svn: 363596
If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64).
llvm-svn: 363592
Summary:
Add all MemoryPhis in IDF before filling in their incomign values.
Otherwise, a new Phi can be added that needs to become the incoming
value of another Phi.
Test fails the verification in verifyPrevDefInPhis.
Reviewers: george.burgess.iv
Subscribers: jlebar, Prazek, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63353
llvm-svn: 363590
The pass works in two modes:
Mode 1: Just set attributes starting from kernels. This can work at
the very beginning of opt and llc pipeline, but cannot clone functions
because it must be a function pass.
Mode 2: Actually clone functions for new attributes. This can only work
after all function passes in the opt pipeline because it has to be a
module pass.
Differential Revision: https://reviews.llvm.org/D63208
llvm-svn: 363586
If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it.
llvm-svn: 363582
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the aligment of the original scaler
memory op is not supported with the nontemporal hint on the target.
This adds two new functions:
bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;
to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.
This fixes https://llvm.org/PR40759
Differential Revision: https://reviews.llvm.org/D61764
llvm-svn: 363581
A target intrinsic may be defined as possibly reading memory, but the
call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic assumption
of the intrinsic definition, so the chain should still be used.
I fixed the same bug in SelectionDAG in r287593.
llvm-svn: 363580
I keep using the wrong instruction when manually writing tests. This
really needs to check the number of operands, but I don't see an easy
way to do that right now.
llvm-svn: 363579
Use -fsave-optimization-record=<format> to specify a different format
than the default, which is YAML.
For now, only YAML is supported.
llvm-svn: 363573
For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for a xmm we'll just end up using the regular unaligned vector loads anyway.
First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets.
Differential Revision: https://reviews.llvm.org/D63246
llvm-svn: 363564
If an addrspacecast needed to be inserted again, this was creating a
clone of the original cast for each user. Just use the original, which
also saves losing the value name.
llvm-svn: 363562
Should also be marked writeonly, but I think that would require
splitting the version with done set to a separate intrinsic
Test change is only from renumbering the attribute group numbers,
which for some reason the generated check lines consider.
llvm-svn: 363560
The HardwareLoops pass finds exit blocks with a scevable exit count.
If the target specifies to update the loop counter in a register,
through a phi, we need to ensure that the exit block is a latch so
that we can insert the phi with the correct value for the incoming
edge.
Differential Revision: https://reviews.llvm.org/D63336
llvm-svn: 363556
Summary:
Avoid that loop vectorizer creates loads/stores of vectors
with "irregular" types when interleaving. An example of
an irregular type is x86_fp80 that is 80 bits, but that
may have an allocation size that is 96 bits. So an array
of x86_fp80 is not bitcast compatible with a vector
of the same type.
Not sure if interleavedAccessCanBeWidened is the best
place for this check, but it solves the problem seen
in the added test case. And it is the same kind of check
that already exists in memoryInstructionCanBeWidened.
Reviewers: fhahn, Ayal, craig.topper
Reviewed By: fhahn
Subscribers: hiraditya, rkruppe, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63386
llvm-svn: 363547
This patch tests the forward-referencing added in D62995 by changing
some existing intrinsics to use forward referencing of overloadable
parameters, rather than backward referencing.
This patch changes the TableGen definition/implementation of
llvm.aarch64.neon.ld2lane and llvm.aarch64.neon.ld2lane intrinsics
(and similar for ld3 and ld4). This change is intended to be
non-functional, since the behaviour of the intrinsics is
expected to be the same.
Reviewers: arsenm, dmgreen, RKSimon, greened, rnk
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D63189
llvm-svn: 363546
Some GEPs were not being split, presumably because that split would just be
undone by the DAGCombiner. Not performing those splits can prevent important
optimizations, such as preventing the element indices / member offsets from
being (partially) folded into load/store instruction immediates. This patch:
- Makes the splits also occur in the cases where the base address and the GEP
are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.
Differential Revision: https://reviews.llvm.org/D60294
llvm-svn: 363544
This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source.
llvm-svn: 363542
Second functional change following on from rL362687. Pass the
NoWrapFlags from the MulExpr to InsertBinop when we're generating a
shl or mul.
Differential Revision: https://reviews.llvm.org/D61934
llvm-svn: 363540
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.
This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.
Reviewers: arsenm, thegameg, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60137
llvm-svn: 363533
Third time's the charm.
This was reverted in r363220 due to being suspected of an internal benchmark
regression and a test failure, none of which turned out to be caused by this.
llvm-svn: 363529
SimplifyCFG has a bug that results in inconsistent prof branch_weights metadata
if unreachable switch cases are removed. This patch fixes this bug by making use
of the newly introduced SwitchInstProfUpdateWrapper class (see patch D62122).
A new test is created.
Differential Revision: https://reviews.llvm.org/D62186
llvm-svn: 363527
Summary:
SPE passes doubles the same as soft-float, in register pairs as i32
types. This is all handled by the target-independent layer. However,
this is not optimal when splitting or reforming the doubles, as it
pushes to the stack and loads from, on either side.
For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:
evstdd 5, X(1)
lwz 3, X(1)
lwz 4, X+4(1)
Likewise, to form a double into r5 from args in r3 and r4:
stw 3, X(1)
stw 4, X+4(1)
evldd 5, X(1)
This optimizes the fence to use SPE instructions. Now, to pass a double
to a function:
mr 4, 5
evmergehi 3, 5, 5
And to form a double into r5 from args in r3 and r4:
evmergelo 5, 3, 4
This is comparable to the way that gcc generates the double splits.
This also fixes a bug with expanding builtins to libcalls, where the
LowerCallTo() code path was generating intermediate illegal type nodes.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54583
llvm-svn: 363526
Summary:
Currently, MachOWriter::writeSectionData writes dummy data (0xdeadbeef) to fill section data areas in the file even if the section is a virtual one. Since virtual sections don't occupy any space in the file, writing dummy data could results the "OS.tell() - fileStart <= Sec.offset" assertion failure.
This patch fixes the bug by simply not writing any dummy data for virtual sections.
Reviewers: beanz, jhenderson, rupprecht, alexshap
Reviewed By: alexshap
Subscribers: compnerd, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62991
llvm-svn: 363525
Summary:
The "sparc"/"sparcel" architectures appears in ArchMap (used by -B option) but not in OutputFormatMap (used by -I/-O option). Add their targets into OutputFormatMap for consistency.
Note that AFAIK there're no targets for 32-bit little-endian SPARC ("elf32-sparcel") in GNU binutils.
Reviewers: espindola, alexshap, rupprecht, jhenderson, compnerd, jakehehrlich
Reviewed By: jhenderson, compnerd, jakehehrlich
Subscribers: jyknight, emaste, arichardson, fedor.sergeev, jakehehrlich, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63238
llvm-svn: 363524
Fix folds of addo and subo with an undef operand to be:
`@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`,
as per LLVM undef rules.
Same for commuted variants.
Based on the original version of the patch by @nikic.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42209 | PR42209 ]]
Differential Revision: https://reviews.llvm.org/D63065
llvm-svn: 363522
Summary:
Instead of encoding a high-word of 0 using a fake TargetGlobalAddress,
just use a literal target constant. This simplifies some subsequent changes.
The generated assembly is now more explicit about the kind of relocation
that is to be used.
Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61491
llvm-svn: 363516
This is based on the example/discussion in PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
Proper vector shift instructions don't appear until AVX2, so we may generate several
extra instructions within a loop trying to compensate for that. It's difficult to
recover from that shift expansion later than this, so use the existing TLI hook and
splat analysis to enable better codegen.
This extends CGP functionality introduced with:
rL201655
Differential Revision: https://reviews.llvm.org/D63233
llvm-svn: 363511
This is similar logic/motivation to the select splitting in D62969.
In D63233, the pattern changes so that we no longer have an extract_subvector of vselect,
but the operands of the select are still being concatenated.
The closest case is represented in either the first or last test diffs here - we have an
extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions.
I think that's the right trade-off for most AVX1 targets.
In the example based on PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx).
Differential Revision: https://reviews.llvm.org/D63364
llvm-svn: 363508
Summary:
If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that
we can decrease cache misses and branch-prediction misses. Actual alignment of
the loop will depend on the hotness check and other logic in alignBlocks.
The old code will only align hot loop to 32 bytes when the LoopSize larger than
16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop
to 32 bytes not only for the hot loop whose size is 16~32 bytes.
Reviewed By: steven.zhang, jsji
Differential Revision: https://reviews.llvm.org/D61228
llvm-svn: 363495
Based on D59959, this switches SCEV to use unsigned/signed range
intersection based on the sign hint. This will prefer non-wrapping
ranges in the relevant domain. I've left the one intersection in
getRangeForAffineAR() to use the smallest intersection heuristic,
as there doesn't seem to be any obvious preference there.
Differential Revision: https://reviews.llvm.org/D60035
llvm-svn: 363490
If we can detect that saturating math that depends on an IV cannot
overflow, replace it with simple math. This is similar to the CVP
optimization from D62703, just based on a different underlying
analysis (SCEV vs LVI) that catches different cases.
Differential Revision: https://reviews.llvm.org/D62792
llvm-svn: 363489
This reverts rL363474. -debug-only=isel was added to some tests that
don't specify `REQUIRES: asserts`. This causes failures on
-DLLVM_ENABLE_ASSERTIONS=off builds.
I chose to revert instead of fixing the tests because I'm not sure
whether we should add `REQUIRES: asserts` to more tests.
llvm-svn: 363482
Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:
* a latch block
* it has two successors, one is loop header, another is exit
* it has more than one predecessors
If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.
Differential Revision: https://reviews.llvm.org/D43256
llvm-svn: 363471
with 'objc_arc_inert'
Those calls are no-ops, so they can be safely deleted.
rdar://problem/49839633
Differential Revision: https://reviews.llvm.org/D62433
llvm-svn: 363468
Currently you get extra waits, because waits are inserted for the
register dependencies of the call, and the function prolog waits on
everything.
Currently waits are still inserted on returns. It may make sense to
not do this, and wait in the caller instead.
llvm-svn: 363465
There is a circular dependency between SROA and InferAddressSpaces
today that requires running both multiple times in order to be able to
eliminate all simple allocas and addrspacecasts. InferAddressSpaces
can't remove addrspacecasts when written to memory, and SROA helps
move pointers out of memory.
This should avoid inserting new commuting addrspacecasts with GEPs,
since there are unresolved questions about pointer wrapping between
different address spaces.
For now, don't replace volatile operations that don't match the alloca
addrspace, as it would change the address space of the access. It may
be still OK to insert an addrspacecast from the new alloca, but be
more conservative for now.
llvm-svn: 363462
The way SelectionDAG treats memory operands is very frustrating, and
by default drops them unless a property is set on the pattern. There
is no pattern for manually selected instructions, so this requires
manually setting them.
llvm-svn: 363455
This is a branch opcode that takes a jump table pointer, jump table index and an
index into the table to do an indirect branch.
We pass both the table pointer and JTI to allow targets like ARM64 to more
easily use the existing jump table compression optimization without having to
walk up the block to find a paired G_JUMP_TABLE.
Differential Revision: https://reviews.llvm.org/D63159
llvm-svn: 363434