These are going to be useful in TargetLowering::SimplifyDemandedBits, so expose these helpers outside of SelectionDAG.cpp
Also add an getValidShiftAmountConstant early-out to getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant so we can use them for scalar cases as well.
Produce an unmerge to a narrower type and introduce a narrower shift
if needed. I wasn't sure if there was a better way to parameterize the
target's preferred shift type for the GICombineRule, so manually call
the combine helper.
Summary:
This patch implements the part of the calling convention
where SVE Vectors are passed by reference. This means the
caller must allocate stack space for these objects and
pass the address to the callee.
Reviewers: efriedma, rovka, cameron.mcinally, c-rhodes, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71216
This adds another pattern to the combiner for a case that we were not handling
to generate the REV16 instruction for ARM/Thumb2 and a bswap+ror on X86.
Differential Revision: https://reviews.llvm.org/D74032
Summary:
When using strict fp, it is required to update the
chain when performing integer type promotion of a
operand to a integer to floating point conversion.
Reviewers: craig.topper, john.brawn
Reviewed By: craig.topper
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74597
Follow-up for D74006.
When the integrated assembler is used, we use SHF_LINK_ORDER. The
linked-to symbol is part of ELFSectionKey, thus we can omit the unique
ID.
In https://reviews.llvm.org/rG8b737688c21a9755cae14cb9343930e0882164ab I
switched the condition gating the creation of the descriptor symbol from
checking the MCAsmInfo if we need to support descriptors, to if the OS
was AIX. Technically the 2 should be interchangeable: if we are
targeting AIX then we need to emit XCOFF object files, and the MCAsmInfo
must return true for needing function descriptors.
This doesn't account for lit test with runsteps that only set the arch.
Eg: test/CodeGen/XCore/section-name.ll
which when run natively on AIX we end up with a target xcore-ibm-aix and
needFunctionDescriptors is false.
This patch reverts to using the MCAsmInfo and adds an assert that the
target OS must be AIX since that is the only target using the descriptor
hook.
Differential Revision: https://reviews.llvm.org/D74622
This is more or less directly ported from the AMDGPU custom lowering
for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES
instead of creating shift/trunc to extract the two halves, and zexting
an inverted compare instead of select_cc).
This also does not include the fast math expansion the DAG which
converts to f32 and then to f16. I think that belongs in a
pre-legalize combine instead.
Like COPY instructions explained in D70616, we don't check the constraints
when combining G_UNMERGE_VALUES. Use the same logic used in D70616 to check
if registers can be replaced, or a COPY instruction needs to be built.
https://reviews.llvm.org/D70564
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.
== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".
== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
Differential Revision: https://reviews.llvm.org/D71775
replaceDbgDeclare is used to update the descriptions of stack variables
when they are moved (e.g. by ASan or SafeStack). A side effect of
replaceDbgDeclare is that it moves dbg.declares around in the
instruction stream (typically by hoisting them into the entry block).
This behavior was introduced in llvm/r227544 to fix an assertion failure
(llvm.org/PR22386), but no longer appears to be necessary.
Hoisting a dbg.declare generally does not create problems. Usually,
dbg.declare either describes an argument or an alloca in the entry
block, and backends have special handling to emit locations for these.
In optimized builds, LowerDbgDeclare places dbg.values in the right
spots regardless of where the dbg.declare is. And no one uses
replaceDbgDeclare to handle things like VLAs.
However, there doesn't seem to be a positive case for moving
dbg.declares around anymore, and this reordering can get in the way of
understanding other bugs. I propose getting rid of it.
Testing: stage2 RelWithDebInfo sanitized build, check-llvm
rdar://59397340
Differential Revision: https://reviews.llvm.org/D74517
Patchable statepoint is lowered into sequence of nops, so zeroed call target
should not be on register. It is better to use getTargetConstant instead
of getConstant to select zero constant for call target.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D74465
Current tail duplication embedded in MBP duplicates a BB into all or none of its predecessors without too much cost analysis. So sometimes it is duplicated into cold predecessors, and in other cases it may miss the duplication into hot predecessors.
This patch improves tail duplication in 3 aspects:
A successor can be duplicated into part of its predecessors.
A more fine-grained benefit analysis, combined with 1, now a successor is duplicated into hot predecessors only.
If a successor can't be duplicated into one predecessor, it doesn't impact the duplication into other predecessors.
Differential Revision: https://reviews.llvm.org/D73387
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.
I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.
NFC.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74482
The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not.
This patch replaces the char return value with an NegatibleCost enum to more clearly demonstrate what is implied.
It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect whats going on.
Differential Revision: https://reviews.llvm.org/D74221
Summary:
Right now the alignment of the lower half of a store is computed as
align/2, which fails for unaligned stores (align = 1), and is overly
pessimitic for, e.g. a 8 byte store aligned to 4 bytes.
Fixes PR44851
Fixes PR44877
Reviewers: gchatelet, spatel, lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74311
This patch enables the debug entry values feature.
- Remove the (CC1) experimental -femit-debug-entry-values option
- Enable it for x86, arm and aarch64 targets
- Resolve the test failures
- Leave the llc experimental option for targets that do not
support the CallSiteInfo yet
Differential Revision: https://reviews.llvm.org/D73534
Summary:
The method attempts to find loads that can be legally clustered by
looking for loads consuming the same chain glue token.
However, the old code looks at _all_ users of values produced by the
chain node -- including uses of the loaded/returned value of volatile
loads or atomics. This could lead to circular dependencies which then
failed during scheduling.
With this change, we filter out users by getResNo, i.e. by which
SDValue value they use, to ensure that we only look at users of the
chain glue token.
This appears to be a rather old bug, which is perhaps surprising.
However, the test case is actually quite fragile (i.e., it is hidden
by fairly small changes), and the test _must_ use volatile loads for
the bug to manifest.
Reviewers: arsenm, bogner, craig.topper, foad
Subscribers: MatzeB, jvesely, wdng, hiraditya, javed.absar, jfb, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74253
This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses
them to implement soft promotion for the half type. This is
enough to provide basic support for __fp16 with strictfp.
Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C
is enabled.
Instructions marked as FrameSetup do not cause requestLabelAfterInsn to
be called and so no such label is generated. Call instructions which
require call site entries to be generated require this label to be
present in order to calculate the return PC offset/address, but the
check for whether the call instruction is marked as FrameSetup was not
present.
Therefore in the case where a call instruction is marked as FrameSetup,
an assertion failure occurs if a call site entry is to be generated.
This is the case with RISC-V's implementation of save/restore via
library calls.
Differential Revision: https://reviews.llvm.org/D71593
Fixup the UserValue methods to use FragmentInfo instead of DIExpression because
the DIExpression is only ever used to get the to get the FragmentInfo. The
DIExpression is meaningless in the UserValue class because each definition point
added to a UserValue may have a unique DIExpression.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D74057
Rename the class DbgValueLocation to DbgVariableValue and instances from Loc to
DbgValue. These names better express the new semantics introduced in D74053.
The class previously represented a { Location } only. It now represents a
{ Location, DIExpression } pair which together describe a value.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D74055
LiveDebugVariables uses interval maps to explicitly represent DBG_VALUE
intervals. DBG_VALUEs are filtered into an interval map based on their {
Variable, DIExpression }. The interval map will coalesce adjacent entries that
use the same { Location }. Under this model, DBG_VALUEs which refer to the same
bits of the same variable will be filtered into different interval maps if they
have different DIExpressions which means the original intervals will not be
properly preserved.
This patch fixes the problem by using { Variable, Fragment } to filter the
DBG_VALUEs into maps, and coalesces adjacent entries iff they have the same
{ Location, DIExpression } pair.
The solution is not perfect because we see the similar issues appear when
partially overlapping fragments are encountered, but is far simpler than a
complete solution (i.e. D70121).
Fixes: pr41992, pr43957
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D74053
A downstream test exposed a simple logic bug with the manual pointer
stripping code, fix that by just using stripPointerCasts() on the value.
I don't think there's a way to expose this issue upstream.
SUMMARY:
The patch is enable to support Mergeable2ByteCString and Mergeable4ByteCString
Reviewers: daltenty
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D74164
Add a simplification to fuse a manual vector extract with shifts and
truncate into a bitcast.
Unpacking and packing values into vectors is only optimized with
extractelement instructions, not when manually unpacked using shifts
and truncates.
This patch simplifies shifts and truncates into a bitcast if possible.
Simplify (build_vec (trunc $1)
(trunc (srl $1 width))
(trunc (srl $1 (2 * width))) ...)
to (bitcast $1)
Differential Revision: https://reviews.llvm.org/D73892
Use the isCandidateForCallSiteEntry().
This should mostly be an NFC, but there are some parts ensuring
the moveCallSiteInfo() and copyCallSiteInfo() operate with call site
entry candidates (both Src and Dest should be the call site entry
candidates).
Differential Revision: https://reviews.llvm.org/D74122
Remove code from LegalizeTypes that allowed this to work.
We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
Summary:
For CTTZ we place a set bit just past where the non-promoted type
stopped so the extended bits won't be used for the count. For
CTTZ_ZERO_UNDEF we don't care what happens if no bits are set in
the original type and we end up counting into the extended bits.
So we can just use ANY_EXTEND for both cases.
This matches what is done in type legalization for these operations.
We make no effort to force the upper bits to zero.
Differential Revision: https://reviews.llvm.org/D74111
Calls to ObjC's objc_msgSend function are done by bitcasting the function global
to the required function type signature. This patch looks through this bitcast
so that we can do a direct call with bl on arm64 instead of using an indirect blr.
Differential Revision: https://reviews.llvm.org/D74241
Add the isCandidateForCallSiteEntry predicate to MachineInstr to
determine whether a DWARF call site entry should be created for an
instruction.
For now, it's enough to have any call instruction that doesn't belong to
a blacklisted set of opcodes. For these opcodes, a call site entry isn't
meaningful.
Differential Revision: https://reviews.llvm.org/D74159
Allows more flexible use of buildMerge in places where
use operands are available as SrcOp since it does not
require explicit conversion to Register.
Simplify code with new buildMerge.
Differential Revision: https://reviews.llvm.org/D74223
This is a one off special case, since actually implementing full inline asm
support will be much more involved. This lets us compile a lot more code as a
common simple case.
Differential Revision: https://reviews.llvm.org/D74201
Printing floating point number in decimal is inconvenient for humans.
Verbose asm output will print out floating point values in comments, it
helps.
But in lots of cases, users still need additional work to covert the
decimal back to hex or binary to check the bit patterns,
especially when there are small precision difference.
Hexadecimal form is one of the supported form in LLVM IR, and easier for
debugging.
This patch try to print all FP constant in hex form instead.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D73566
The type passed to lower was invalid, so I'm not sure how this was
even working before. The source and destination type also do not have
to match, so make sure to use the right ones.
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.
Reviewers: courbet
Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73964
Summary:
This patch reorders the emission of debug_str section, so that
string can come after macros.
This is necessary for macro forms like DW_MACRO_define_strp,
which emits macro as a string in debug_str section.
"linked-to section" is used by the ELF spec. By analogy, "linked-to
symbol" is a good name for the signature symbol. The word "linked-to"
implies a directed edge and makes it clear its relation with "sh_link",
while one can argue that "associated" means an undirected edge.
Also, combine tests and add precise SMLoc to improve diagnostics.
Reviewed By: eugenis, grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D74082
This reverts commit ed29dbaafa.
I'm backing out D68945, which as the discussion for D73526 shows, doesn't
seem to handle the -O0 path through the codegen backend correctly. I'll
reland the patch when a fix is worked out, apologies for all the churn.
The two parent commits are part of this revert too.
Conflicts:
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
llvm/test/DebugInfo/X86/dbg-addr-dse.ll
SelectionDAGBuilder conflict is due to a nearby change in e39e2b4a79
that's technically unrelated. dbg-addr-dse.ll conflicted because
41206b61e3 (legitimately) changes the order of two lines.
There are further modifications to dbg-value-func-arg.ll: it landed after
the patch being reverted, and I've converted indirection to be represented
by the isIndirect field rather than DW_OP_deref.
This reverts commit 3137fe4d23.
I'm backing out D68945, which this patch is a follow up for. It'll be
re-landed when D68945 is fixed.
The changes to dbg-value-func-arg.ll occur because our handling of certain
kinds of location now mixes up indirection that happens at different points
in a DIExpression. While this is a regression, it's a return to the prior
behaviour while a better patch is sought.
This reverts commit 2d3174c4df.
The overall solution for this problem is reverting D68945, which wasn't
handling the -O0 path through the codegen backend correctly. See:
discussion in D73526.
To find the instruction in the block for a given ID, first a count and then a
lookup was performed in the map, which is almost the same thing, thus doing
double the work.
Differential Revision: https://reviews.llvm.org/D73866
Summary:
ARM Type Promotion pass does not clear
the container that defines if one variable
was visited or not, missing optimization
opportunities by luck when two llvm:Values
from different functions are allocated at
the same memory address.
Also fixes a comment and uses existing
method to pop and obtain last element
of the worklist.
Reviewers: samparker
Reviewed By: samparker
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73970
This is a compile-time optimization for PHIElimination (splitting of critical
edges), which was reported at https://bugs.llvm.org/show_bug.cgi?id=44249. As
discussed there, the way to remedy the slowdowns with huge functions is to
pre-compute the live-in registers for each MBB in an efficient way in
PHIElimination.cpp and then pass that information along to
LiveVariabless::addNewBlock().
In all the huge test programs where this slowdown has been noticable, it has
dissapeared entirely with this patch.
Review: Björn Pettersson, Quentin Colombet.
Differential Revision: https://reviews.llvm.org/D73152
The legalizer produces a lot of these, and they make reading legalized
MIR annoying. For some reason, this does seem to sometimes introduce
copies of implicit def, which is dumb.
contractCrossBankCopyIntoStore() finds the instruction defines the
source register and uses its output to replace the register. There are,
however, instructions that have multiple outputs, e.g. G_UNMERGE_VALUES.
Current implementation hardcodes to operand 0 and has no way of knowing
which output should be used.
This change adds another function to directly return the register that
is the source of the register and use that for folding.
This fixes https://bugs.llvm.org/show_bug.cgi?id=44783
Differential Revision: https://reviews.llvm.org/D74005
We currently only handle mem instructions with a single define.
Avoid the call site parameter debug info when we find the case with
multiple defs, rather than throwing an assert.
Differential Revision: https://reviews.llvm.org/D73954
Summary:
This reverts commit 3ef169e586. The
purpose of this commit was to allow stack machines to perform
instruction selection for instructions with variadic defs. However,
MachineInstrs fundamentally cannot support variadic defs right now, so
this change does not turn out to be useful.
Depends on D73927.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73928
Originally committed in: 1ced28cbe7
Reverted in: f75301d16d
(reverted due to tests failing on non-linux/x86 targets, tests have since been
generalized and specialized... since Split DWARF isn't supported on non-elf
targets anyway and we have no way to run on "whatever elf target is available"
so they fail on MacOS without an explicit target triple)
This code was incorrectly emitting extra bytes into arbitrary parts of
the object file when it was meant to be hashing them to compute the DWO
ID.
Follow-up patch(es) will refactor this API somewhat to make such bugs
harder to introduce, hopefully.
This extends the RemarkStreamer to allow for other emitters (e.g.
frontends, SIL, etc.) to emit remarks through a common interface.
See changes in llvm/docs/Remarks.rst for motivation and design choices.
Differential Revision: https://reviews.llvm.org/D73676
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.
Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.
Fixes PR44697
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D73752
shouldOptimizeForSize is showing up in a profile, spending around 10%
of the pass time in one function. This should probably not be so slow,
but the much cheaper attribute check should be done first anyway.
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
This patch reverts part of r362750 / D62650, which stopped
LiveDebugVariables from trimming leading variable location ranges down
to only covering those instructions that are in scope. I've observed some
circumstances where the number of DBG_VALUEs in a function can be
amplified in an un-necessary way, to cover more instructions that are
out of scope, leading to very slow compile times. Trimming the range
of instructions that the variables cover solves the slow compile times.
The specific problem that r362750 tries to fix is addressed by the
assignment to RStart that I've added. Any variable location that begins
at the first instruction of a block will now be considered to begin at the
start of the block. While these sound the same, the have different
SlotIndexes, and the register allocator may shoehorn additional
instructions in between the two. The test added in the past
(wrong_debug_loc_after_regalloc.ll) still works with this modification.
live-debug-variables.ll has a range trimmed to not cover the prologue of
the function, while dbg-addr-dse.ll has a DBG_VALUE sink past one
instruction with no DebugLoc, which is expected behaviour.
Differential Revision: https://reviews.llvm.org/D73691
Ensure that OptLevelChanger::SavedFastISel is initialized in the constructor.
This should be NFC - as the equivalent 'same opt level' early-out is used in the destructor as well, so SavedFastISel is only actually referenced in the general case.
Differential Revision: https://reviews.llvm.org/D73875
Under MVE, we do not have any lowering for fminimum, which a
vector_reduce_fmin without NoNan will be expanded into. As with the
other recent patches, force this to expand in the pre-isel pass. Note
that Neon lowering would be OK because the scalar fminimum uses the
vector VMIN instruction, but is probably better to just rely on the
scalar operations, which is what is done here.
Also fixes what appears to be the reversal of INF vs -INF in the
vector_reduce_fmin widening code.
This code was incorrectly emitting extra bytes into arbitrary parts of
the object file when it was meant to be hashing them to compute the DWO
ID.
Follow-up patch(es) will refactor this API somewhat to make such bugs
harder to introduce, hopefully.
Start using a new strategy with a combination of merge and unmerges.
This allows scalarizing before lowering, which in cases like
<2 x s128> avoids producing giant illegal shifts.
RegAllocGreedy uses a fairly compile time intensive splitting heuristic
called region splitting. This heuristic was disabled via another heuristic
when it is likely that it won't be worth the compile time. The only way
to control this other heuristic was via a command line option (huge-size-for-split).
This commit gives more control on this heuristic by making it overridable
by the target using a target hook in TargetRegisterInfo called
shouldRegionSplitForVirtReg.
The default implementation of this hook keeps the heuristic as it was
before this patch.
We have to be careful in SimplifyDemandedBits with loads in case we attempt to combine back to a constant (which then gets turned into a constant pool load again), but we can at least set the upper KnownBits for a ZEXTLoad to zero.
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73885
Summary:
A Copy with a source that is zeros is the same as a Set of zeros.
This fixes the invariant that SrcAlign should always be non-null.
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73791
The code paths in the absence of TargetMachine, TargetLowering or
TargetRegisterInfo are poorly tested. As rL285987 said, requiring
TargetPassConfig allows us to delete many (untested) checks littered
everywhere.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D73754
The current FirstMI.getDebugLoc() is actually null in almost all cases.
If it isn't, the generated .loc will be considered initial. The .loc
will have the prologue_end flag and terminate the prologue prematurely.
Also use an overload of BuildMI that will not prepend
PATCHABLE_FUNCTION_ENTRY to a MachineInstr bundle.
This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html
The current strategy for f16 is to promote type to float every except where the specific width is required like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang which is a storage only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes.
It is also not obvious how to handle to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from.
This patch implements a different strategy where conversions are emitted directly around arithmetic operations. And otherwise its passed around as an i16 including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle.
Differential Revision: https://reviews.llvm.org/D73749
Significant missing hashing - as per the comment this was only meant to
skip member functions (unspecified, but I think it's legible as member
function declarations, not definitions) but was skipping all named
subprograms (so only hashed child DIEs for member function definitions -
because they didn't have a direct name, but only a name given indirectly
in the DW_AT_specification-referenced DIE)
Summary:
Applying this cleanup:
- MIRBuilder.buildInstr(TargetOpcode::G_ASHR)
- .addDef(Shifted)
- .addUse(Res)
- .addUse(ShiftAmt);
+ MIRBuilder.buildAShr(Shifted, Res, ShiftAmt);
caused an assertion failure here:
llc: /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:404: llvm::MachineInstr *llvm::MachineRegisterInfo::getVRegDef(unsigned int) const: Assertion `(I.atEnd() || std::next(I) == def_instr_end()) && "getVRegDef assumes a single definition or no definition"' failed.
#4 0x00000000050a6d96 in llvm::MachineRegisterInfo::getVRegDef (this=0x74606a0, Reg=2147483650) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:403
#5 0x00000000066148f6 in llvm::getConstantVRegValWithLookThrough (VReg=2147483650, MRI=..., LookThroughInstrs=false, HandleFConstant=true) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:244
#6 0x00000000066147da in llvm::getConstantVRegVal (VReg=2147483650, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:210
#7 0x0000000006615367 in llvm::ConstantFoldBinOp (Opcode=101, Op1=2147483650, Op2=2147483656, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:341
#8 0x000000000657eee0 in llvm::CSEMIRBuilder::buildInstr (this=0x7465010, Opc=101, DstOps=..., SrcOps=..., Flag=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp:160
#9 0x0000000003645958 in llvm::MachineIRBuilder::buildAShr (this=0x7465010, Dst=..., Src0=..., Src1=..., Flags=...) at /home/jayfoad2/git/llvm-project/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:1298
#10 0x00000000065c35b1 in llvm::LegalizerHelper::lower (this=0x7fffffffb5f8, MI=..., TypeIdx=0, Ty=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2020
because at this point there are two instructions defining Res: the
original G_SMULO/G_UMULO and the new G_MUL that we built. The fix is
to modify the original mul in place, so that there is only ever one
definition of Res.
Reviewers: arsenm, aditya_nandakumar
Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72842
This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::INSERT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
Some code gen passes use MBFIWrapper to keep track of the frequency of new
blocks. This was not taken into account and could lead to incorrect frequencies
as MBFI silently returns zero frequency for unknown/new blocks.
Add a variant for MBFIWrapper in the PGSO query interface.
Depends on D73494.
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.
Reviewers: courbet
Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73785
One of the exit criteria of computeKnownBits is whether we reach the max
recursive call depth. Before this patch we would check that the
depth is exactly equal to max depth to exit.
Depth may get bigger than max depth if it gets passed to a different
GISelKnownBits object.
This may happen when say a generic part uses a GISelKnownBits object
with some max depth, but then we hit TL.computeKnownBitsForTargetInstr
which creates a new GISelKnownBits object with a different and smaller
depth. In that situation, when we hit the max depth check for the first
time in the target specific GISelKnownBits object, depth may already
be bigger than the current max depth. Hence we would continue to compute
the known bits, until we ran through the full depth of the chain of
computation or ran out of stack space.
For instance, let say we have
GISelKnownBits Info(/*MaxDepth*/ = 10);
Info.getKnownBits(Foo)
// 9 recursive calls to computeKnownBitsImpl.
// Then we hit a target specific instruction.
// The target specific GISelKnownBits does this:
GISelKnownBits TargetSpecificInfo(/*MaxDepth*/ = 6)
TargetSpecificInfo.computeKnownBitsImpl() // <-- next max depth checks would
// always return false.
This commit does not have any test case, none of the in-tree targets
use computeKnownBitsForTargetInstr.
This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585
where a DW_OP_deref was placed at the end of a dwarf expression, resulting in corrupt
symbols when debugging.
This is an attempt to reland with a few fixes for buildbot since I
haven't merged from master in a bit.
Differential Revision: https://reviews.llvm.org/D73526
We can have geps that have a scalar base pointer, and a vector index value, which
means that the base pointer must be splatted into a vector of pointers.
This fixes crashes on arm64 GlobalISel with optimizations enabled.
- Extends the comments related to function descriptors, noting how they
are only used on AIX.
- Changes the condition used to gate the creation of the current function
symbol in AsmPrinter::SetupMachineFunction to reflect being AIX
specific. The creation of the symbol is different because of AIXs
linkage conventions, not because AIX uses function descriptors.
Differential Revision: https://reviews.llvm.org/D73115
Summary:
For -fpatchable-function-entry=N,0 -mbranch-protection=bti, after
9a24488cb6, we place the NOP sled after
the initial BTI.
```
.Lfunc_begin0:
bti c
nop
nop
.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lfunc_begin0
```
This patch adds a label after the initial BTI and changes the __patchable_function_entries entry to reference the label:
```
.Lfunc_begin0:
bti c
.Lpatch0:
nop
nop
.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lpatch0
```
This placement is compatible with the resolution in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 .
A local linkage function whose address is not taken does not need a BTI.
Placing the patch label after BTI has the advantage that code does not
need to differentiate whether the function has an initial BTI.
Reviewers: mrutland, nickdesaulniers, nsz, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73680
Commit 9965b12fd1 was supposed to change the offset constant when
lowering load/stores, but only introduced this change for loads. This
patch adds the same fix for stores.
This is passed to legalizeCustom, but not intrinsic. Also remove the
MRI argument, since you can get that from the MachineIRBuilder.
I'm not sure why MachineIRBuilder has a private observer member, and
this is passed separately.
For pow2 constants we should use G_SHL for pattern matching (and perf)
purposes later.
Vector support not yet implemented.
Differential Revision: https://reviews.llvm.org/D73659
For `MC_GlobalAddress` operands referencing **certain** GlobalObjects,
we can lower them to STB_LOCAL aliases to avoid costs brought by
assembler/linker's conservative decisions about symbol interposition:
* An assembler conservatively assumes a global default visibility symbol interposable (ELF
semantics). So relocations in object files are needed even if the code generator assumed
the definition exact and non-interposable.
* The relocations can cause the creation of PLT entries on some targets for -shared links.
A linker conservatively assumes a global default visibility symbol interposable (if not
otherwise constrained by -Bsymbolic/--dynamic-list/VER_NDX_LOCAL/etc).
"certain" refers to GlobalObjects in the intersection of
`hasExactDefinition() and !isInterposable()`: `external`, `appending`, `internal`, `private`.
Local linkages (`internal` and `private`) cannot be interposed. `appending` is for very
few objects LLVM interpret specially. So the set just includes `external`.
This patch emits STB_LOCAL aliases (.Lfoo$local) for such GlobalObjects, so that targets can lower
MC_GlobalAddress operands to STB_LOCAL aliases if applicable.
We may extend the scope and include GlobalAlias in the future.
LLVM's existing -fno-semantic-interposition behaviors give us license to do such optimizations:
* Various optimizations (ipconstprop, inliner, sccp, sroa, etc) treat normal ExternalLinkage
GlobalObjects as non-interposable.
* Before D72197, MC resolved a PC-relative VK_None fixup to a non-local symbol at assembly time (no
outstanding relocation), if the target is defined in the same section. Put it simply, even if IR
optimizations failed to optimize and allowed interposition for the function call in
`void foo() {} void bar() { foo(); }`, the assembler would disallow it.
This patch sets up AsmPrinter infrastructure to make -fno-semantic-interposition more so.
With and without the patch, the object file output should be identical:
`.Lfoo$local` does not take a symbol table entry.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D73228
Summary:
BaseMemOpClusterMutation::apply forms store chains by looking for
control (i.e. non-data) dependencies from one mem op to another.
In the test case, clusterNeighboringMemOps successfully clusters the
loads, and then adds artificial edges to the loads' successors as
described in the comment:
// Copy successor edges from SUa to SUb. Interleaving computation
// dependent on SUa can prevent load combining due to register reuse.
The effect of this is that *data* dependencies from one load to a store
are copied as *artificial* dependencies from a different load to the
same store.
Then when BaseMemOpClusterMutation::apply looks at the stores, it finds
that some of them have a control dependency on a previous load, which
breaks the chains and means that the stores are not all considered part
of the same chain and won't all be clustered.
The fix is to only consider non-artificial control dependencies when
forming chains.
Subscribers: MatzeB, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71717
Add several new helpers to RDA:
- hasLocalDefBefore
- isRegDefinedAfter
- isSafeToDefRegAt
And move two bits of logic from ARMLowOverheadLoops into RDA:
- isSafeToMove
- isSafeToRemove
Both of these have some wrappers too to make them more convienent to
use.
Differential Revision: https://reviews.llvm.org/D73460
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
Symbols created for merged external global variables have default
visibility. This can break programs when compiling with -Oz
-fvisibility=hidden as symbols that should be hidden will be exported at
link time.
Differential Revision: https://reviews.llvm.org/D73235
Summary:
To avoid header file circular dependency issues in passing updated MBFI (in
MBFIWrapper) to the interface of profile guided size optimizations.
A prep step for (and split off of) D73381.
Reviewers: davidxl
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73494
Summary:
This is a follow up on D61634. It adds an LLVM IR intrinsic to allow better implementation of memcpy from C++.
A follow up CL will add the intrinsics in Clang.
Reviewers: courbet, theraven, t.p.northover, jdoerfert, tejohnson
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71710
Summary:
This is mostly NFC. computeForAddSub may give more precise results in
some cases, but that doesn't seem to affect any existing GlobalISel
tests.
Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73431
This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
Differential Revision: This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
This patch fixes an assertion failure in DwarfExpression that is
triggered when a complex fragment has exactly the size of a
subregister of the register the DBG_VALUE points to *and* there is no
DWARF encoding for the super-register.
I took the opportunity to replace/document some magic values with
static constructor functions to make this code less confusing to read.
rdar://problem/58489125
Differential Revision: https://reviews.llvm.org/D72938
This is a revert-of-revert (i.e. this reverts commit 802bec89, which
itself reverted fa4701e1 and 79daafc9) with a fix folded in. The problem
was that call site tags weren't emitted properly when LTO was enabled
along with split-dwarf. This required a minor fix. I've added a reduced
test case in test/DebugInfo/X86/fission-call-site.ll.
Original commit message:
This allows a call site tag in CU A to reference a callee DIE in CU B
without resorting to creating an incomplete duplicate DIE for the callee
inside of CU A.
We already allow cross-CU references of subprogram declarations, so it
doesn't seem like definitions ought to be special.
This improves entry value evaluation and tail call frame synthesis in
the LTO setting. During LTO, it's common for cross-module inlining to
produce a call in some CU A where the callee resides in a different CU,
and there is no declaration subprogram for the callee anywhere. In this
case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram
in order to fill in the call site tag. That empty 'definition' defeats
entry value evaluation etc., because the debugger can't figure out what
it means.
As a follow-up, maybe we could add a DWARF verifier check that a
DW_TAG_subprogram at least has a DW_AT_name attribute.
Update #1:
Reland with a fix to create a declaration DIE when the declaration is
missing from the CU's retainedTypes list. The declaration is left out
of the retainedTypes list in two cases:
1) Re-compiling pre-r266445 bitcode (in which declarations weren't added
to the retainedTypes list), and
2) Doing LTO function importing (which doesn't update the retainedTypes
list).
It's possible to handle (1) and (2) by modifying the retainedTypes list
(in AutoUpgrade, or in the LTO importing logic resp.), but I don't see
an advantage to doing it this way, as it would cause more DWARF to be
emitted compared to creating the declaration DIEs lazily.
Update #2:
Fold in a fix for call site tag emission in the split-dwarf + LTO case.
Tested with a stage2 ThinLTO+RelWithDebInfo build of clang, and with a
ReleaseLTO-g build of the test suite.
rdar://46577651, rdar://57855316, rdar://57840415, rdar://58888440
Differential Revision: https://reviews.llvm.org/D70350
The Version was used only to determine the size of an operand of
DW_OP_call_ref. The size was 4 for all versions apart from 2, but
the DW_OP_call_ref operation was introduced only in DWARF3. Thus,
the code may be simplified and using of Version may be eliminated.
Differential Revision: https://reviews.llvm.org/D73264
Summary:
This fixes PR44118. For cases where we have a chain like this:
R8 = R1 (entry value)
R0 = R8
call @foo R0
the code that emits call site entries using entry values would not
follow that chain, instead emitting a call site entry with R8 as
location rather than R0. Such a case was discovered when originally
adding dbgcall-site-orr-moves.mir. This patch fixes that issue. This is
done by changing the ForwardedRegWorklist set to a map in which the
worklist registers always map to the parameter registers that they
describe.
Another thing this patch fixes is that worklist registers now can
describe more than one parameter register at a time. Such a case
occurred in dbgcall-site-interpretation.mir, resulting in a call site
entry not being emitted for one of the parameters.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: vsk
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D73168
Summary:
Since D70431 the describeLoadedValue() hook takes a parameter register,
meaning that it can now be asked to describe any register. This means
that we can drop the difference between explicit and implicit defines
that we previously had in collectCallSiteParameters().
I have not found any case for any upstream targets where a parameter
register is only implicitly defined, and does not overlap with any
explicit defines. I don't know if such a case would even make sense. So
as far as I have tested, this patch should be a non-functional change.
However, this reduces the complexity of the code a bit, and it will
simplify the implementation of an upcoming patch which solves PR44118.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: djtodoro, vsk
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D73167
G_CTPOP is generated from llvm.ctpop.<type> intrinsics, clang generates
these intrinsics from __builtin_popcount and __builtin_popcountll.
Add lower and narrow scalar for G_CTPOP.
Lower G_CTPOP for MIPS32.
Differential Revision: https://reviews.llvm.org/D73216
llvm.cttz.<type> intrinsic has additional i1 argument is_zero_undef,
it tells whether zero as the first argument produces a defined result.
G_CTTZ is generated from llvm.cttz.<type> (<type> <src>, i1 false)
intrinsics, clang generates these intrinsics from __builtin_ctz and
__builtin_ctzll.
G_CTTZ_ZERO_UNDEF comes from llvm.cttz.<type> (<type> <src>, i1 true).
Clang generates such intrinsics as parts of expansion of builtin_ffs
and builtin_ffsll. It is also traditionally part of and many
algorithms that are now predicated on avoiding zero-value inputs.
Add narrow scalar (algorithm uses G_CTTZ_ZERO_UNDEF) for G_CTTZ.
Lower G_CTTZ and G_CTTZ_ZERO_UNDEF for MIPS32.
Differential Revision: https://reviews.llvm.org/D73215
llvm.ctlz.<type> intrinsic has additional i1 argument is_zero_undef,
it tells whether zero as the first argument produces a defined result.
MIPS clz instruction returns 32 for zero input.
G_CTLZ is generated from llvm.ctlz.<type> (<type> <src>, i1 false)
intrinsics, clang generates these intrinsics from __builtin_clz and
__builtin_clzll.
G_CTLZ_ZERO_UNDEF can also be generated from llvm.ctlz with true as
second argument. It is also traditionally part of and many algorithms
that are now predicated on avoiding zero-value inputs.
Add narrow scalar for G_CTLZ (algorithm uses G_CTLZ_ZERO_UNDEF).
Lower G_CTLZ_ZERO_UNDEF and select G_CTLZ for MIPS32.
Differential Revision: https://reviews.llvm.org/D73214
and macro FUNCTION likewise. NFCI.
Some functions like fmuladd don't really have a node, we should divide
the declaration form those have node to avoid introducing fake nodes.
Differential Revision: https://reviews.llvm.org/D72871
... as well as:
Revert "[DWARF] Defer creating declaration DIEs until we prepare call site info"
This reverts commit fa4701e197.
This reverts commit 79daafc903.
There have been reports of this assert getting hit:
CalleeDIE && "Could not find DIE for call site entry origin
Teach the GISelKnowBits analysis how to deal with PHI operations.
PHIs are essentially COPYs happening on edges, so we can just reuse
the code for COPY.
This is NFC COPY-wise has we leave Depth untouched when calling
computeKnownBitsImpl for COPYs, like it was before this patch.
Increasing Depth is however required for PHIs as they may loop back to
themselves and we would end up in an infinite loop if we were not
increasing Depth.
Differential Revision: https://reviews.llvm.org/D73317
Updated FoldConstantArithmetic method signature to match that of
FoldConstantVectorArithmetic in preparation for merging the two
functions together
https://bugs.llvm.org/show_bug.cgi?id=36544
This is the first step in combining the various
FoldConstantVectorArithmetic and FoldConstantVectorArithmetic
functions into one FoldConstantArithmetic function.
Differential Revision: https://reviews.llvm.org/D72870
Unlike the existing code that I modified here, I only handle the
case where the strict_fsetcc has a single use. Not sure exactly
how to handle multiples uses.
Testing this on X86 is hard because we already have a other
combines that get rid of lowered version of the integer setcc that
this xor will eventually become. So this combine really just
saves a bunch of extra nodes being created. Not sure about other
targets.
Differential Revision: https://reviews.llvm.org/D71816
Scheduler sends NumLoads argument into shouldClusterMemOps()
one less the actual cluster length. So for 2 instructions
it will pass just 1. Correct this number.
This is NFC for in tree targets.
Differential Revision: https://reviews.llvm.org/D73292
Previously LiveDebugValues pass would consider meta instructions that 'fiddle' with liveness of registers as register definitions when transfering register defs. This would mean that, for example, a KILL instruction would cause LiveDebugValues to terminate the range of an earlier DBG_VALUE instruction resulting in the none propogation of said DBG_VALUE instructions into later blocks.
This patch adds the check and a helpful comment, fixes a test that previously tested for the broken behaviour by coincidence and adds a test specifically for this.
reviewers: vsk, dstenb, djtodoro
Differential Revision: https://reviews.llvm.org/D73210
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.
Reviewers: xbolva00, courbet, bollu
Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73099
Similar to the function attribute `prefix` (prefix data),
"patchable-function-prefix" inserts data (M NOPs) before the function
entry label.
-fpatchable-function-entry=2,1 (1 NOP before entry, 1 NOP after entry)
will look like:
```
.type foo,@function
.Ltmp0: # @foo
nop
foo:
.Lfunc_begin0:
# optional `bti c` (AArch64 Branch Target Identification) or
# `endbr64` (Intel Indirect Branch Tracking)
nop
.section __patchable_function_entries,"awo",@progbits,get,unique,0
.p2align 3
.quad .Ltmp0
```
-fpatchable-function-entry=N,0 + -mbranch-protection=bti/-fcf-protection=branch has two reasonable
placements (https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01185.html):
```
(a) (b)
func: func:
.Ltmp0: bti c
bti c .Ltmp0:
nop nop
```
(a) needs no additional code. If the consensus is to go for (b), we will
need more code in AArch64BranchTargets.cpp / X86IndirectBranchTracking.cpp .
Differential Revision: https://reviews.llvm.org/D73070
Match the approach in SimplifyDemandedBits where we calculate the demanded elts and then have a common path for the ComputeKnownBits/ComputeNumSignBits call.
Match the approach in SimplifyDemandedBits/ComputeNumSignBits where we calculate the demanded elts and then have a common path for the ComputeKnownBits call.
Match the approach in SimplifyDemandedBits where we calculate the demanded elts and then have a common path for the ComputeKnownBits/ComputeNumSignBits call, additionally we only ever need original demanded elts of the base vector even if the index is unknown.
The low_pc is analog to the DW_AT_call_return_pc, since it describes
the return address after the call. The DW_AT_call_pc is the address
of the call instruction, and we don't use it at the moment.
Differential Revision: https://reviews.llvm.org/D73173
Summary:
We create a number of standard types of control sections in multiple places for
things like the function descriptors, external references and the TOC anchor
among others, so it is possible for their properties to be defined
inconsistently in different places. This refactor moves their creation and
properties into functions in the TargetLoweringObjectFile class hierarchy, where
functions for retrieving various special types of sections typically seem
to reside.
Note: There is one case in PPCISelLowering which is specific to function entry
points which we don't address since we don't have access to the TLOF there.
Reviewers: DiggerLin, jasonliu, hubert.reinterpretcast
Reviewed By: jasonliu, hubert.reinterpretcast
Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72347
Summary:
Without the BFI update, some hot blocks are incorrectly treated as cold code.
This fixes a FDO perf regression in the TSVC benchmark from D71288.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73146
This patch also fixes up a number of cases in DAGCombine and
SelectionDAGBuilder where the size of a scalable vector is used in a
fixed-width context (thus triggering an assertion failure).
Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71215
The generic BaseMemOpClusterMutation calls into TargetInstrInfo to
analyze the address of each load/store instruction, and again to decide
whether two instructions should be clustered. Previously this had to
represent each address as a single base operand plus a constant byte
offset. This patch extends it to support any number of base operands.
The old target hook getMemOperandWithOffset is now a convenience
function for callers that are only prepared to handle a single base
operand. It calls the new more general target hook
getMemOperandsWithOffset.
The only requirements for the base operands returned by
getMemOperandsWithOffset are:
- they can be sorted by MemOpInfo::Compare, such that clusterable ops
get sorted next to each other, and
- shouldClusterMemOps knows what they mean.
One simple follow-on is to enable clustering of AMDGPU FLAT instructions
with both vaddr and saddr (base register + offset register). I've left
a FIXME in the code for this case.
Differential Revision: https://reviews.llvm.org/D71655
In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:
getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1
This can be used to propagate the value in CodeGenPrepare.
In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.
This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).
Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner
Reviewed by: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68203
In GlobalISel we may in some unfortunate circumstances generate PHIs with
operands that are on separate banks. If-conversion doesn't currently check for
that case and ends up generating a CSEL on AArch64 with incorrect register
operands.
Differential Revision: https://reviews.llvm.org/D72961
Summary:
WebAssembly is unique among upstream targets in that it does not at
any point use physical registers to store values. Instead, it uses
virtual registers to model positions in its value stack. This means
that some target-independent lowering activities that would use
physical registers need to use virtual registers instead for
WebAssembly and similar downstream targets. This CL generalizes the
existing `usesPhysRegsForPEI` lowering hook to
`usesPhysRegsForValues` in preparation for using it in more places.
One such place is in InstrEmitter for instructions that have variadic
defs. On register machines, it only makes sense for these defs to be
physical registers, but for WebAssembly they must be virtual registers
like any other values. This CL changes InstrEmitter to check the new
target lowering hook to determine whether variadic defs should be
physical or virtual registers.
These changes are necessary to support a generalized CALL instruction
for WebAssembly that is capable of returning an arbitrary number of
arguments. Fully implementing that instruction will require additional
changes that are described in comments here but left for a follow up
commit.
Reviewers: aheejin, dschuff, qcolombet
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71484
These names have been changed from CamelCase to camelCase, but there were
many places (comments mostly) that still used the old names.
This change is NFC.
This was unconditionally folding this to the source operand, even if the access was out of bounds. Use undef instead of the extract is not the first element.
This helps with some cases where 3-vectors are legalized and avoids processing the 4th component.
Original Patch by: arsenm (Matt Arsenault)
Differential Revision: https://reviews.llvm.org/D51589
Summary:
This was reverted in 328e0f3dca due to
chromium bot failure. This revision addresses that case.
Original commit message:
Summary:
This patch will provide support for auto return type for the C++ member
functions. Before this return type of the member function is deduced and
stored in the DIE.
This patch includes llvm side implementation of this feature.
Patch by: Awanish Pandey <Awanish.Pandey@amd.com>
Reviewers: dblaikie, aprantl, shafik, alok, SouraVX, jini.susan.george
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D70524
StackColoring::remapInstructions() remaps MachineOperand frame index (e.g. %stack.1 -> %stack.0)
but does not remap FixedStackPseudoSourceValue frame index (e.g. store 4 into %stack.1.ap2.i.i)
referenced by MachineMemoryOperand.
This can cause an assertion failure when LiveDebugValues references a dead stack object.
It is difficult to craft a test case. -g, va_copy and stack-coloring are required.
I can only reproduce it on ppc32.
This intention is to move patchable-function before aarch64-branch-targets
(configured in AArch64PassConfig::addPreEmitPass) so that we emit BTI before NOPs
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424).
This also allows addPreEmitPass() passes to know the precise instruction sizes if they want.
Tried x86-64 Debug/Release builds of ccls with -fxray-instrument -fxray-instruction-threshold=1.
No output difference with this commit and the previous commit.
This makes the SectionLabel handling more resilient - specifically for
future PROPELLER work which will have more CU ranges (rather than just
one per function).
Ultimately it might be nice to make this more general/resilient to
arbitrary labels (rather than relying on the labels being created for CU
ranges & then being reused by ranges, loclists, and possibly other
addresses). It's possible that other (non-rnglist/loclist) uses of
addresses will need the addresses to be in SectionLabels earlier (eg:
move the CU.addRange to be done on function begin, rather than function
end, so during function emission they are already populated for other
use).
This change has 2 components:
Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It
describes how the Dwarf frame base will be encoded. That can be a register (the
default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or
a DW_OP_WASM_location descriptr.
WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the
correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs
has run. Make WebAssemblyExplicitLocals store the local it allocates for the
frame register. Use this local information to implement getDwarfFrameBase
The result is that the DW_AT_frame_base attribute is correctly encoded for each
subprogram, and each param and local variable has a correct DW_AT_location that
uses DW_OP_fbreg to refer to the frame base.
This is a reland of rG3a05c3969c18 with fixes for the expensive-checks
and Windows builds
Differential Revision: https://reviews.llvm.org/D71681
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.
- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute
AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).
Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.
Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.
-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.
Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.
This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.
AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence
* replace STG + STG with ST2G
* replace STGloop + STGloop with STGloop
This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.
This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and adds _untied variants of those pseudos
that are allowed to take the base address as a FI operand. This is needed to
simplify recognizing an STGloop instruction as operating on a stack slot
post-regalloc.
This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).
Reviewers: pcc, ostannard
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70286
Extend -fxray-instrumentation-bundle to split function-entry and
function-exit into two separate options, so that it is possible to
instrument only function entry or only function exit. For use cases
that only care about one or the other this will save significant overhead
and code size.
Differential Revision: https://reviews.llvm.org/D72890
XRay allows tuning by minimum function size, but also always instruments
functions with loops in them. If the minimum function size is set to a
large value the loop instrumention ends up causing most functions to be
instrumented anyway. This adds a new flag, xray-ignore-loops, to disable
the loop detection logic.
Differential Revision: https://reviews.llvm.org/D72659
[this re-applies c0176916a4
with the correct commit message and phabricator link]
This addresses point 1 of PR44213.
https://bugs.llvm.org/show_bug.cgi?id=44213
The DW_AT_LLVM_sysroot attribute is used for Clang module debug info,
to allow LLDB to import a Clang module from source. Currently it is
part of each DW_TAG_module, however, it is the same for all modules in
a compile unit. It is more efficient and less ambiguous to store it
once in the DW_TAG_compile_unit.
This should have no effect on DWARF consumers other than LLDB.
Differential Revision: https://reviews.llvm.org/D71732
This is a purely cosmetic change that is NFC in terms of the binary
output. I bugs me that I called the attribute DW_AT_LLVM_isysroot
since the "i" is an artifact of GCC command line option syntax
(-isysroot is in the category of -i options) and doesn't carry any
useful information otherwise.
This attribute only appears in Clang module debug info.
Differential Revision: https://reviews.llvm.org/D71722
Introduce a method to walk through use-def chains to decide whether
it's possible to remove a given instruction and its users. These
instructions are then stored in a set until the end of the transform
when they're erased. This is now used to perform checks on the
iteration count (LoopDec chain), element count (VCTP chain) and the
possibly redundant iteration count.
As well as being able to remove chains of instructions, we know also
check that the sub feeding the vctp is producing the expected value.
Differential Revision: https://reviews.llvm.org/D71837
Add DemandedElts handling to ISD::ANY_EXTEND and add missing ISD::ANY_EXTEND_VECTOR_INREG handling. Despite the lack of test changes this code IS being used - its just that the ANY_EXTEND ops are legalized later on (typically to ZERO_EXTEND equivalents) so we typically manage to combine later on.
This change has 2 components:
Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It
describes how the Dwarf frame base will be encoded. That can be a register (the
default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or
a DW_OP_WASM_location descriptr.
WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the
correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs
has run. Make WebAssemblyExplicitLocals store the local it allocates for the
frame register. Use this local information to implement getDwarfFrameBase
The result is that the DW_AT_frame_base attribute is correctly encoded for each
subprogram, and each param and local variable has a correct DW_AT_location that
uses DW_OP_fbreg to refer to the frame base.
Differential Revision: https://reviews.llvm.org/D71681
This was assuming the narrow target was the source type. Respect the
requested type when these don't match by using intermediate
merges. This avoids producing very wide, illegal shift expansions.
The algorithm here only works if the sint_to_fp doesn't do any
rounding. Otherwise it can round before the offset fixup is
applied. Add an assert to protect this.
To avoid breaking the one test in tree that tested this code
with a set of types that fail the assert, I've enabled i32->f32
to use the i64->f32 algorithm. This only occurs when f64 isn't
a legal type. If f64 is legal then we do i32->f64->f32 instead.
Differential Revision: https://reviews.llvm.org/D72794
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.
Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferencable which should also be fixed.
Recommitting e93e0d413f after reverting due to test failures, which
will hopefully now be fixed. Original commit message:
After expanding the pseudo instructions, update the liveness info.
We do this in a post-order traversal of the loop, including its
exit blocks and preheader(s).
Differential Revision: https://reviews.llvm.org/D72131
Testing compiler-rt, a new assertion failure occurs when building
the GwpAsanTestObjects object. I'm uploading a reproducer to D70597.
This reverts commit 75188b01e9.
If there are DBG_VALUEs between phi and label (after phi and before label),
DBG_VALUE will block PHI lowering after the LABEL. Moving all DBG_VALUEs
after Labels in the function ScheduleDAGSDNodes::EmitSchedule to avoid
impacting PHI lowering.
before:
PHI
DBG_VALUE
LABEL
after: (move DBG_VALUE after label)
PHI
LABEL
DBG_VALUE
then: (phi lowering after label)
LABEL
COPY
DBG_VALUE
Fixes the issue: https://bugs.llvm.org/show_bug.cgi?id=43859
Differential Revision: https://reviews.llvm.org/D70597
This was moved in October 2018, but we don't appear to be using
this for vectors on any in tree target.
Moving it back simplifies D72794 so we can share the code for i32->f32.
When tail duplication estimates a size of tail it uses instruction
count. Account for a number of instrictions in a bundle too.
Differential Revision: https://reviews.llvm.org/D72783
This now develops the same problem G_ZEXT/G_ANYEXT have where the
requested type is assumed to be the source type. This will be fixed
separately by creating intermediate merges.
Summary:
InlineResult is used both in APIs assessing whether a call site is
inlinable (e.g. llvm::isInlineViable) as well as in the function
inlining utility (llvm::InlineFunction). It means slightly different
things (can/should inlining happen, vs did it happen), and the
implicit casting may introduce ambiguity (casting from 'false' in
InlineFunction will default a message about hight costs,
which is incorrect here).
The change renames the type to a more generic name, and disables
implicit constructors.
Reviewers: eraman, davidxl
Reviewed By: davidxl
Subscribers: kerbowa, arsenm, jvesely, nhaehnle, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72744
This reverts D53469, which changed llvm's DWARF emission to emit
DW_AT_call_return_pc as a function-local offset. Such an encoding is not
compatible with post-link block re-ordering tools and isn't standards-
compliant.
In addition to reverting back to the original DW_AT_call_return_pc
encoding, teach lldb how to fix up DW_AT_call_return_pc when the address
comes from an object file pointed-to by a debug map. While doing this I
noticed that lldb's support for tail calls that cross a DSO/object file
boundary wasn't covered, so I added tests for that. This latter case
exercises the newly added return PC fixup.
The dsymutil changes in this patch were originally included in D49887:
the associated test should be sufficient to test DW_AT_call_return_pc
encoding purely on the llvm side.
Differential Revision: https://reviews.llvm.org/D72489
Bitcast only really applies between scalars and vectors. Implement as
an unmerge and remerge. The test needs to tolerate failure since one
of the unmerges currently fails to legalize.
All the callers of this function will be ScheduleDAGMI from the
MachineScheduler. This allows us to use the extra info available in
ScheduleDAGMI without resorting to awkward casts.
Summary:
- `dead-mi-elimination` assumes MIR in the SSA form and cannot be
arranged after phi elimination or DeSSA. It's enhanced to handle the
dead register definition by skipping use check on it. Once a register
def is `dead`, all its uses, if any, should be `undef`.
- Re-arrange the DIE in RA phase for AMDGPU by placing it directly after
`detect-dead-lanes`.
- Many relevant tests are refined due to different register assignment.
Reviewers: rampitec, qcolombet, sunfish
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72709
This code is untested in tree because the "APFloat::semanticsPrecision(sem) >= SrcVT.getSizeInBits() - 1" check is false for most combinations for int and fp types except maybe i32 and f64. For that you would need i32 to be an illegal type, but f64 to be legal and have custom handling for legalizing the split sint_to_fp. The precision check itself was added in 2010 to fix a double rounding issue in the algorithm that would occur if the sint_to_fp was not able to do the conversion without rounding.
Differential Revision: https://reviews.llvm.org/D72728
Summary:
Mem op clustering adds a weak edge in the DAG between two loads or
stores that should be clustered, but the direction of this edge is
pretty arbitrary (it depends on the sort order of MemOpInfo, which
represents the operands of a load or store). This often means that two
loads or stores will get reordered even if they would naturally have
been scheduled together anyway, which leads to test case churn and goes
against the scheduler's "do no harm" philosophy.
The fix makes sure that the direction of the edge always matches the
original code order of the instructions.
Reviewers: atrick, MatzeB, arsenm, rampitec, t.p.northover
Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72706
SUMMARY:
In this patch we put the global variable in a Csect which's SectionKind is "ReadOnlyWithRel" into Data Section.
Reviewers: hubert.reinterpretcast,jasonliu,Xiangling_L
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D72461
Code in getRoot made the assumption that every node in PendingLoads
must always itself have a dependency on the current DAG root node.
After the changes in 04a8696, it turns out that this assumption no
longer holds true, causing wrong codegen in some cases (e.g. stores
after constrained FP intrinsics might get deleted).
To fix this, we now need to make sure that the TokenFactor created
by getRoot always includes the previous root, if there is no implicit
dependency already present.
The original getControlRoot code already has exactly this check,
so this patch simply reuses that code now for getRoot as well.
This fixes the regression.
NFC if no constrained FP intrinsic is present.
As mentioned by @nikic on rGef5debac4302, we can merge the guaranteed bottom zero bits from the shifted value, and then, if a min shift amount is known, zero out the bottom bits as well.
As mentioned by @nikic on rGef5debac4302 (although that was just about SHL), we can merge the guaranteed top zero bits from the shifted value, and then, if a min shift amount is known, zero out the top bits as well.
SHL tests / handling will be added in a follow up patch.
We're planning to remove the shufflemask operand from ShuffleVectorInst
(D72467); fix GlobalISel so it doesn't depend on that Constant.
The change to prelegalizercombiner-shuffle-vector.mir happens because
the input contains a literal "-1" in the mask (so the parser/verifier
weren't really handling it properly). We now treat it as equivalent to
"undef" in all contexts.
Differential Revision: https://reviews.llvm.org/D72663
Summary:
be15dfa88f broke GlobalISel's usage of getSetCCInverse() which currently
appears to be limited to our out-of-tree backend. GlobalISel doesn't use
EVT's and isn't able to derive them from the information it has as it
doesn't distinguish between integer and floating point types (that
distinction is made by operations rather than values). Bring back the
bool version of getSetCCInverse() in a way that doesn't break the intent
of be15dfa88f but also allows GlobalISel to continue using it.
Reviewers: spatel, bogner, arichardson
Reviewed By: arichardson
Subscribers: rovka, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72309
This patch makes it so that cases where multiple instructions that differ only
in their FrameIndex MachineOperand values no longer collide. For instance:
%1:_(p0) = G_FRAME_INDEX %stack.0
%2:_(p0) = G_FRAME_INDEX %stack.1
Prior to this patch these instructions would collide together.
Differential Revision: https://reviews.llvm.org/D71583
Some target like arm/riscv with soft-float will have compiling crash when using -fno-unsafe-math-optimization option.
This patch will add the missing strict FP support to SoftenFloatRes_XINT_TO_FP.
Differential Revision: https://reviews.llvm.org/D72277
We need to ensure that fpexcept.strict nodes are not optimized away even if
the result is unused. To do that, we need to chain them into the block's
terminator nodes, like already done for PendingExcepts.
This patch adds two new lists of pending chains, PendingConstrainedFP and
PendingConstrainedFPStrict to hold constrained FP intrinsic nodes without
and with fpexcept.strict markers. This allows not only to solve the above
problem, but also to relax chains a bit further by no longer flushing all
FP nodes before a store or other memory access. (They are still flushed
before nodes with other side effects.)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D72341
As detailed in https://blog.regehr.org/archives/1709 we don't make use of the known leading/trailing zeros for shifted values in cases where we don't know the shift amount value.
This patch adds support to SelectionDAG::ComputeKnownBits to use KnownBits::countMinTrailingZeros and countMinLeadingZeros to set the minimum guaranteed leading/trailing known zero bits.
Differential Revision: https://reviews.llvm.org/D72573
Summary:
This patch will provide support for auto return type for the C++ member
functions. Before this return type of the member function is deduced and
stored in the DIE.
This patch includes llvm side implementation of this feature.
Patch by: Awanish Pandey <Awanish.Pandey@amd.com>
Reviewers: dblaikie, aprantl, shafik, alok, SouraVX, jini.susan.george
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D70524
.section name, "flags"G, @type, GroupName[, linkage]
As of binutils 2.33, linkage cannot be 'unique'. For integrated
assembler, we use both 'o' flag and 'unique' linkage to support
--gc-sections and COMDAT with lld.
https://sourceware.org/ml/binutils/2019-11/msg00266.html
Current implementation of BaseMemOpsClusterMutation is a little bit
obscure. This patch directly uses a map from store chain ID to set of
memory instrs to make it simpler, so that future improvements are easier
to read, update and review.
Reviewed By: evandro
Differential Revision: https://reviews.llvm.org/D72070
Custom legalization can produce MERGE_VALUES to return multiple
results. We can expand them immediately instead of leaving them
around for DAG combine to clean up.
Some of the simplest handlers just call TLI and if that fails,
they fall back to unrolling. For those just inline the TLI call
and share the unrolling call with the default case of Expand.
For ExpandFSUB and ExpandBITREVERSE so that its obvious they
don't return results sometimes and want to defer to LegalizeDAG.
Summary:
This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with a ogt/olt/oge/ole checks. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare.
This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code.
Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn
Reviewed By: efriedma
Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72536
This reverts the AMDGPU DAG mutation implemented in D72487 and gives
a more general way of adjusting BUNDLE operand latency.
It also replaces FixBundleLatencyMutation with adjustSchedDependency
callback in the AMDGPU, fixing not only successor latencies but
predecessors' as well.
Differential Revision: https://reviews.llvm.org/D72535
ONE is currently softened to OGT | OLT. But the libcalls for OGT and OLT libcalls will trigger an exception for QNAN. At least for X86 with libgcc. UEQ on the other hand uses UO | OEQ. The UO and OEQ libcalls will not trigger an exception for QNAN.
This patch changes ONE to use the inverse of the UEQ lowering. So we now produce O & UNE. Technically the existing behavior was correct for a signalling ONE, but since I don't know how to generate one of those from clang that seemed like something we can deal with later as we would need to fix other predicates as well. Also removing spurious exceptions seemed better than missing an exception.
There are also problems with quiet OGT/OLT/OLE/OGE, but those are harder to fix.
Differential Revision: https://reviews.llvm.org/D72477
This system wasn't very well designed for multi-result nodes. As
a consequence they weren't consistently registered in the
LegalizedNodes map leading to nodes being revisited for different
results.
I've removed the "Result" variable from the main LegalizeOp method
and used a SDNode* instead. The result number from the incoming
Op SDValue is only used for deciding which result to return to the
caller. When LegalizeOp is called it should always register a
legalized result for all of its results. Future calls for any other
result should be pulled for the LegalizedNodes map.
Legal nodes will now register all of their results in the map
instead of just the one we were called for.
The Expand and Promote handling to use a vector of results similar
to LegalizeDAG. Each of the new results is then re-legalized and
logged in the LegalizedNodes map for all of the Results for the
node being legalized. None of the handles register their own
results now. And none call ReplaceAllUsesOfValueWith now.
Custom handling now always passes result number 0 to LowerOperation.
This matches what LegalizeDAG does. Since the introduction of
STRICT nodes, I've encountered several issues with X86's custom
handling being called with an SDValue pointing at the chain and
our custom handlers using that to get a VT instead of result 0.
This should prevent us from having any more of those issues. On
return we will update the LegalizedNodes map for all results so
we shouldn't call the custom handler again for each result number.
I want to push SDNode* further into the Expand and Promote
handlers, but I've left that for a follow to keep this patch size
down. I've created a dummy SDValue(Node, 0) to keep the handlers
working.
Differential Revision: https://reviews.llvm.org/D72224
The Linux kernel uses -fpatchable-function-entry to implement DYNAMIC_FTRACE_WITH_REGS
for arm64 and parisc. GCC 8 implemented
-fpatchable-function-entry, which can be seen as a generalized form of
-mnop-mcount. The N,M form (function entry points before the Mth NOP) is
currently only used by parisc.
This patch adds N,0 support to AArch64 codegen. N is represented as the
function attribute "patchable-function-entry". We will use a different
function attribute for M, if we decide to implement it.
The patch reuses the existing patchable-function pass, and
TargetOpcode::PATCHABLE_FUNCTION_ENTER which is currently used by XRay.
When the integrated assembler is used, __patchable_function_entries will
be created for each text section with the SHF_LINK_ORDER flag to prevent
--gc-sections (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93197) and
COMDAT (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195) issues.
Retrospectively, __patchable_function_entries should use a PC-relative
relocation type to avoid the SHF_WRITE flag and dynamic relocations.
"patchable-function-entry"'s interaction with Branch Target
Identification is still unclear (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 for GCC discussions).
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D72215
In D71841 we inverted the sense of the SDNode-level flag to ensure all nodes
default to potentially raising FP exceptions unless otherwise specified --
i.e. if we forget to propagate the flag somewhere, the effect is now only
lost performance, not incorrect code.
However, the related flag at the MI level still defaults to nodes not raising
FP exceptions unless otherwise specified. To be fully on the (conservatively)
safe side, we should invert that flag as well.
This patch does so by replacing MIFlag::FPExcept with MIFlag::NoFPExcept.
(Note that this does also introduce an incompatible change in the MIR format.)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D72466
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
As an intermediate step, some TLI functions can be converted to using
LLT instead of MVT. Move this somewhere out of GlobalISel so DAG
functions can use these.
If we're doing a compare that only tests the sign bit and only the sign bit is demanded, we can just bypass the node. This removes one of the blend dependencies in our v2i64->v2f32 uint_to_fp codegen on pre-sse4.2 targets.
Differential Revision: https://reviews.llvm.org/D72356
If we are extracting a chunk of a vector that's a fraction of an
operand of the concatenated vector operand, we can extract directly
from one of those original operands.
This is another suggestion from PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024#c2
But I'm not sure yet if it will make any difference on those patterns.
It seems to help a few existing AVX512 tests though.
Differential Revision: https://reviews.llvm.org/D72361
After expanding the pseudo instructions, update the liveness info.
We do this in a post-order traversal of the loop, including its
exit blocks and preheader(s).
Differential Revision: https://reviews.llvm.org/D72131
This is a positive combination as long as the NEG is NOT free,
as we are reducing the number of NEG from two to one.
Differential Revision: https://reviews.llvm.org/D72312
Summary:
Added MIRFormatter for target specific MIR formating and parsing with
immediate and custom pseudo source values. Target machine can subclass
MIRFormatter and implement custom logic for printing and parsing
immediate and custom pseudo source values for better readability.
* Target specific immediate mnemonic need to start with "." follows by
identifier string. When MIR parser sees immediate it will call target
specific parsing function.
* Custom pseudo source value need to start with custom follows by
double-quoted string. MIR parser will pass the quoted string to target
specific PSV parsing function.
* MIRFormatter have 2 helper functions to facilitate LLVM value printing
and parsing for custom PSV if they refers LLVM values.
Patch by Peng Guo
Reviewers: dsanders, arsenm
Reviewed By: dsanders
Subscribers: wdng, jvesely, nhaehnle, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69836
Summary:
Added MIRFormatter for target specific MIR formating and parsing with
immediate and custom pseudo source values. Target machine can subclass
MIRFormatter and implement custom logic for printing and parsing
immediate and custom pseudo source values for better readability.
* Target specific immediate mnemonic need to start with "." follows by
identifier string. When MIR parser sees immediate it will call target
specific parsing function.
* Custom pseudo source value need to start with custom follows by
double-quoted string. MIR parser will pass the quoted string to target
specific PSV parsing function.
* MIRFormatter have 2 helper functions to facilitate LLVM value printing
and parsing for custom PSV if they refers LLVM values.
Reviewers: dsanders, arsenm
Reviewed By: dsanders
Subscribers: wdng, jvesely, nhaehnle, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69836
MachineVerifier::visitMachineFunctionAfter() is extended to check the
live-through case for live-in lists. This is only done for registers without
aliases and that are neither allocatable or reserved, such as the SystemZ::CC
register.
The MachineVerifier earlier only catched the case of a live-in use without an
entry in the live-in list (as "using an undefined physical register").
A comment in LivePhysRegs.h has been added stating a guarantee that
addLiveOuts() can be trusted for a full register both before and after
register allocation.
Review: Quentin Colombet
Differential Revision: https://reviews.llvm.org/D68267
Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence
* replace STG + STG with ST2G
* replace STGloop + STGloop with STGloop
This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.
This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and base address as a FI operand when
possible. This is needed to simplify recognizing an STGloop instruction
as operating on a stack slot post-regalloc.
This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).
Reviewers: pcc, ostannard
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70286
Summary:
This patch adds intrinsics and ISelDAG nodes for
signed and unsigned fixed-point division:
llvm.sdiv.fix.*
llvm.udiv.fix.*
These intrinsics perform scaled division on two
integers or vectors of integers. They are required
for the implementation of the Embedded-C fixed-point
arithmetic in Clang.
Patch by: ebevhan
Reviewers: bjope, leonardchan, efriedma, craig.topper
Reviewed By: craig.topper
Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70007
This patch moves `InPQueue` into function arguments instead of template
arguments of `releaseNode`, which is a cleaner approach.
Differential Revision: https://reviews.llvm.org/D72125
Summary:
This patch relands D71271. The problem with D71271 is that it has cyclic dependency:
CodeGen->AsmPrinter->DebugInfoDWARF->CodeGen. To avoid cyclic dependency this patch
puts implementation for DWARFOptimizer into separate library: lib/DWARFLinker.
Thus the difference between this patch and D71271 is in that DWARFOptimizer renamed into
DWARFLinker and it`s files are put into lib/DWARFLinker.
Reviewers: JDevlieghere, friss, dblaikie, aprantl
Reviewed By: JDevlieghere
Subscribers: thegameg, merge_guards_bot, probinson, mgorny, hiraditya, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D71839
Summary:
Remove the restrictions that preventing "asm goto" from returning non-void
values. The values returned by "asm goto" are only valid on the "fallthrough"
path.
Reviewers: jyknight, nickdesaulniers, hfinkel
Reviewed By: jyknight, nickdesaulniers
Subscribers: rsmith, hiraditya, llvm-commits, cfe-commits, craig.topper, rnk
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69876
Conservatively always save + restore LR in noreturn functions.
These functions do not end in a RET, and so they aren't guaranteed to have an
instruction which uses LR in any way. So, as a result, you can end up in
unfortunate situations where you can't backtrace out of these functions in a
debugger.
Remove the old noreturn test, and add a new one which is more descriptive.
Remove the restriction that we can't outline from noreturn functions as well
since we now do the right thing.
SUMMARY:
In this patch, we map mergeable const objects to the read-only section in the same manner as const objects that are not mergeable.
Reviewers: hubert.reinterpretcast,jasonliu
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D71551
Remove the chance of non-deterministic insertion of zexts of the
sources by using a SetVector instead of SmallPtrSet. Do the same for
sinks for consistency and to negate the small issue from possibly
happening. The SafeWrap instructions are now also stored in a
SmallVector. The IRPromoter members of these structures have been
changed to references.
Differential Revision: https://reviews.llvm.org/D72322
This is possibly a small part towards solving PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024
The vectorizer is creating shuffles of concat like this:
%63 = shufflevector <4 x i64> %x, <4 x i64> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
%64 = shufflevector <8 x i64> %63, <8 x i64> undef, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
That might be fixable in the vectorizers, but we're not allowed to fold that into a single shuffle in instcombine,
so we should have a backend backstop to convert that into the likely simpler form:
%64 = shufflevector <4 x i64> %x, <4 x i64> undef, <8 x i32> <i32 0, i32 0, i32 1, i32 1, i32 2, i32 2, i32 3, i32 3>
Differential Revision: https://reviews.llvm.org/D72300
This is a recommit of D71330, but with a few things fixed and changed:
1) ReachingDefAnalysis: this was not running with optnone as it was checking
skipFunction(), which other analysis passes don't do. I guess this is a
copy-paste from a codegen pass.
2) VPTBlockPass: here I've added skipFunction(), because like most/all
optimisations, we don't want to run this with optnone.
This fixes the issues with the initial/previous commit: the VPTBlockPass was
running with optnone, but ReachingDefAnalysis wasn't, and so VPTBlockPass was
crashing querying ReachingDefAnalysis.
I've added test case mve-vpt-block-optnone.mir to check that we don't run
VPTBlock with optnone.
Differential Revision: https://reviews.llvm.org/D71470
Summary:
It's not necessary to use an 'l'(ell) modifier when referencing a label.
Treat block addresses and MBB references as if the modifier is used
anyway. This prevents us from generating references to ficticious
labels.
Reviewers: jyknight, nickdesaulniers, hfinkel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71849
A random set of attributes are implemented by llc/opt forcing the
string attributes on the IR functions before processing anything. This
would not happen for MIR functions, which have not yet been created at
this point.
Use a callback in the MIR parser, purely to avoid dealing with the
ugliness that the command line flags are in a .inc file, and would
require allowing access to these flags from multiple places (either
from the MIR parser directly, or a new utility pass to implement these
flags). It would probably be better to cleanup the flag handling into
a separate library.
This is in preparation for treating more command line flags with a
corresponding function attribute in a more uniform way. The fast math
flags in particular have a messy system where the command line flag
sets the behavior from a function attribute if present, and otherwise
the command line flag. This means if any other pass tries to inspect
the function attributes directly, it will be inconsistent with the
intended behavior. This is also inconsistent with the current behavior
of -mcpu and -mattr, which overwrites any pre-existing function
attributes. I would like to move this to consistenly have the command
line flags not overwrite any pre-existing attributes, and to always
ensure the command line flags are consistent with the function
attributes.
This patch adds widening which really just scalarizes because we don't have a strategy for the extra elements we would need to pad with.
Differential Revision: https://reviews.llvm.org/D72193
This isn't a functonal change since we also check the bit width is the
same and the input type is integer. This guarantees the input and
output type are the same. But passing the input type makes the code
more readable.
This is the DAG node for SIGN_EXTEND_INREG :
t21: v4i32 = sign_extend_inreg t18, ValueType:ch:v4i16
It has two operands. The first one is the value it want to extend, and the second
one is the type to specify how to extend the value. For this example, it means
that, it is signed extend the t18(v4i32) from v4i16 to v4i32. That is
the semantics of c code:
vector int foo(vector int m) {
return m << 16 >> 16;
}
And it could be any vector type that hardware support the operation, though
the type 'v4i16' is NOT legal for the target. When we are trying to combine
the srl + sra, what we did now is calling the TLI.isOperationLegal(), which
will also check the legality of the type. That doesn't make sense.
Differential Revision: https://reviews.llvm.org/D70230
The code here isn't great in all caess. Particularly v4f64->v4i32
on 64-bit AVX targets. But there is some improvement in some
configurations.
There's definitely some issues with computeNumSignBits with
X86ISD::STRICT_FCMP. As well as not being able to propagate sign
bits through merge_values nodes that get created during custom
legalization.
Previously, for vectors we created a vselect with a condition that
didn't match what the target wanted according to getSetCCResultType.
To make up for this, X86 had a special DAG combine to detect if
the condition was all sign bits and then insert its own truncate
or extend. By adding the extend/truncate here explicitly we can
avoid that.
ExpandStrictFPOp calls ExpandUINT_TO_FLOAT. Previously, ExpandUINT_TO_FLOAT
returned SDValue() if it wasn't able to handle and needed to unroll.
Then ExpandStrictFPOp would detect his SDValue() and do the unroll.
After this change, ExpandUINT_TO_FLOAT will directly call
UnrollStrictFPOp and return the unrolled result.
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537
This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
In particular this helps remove some unnecessary scalar->vector->scalar patterns.
The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.
Reapplied after reversion at rL368660 due to PR42982 which was fixed at rGca7fdd41bda0.
Differential Revision: https://reviews.llvm.org/D65887
AMDGPU can't unambiguously go back from the selected instruction
register class to the register bank without knowing if this was used
in a boolean context.
UpdateNodeOperands might CSE to another existing node. So we should make sure we're legalizing that node otherwise we might fail to hook up the operands properly. I've moved the result registration up to the caller to avoid having to pass both Result and Op into the functions where it might be confusing which is which.
This address 2 other issues pointed out in D71861.
Differential Revision: https://reviews.llvm.org/D72021
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).
There's no major functionality change, except for targets that never
implemented this check.
This LLVM attribute was originally added in
d9699bc7bd (2015).
Reviewers: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D72118
The fold 'A - (A & (B - 1))' -> 'A & (0 - B)'
added in 8dab0a4a7d
is too specific. It should/can just be 'A - (A & B)' -> 'A & (~B)'
Even if we don't manage to fold `~` into B,
we have likely formed `ANDN` node.
Also, this way there's less similar-but-duplicate folds.
Name: X - (X & Y) -> X & (~Y)
%o = and i32 %X, %Y
%r = sub i32 %X, %o
=>
%n = xor i32 %Y, -1
%r = and i32 %X, %n
https://rise4fun.com/Alive/kOUl
See
https://bugs.llvm.org/show_bug.cgi?id=44448https://reviews.llvm.org/D71499
The fold 'A - (A & (B - 1))' -> 'A & (0 - B)'
added in 8dab0a4a7d
is too specific. It should just be 'A - (A & B)' -> 'A & (~B)',
but we currently fail to sink that '~' into `(B - 1)`.
Name: ~(X - 1) -> (0 - X)
%o = add i32 %X, -1
%r = xor i32 %o, -1
=>
%r = sub i32 0, %X
https://rise4fun.com/Alive/rjU
While we do manage to fold integer-typed IR in middle-end,
we can't do that for the main motivational case of pointers.
There is @llvm.ptrmask() intrinsic which may or may not be helpful,
but i'm not sure it is fully considered canonical yet,
not everything is fully aware of it likely.
Name: PR44448 ptr - (ptr & C) -> ptr & (~C)
%bias = and i32 %ptr, C
%r = sub i32 %ptr, %bias
=>
%r = and i32 %ptr, ~C
See
https://bugs.llvm.org/show_bug.cgi?id=44448https://reviews.llvm.org/D71499
While we do manage to fold integer-typed IR in middle-end,
we can't do that for the main motivational case of pointers.
There is @llvm.ptrmask() intrinsic which may or may not be helpful,
but i'm not sure it is fully considered canonical yet,
not everything is fully aware of it likely.
https://rise4fun.com/Alive/ZVdp
Name: ptr - (ptr & (alignment-1)) -> ptr & (0 - alignment)
%mask = add i64 %alignment, -1
%bias = and i64 %ptr, %mask
%r = sub i64 %ptr, %bias
=>
%highbitmask = sub i64 0, %alignment
%r = and i64 %ptr, %highbitmask
See
https://bugs.llvm.org/show_bug.cgi?id=44448https://reviews.llvm.org/D71499
For now, we didn't set the default operation action for SIGN_EXTEND_INREG for
vector type, which is 0 by default, that is legal. However, most target didn't
have native instructions to support this opcode. It should be set as expand by
default, as what we did for ANY_EXTEND_VECTOR_INREG.
Differential Revision: https://reviews.llvm.org/D70000
The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all
other such flags. This is a problem, because it implies that all code that
transforms SDNodes without copying flags can introduce a correctness bug,
not just a missed optimization.
This patch changes the default to false. This makes it necessary to move
setting the (No)FPExcept flag for constrained intrinsics from the
visitConstrainedIntrinsic routine to the generic visit routine at the
place where the other flags are set, or else the intersectFlagsWith
call would erase the NoFPExcept flag again.
In order to avoid making non-strict FP code worse, whenever
SelectionDAGISel::SelectCodeCommon matches on a set of orignal nodes
none of which can raise FP exceptions, it will preserve this property
on all results nodes generated, by setting the NoFPExcept flag on
those result nodes that would otherwise be considered as raising
an FP exception.
To check whether or not an SD node should be considered as raising
an FP exception, the following logic applies:
- For machine nodes, check the mayRaiseFPException property of
the underlying MI instruction
- For regular nodes, check isStrictFPOpcode
- For target nodes, check a newly introduced isTargetStrictFPOpcode
The latter is implemented by reserving a range of target opcodes,
similarly to how memory opcodes are identified. (Note that there a
bit of a quirk in identifying target nodes that are both memory nodes
and strict FP nodes. To simplify the logic, right now all target memory
nodes are automatically also considered strict FP nodes -- this could
be fixed by adding one more range.)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71841
resize only writes to elements that get added. Any elements that
already existed maintain their previous value. In this case we're
trying to erase cached information so we should use assign which
will write to every element.
Found while trying to add new tests to an existing X86 test and
noticed register allocation changing in other functions.
The 'SchedBoundary::releaseNode' is merely invoked for releasing the Top/Bottom root nodes.
However, 'SchedBoundary::releasePending' uses its same logic to check if the Pending queue
has any releasable SUnit.
It is possible to slightly modify the body of the two, allowing re-use of the former ('releaseNode')
in the latter.
Patch by Lorenzo Casalino <lorenzo.casalino93@gmail.com>
Reviewers: MatzeB, fhahn, atrick
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D65506
This was increasing the number of instructions when fsub was legalized
on AMDGPU with no signed zeros enabled. This fold should be guarded by
hasOneUse, and I don't think getNode should be doing that. The same
fold is already done as a regular combine through isNegatibleForFree.
This does require duplicating, even though isNegatibleForFree does
this combine already (and properly checks hasOneUse) to avoid one PPC
regression. In the regression, the outer fneg has nsz but the fsub
operand does not. isNegatibleForFree only sees the operand, and
doesn't see it's used from a nsz context. A nsz parameter needs to be
added and threaded through isNegatibleForFree to avoid this.
These operations are needed as building blocks for promoting so they
can't be promoted themselves.
This appeared to work because the fp_extend query type for operation
actions is the result type, not the input type so it never triggered
in the legalizer.
For fp_round, the vector op legalizer just ended up creating a
nop fp_extend that was elided by getNode, followed by a nop
fp_round that was also elided by getNode. This was followed by
a final fp_round from v4f32 back to vf416 which was CSEd to the
original node. Then legalize vector ops just believed that node
legalized to itself. LegalizeDAG took another crack at promoting
it, but didn't have a handler so just skipped it with a debug
message saying it wasn't promoted.
This patch just removes the operation actions to avoid this
non-sense. Found while trying to refactor LegalizeVectorOps to
handle multiple result nodes better.
This allows us to clean up some places that were peeking through
the MERGE_VALUES node after the call. By returning the SDValues
directly, we can clean that up.
Unfortunately, there are several call sites in AMDGPU that wanted
the MERGE_VALUES and now need to create their own.
D56351 (included in LLVM 8.0.0) introduced "frame-pointer". All tests
which use "no-frame-pointer-elim" or "no-frame-pointer-elim-non-leaf"
have been migrated to use "frame-pointer".
Implement UpgradeFramePointerAttributes to upgrade the two obsoleted
function attributes for bitcode. Their semantics are ignored.
Differential Revision: https://reviews.llvm.org/D71863
G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics,
clang genrates these intrinsics from __builtin_bitreverse32 and
__builtin_bitreverse64.
Add lower and narrowscalar for G_BITREVERSE.
Lower G_BITREVERSE on MIPS32.
Recommit notes:
Introduce temporary variables in order to make sure
instructions get inserted into MachineFunction in same order
regardless of compiler used to build llvm.
Differential Revision: https://reviews.llvm.org/D71363
G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics,
clang genrates these intrinsics from __builtin_bitreverse32 and
__builtin_bitreverse64.
Add lower and narrowscalar for G_BITREVERSE.
Lower G_BITREVERSE on MIPS32.
Differential Revision: https://reviews.llvm.org/D71363
G_BSWAP is generated from llvm.bswap.<type> intrinsics, clang genrates
these intrinsics from __builtin_bswap32 and __builtin_bswap64.
Add lower and narrowscalar for G_BSWAP.
Lower G_BSWAP on MIPS32, select G_BSWAP on MIPS32 revision 2 and later.
Differential Revision: https://reviews.llvm.org/D71362
This allows us to delete InlineAsm::Constraint_i workarounds in
SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and
TargetLowering::getInlineAsmMemConstraint overrides.
They were introduced to X86 in r237517 to prevent crashes for
constraints like "=*imr". They were later copied to other targets.
The early tail duplicator pass introduces new ones, so a MIR test that
infers no phis since there were none on the input would fail the
verifier after running.
SelectionDAG::transferDbgValues() can 'reattach' SDDbgValue from one to
another node, but doesn't change its source order. If the destination node has
the order greater than the SDDbgValue, there are two possible issues
revealed later:
* If debug info is attached to an instruction that is the first definition
of a register, this ends up with a def-after-use and the debug info
gets 'undef' later.
* If MIR has another definition of a register above the debug info,
the debug info may represent a source variable incorrectly because
it appears (significantly) before an instruction corresponded
to this debug info.
So, the patch changes the order of an SDDbgValue when it is moved
to a node with greater order.
Reviewers: dblaikie, jmorse, aprantl
Reviewed By: aprantl
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71175
Having TypeSize as a static class variable was causing problems
with multi-threading. Several static functions have now been
converted into methods of TypePromotion and a few other members
of TypePromotion and IRPromoter have been added or removed.
Differential Revision: https://reviews.llvm.org/D71832
Fix several several additional problems with the int <-> FP conversion
logic both in common code and in the X86 target. In particular:
- The STRICT_FP_TO_UINT expansion emits a floating-point compare. This
compare can raise exceptions and therefore needs to be a strict compare.
I've made it signaling (even though quiet would also be correct) as
signaling is the more usual default for an LT. This code exists both
in common code and in the X86 target.
- The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode:
it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one
of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP
that ends up not chosen. I've fixed the algorithm to use only a single
STRICT_SINT_TO_FP instead.
- The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do
the wrong thing because it calls getOperationAction using the result VT.
But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to
be called using the operand VT.
- Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate
STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.
Reviewed by: craig.topper
Differential Revision: https://reviews.llvm.org/D71840
This moves the X86 specific transform from rL364407
into DAGCombiner to generically handle 'little to big' cases
(for example: extract_subvector(v2i64 bitcast(v16i8))). This
allows us to remove both the x86 implementation and the aarch64
bitcast(extract_subvector(bitcast())) combine.
Earlier patches that dealt with regressions initially exposed
by this patch:
rG5e5e99c041e4
rG0b38af89e2c0
Patch by: @RKSimon (Simon Pilgrim)
Differential Revision: https://reviews.llvm.org/D63815
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).
Improve the classifyGlobalFunctionReference method to set
MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in
AArch64TargetLowering::LowerCall to use the return value from
classifyGlobalFunctionReference for these cases.
Add code in both AArch64FastISel and GlobalISel/IRTranslator to
bail out for function calls to extern weak functions on windows,
to let SelectionDAG handle them.
This matches what was done for X86 in 6bf108d77a.
Differential Revision: https://reviews.llvm.org/D71721
Summary:
Without this check unnecessary FMA instructions are generated when the FSUB terms are reused.
This also has the side-effect that the same value is computed to different levels of precision, which can create undesirable effects if the results are used together in subsequent computation.
Reviewers: arsenm, nhaehnle, foad, tpr, dstuttard, spatel
Reviewed By: arsenm
Subscribers: jvesely, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71656
Summary:
We noticed in Julia that the sequence below no longer turned into
a sequence of FMA instructions in LLVM 7+, but it did in LLVM 6.
```
%29 = fmul contract <4 x double> %wide.load, %wide.load16
%30 = fmul contract <4 x double> %wide.load13, %wide.load17
%31 = fmul contract <4 x double> %wide.load14, %wide.load18
%32 = fmul contract <4 x double> %wide.load15, %wide.load19
%33 = fadd fast <4 x double> %vec.phi, %29
%34 = fadd fast <4 x double> %vec.phi10, %30
%35 = fadd fast <4 x double> %vec.phi11, %31
%36 = fadd fast <4 x double> %vec.phi12, %32
```
Unlike Clang, Julia doesn't set the `unsafe-fp-math=true` function
attribute, but rather emits more local instruction flags.
This partially undoes https://reviews.llvm.org/D46854 and if required I can try to minimize the test further.
Reviewers: spatel, mcberg2017
Reviewed By: spatel
Subscribers: chriselrod, merge_guards_bot, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71495
This reverts commit ee7579409b.
It causes crashes during ThinLTO. I suspect the issue is related to
races on the global TypeSize variable, which is 80 at the time of the
crash.
Summary:
This is documented as the appropriate template modifier for call operands.
Fixes PR44272, and adds a regression test.
Also adds support for operand modifiers in Intel-style inline assembly.
Reviewers: rnk
Reviewed By: rnk
Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71677
Having this function be recursive could use up way too much stack space.
Rewrite it as an iterative traversal in the tree instead to prevent this.
Fixes PR44344.
It isn't necessary to create DIEs for all of the declaration subprograms
in a CU's retainedTypes list. We can defer creating these subprograms
until we need to prepare a call site tag that refers to one.
This cleanup was mentioned in passing in D70350.
This allows a call site tag in CU A to reference a callee DIE in CU B
without resorting to creating an incomplete duplicate DIE for the callee
inside of CU A.
We already allow cross-CU references of subprogram declarations, so it
doesn't seem like definitions ought to be special.
This improves entry value evaluation and tail call frame synthesis in
the LTO setting. During LTO, it's common for cross-module inlining to
produce a call in some CU A where the callee resides in a different CU,
and there is no declaration subprogram for the callee anywhere. In this
case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram
in order to fill in the call site tag. That empty 'definition' defeats
entry value evaluation etc., because the debugger can't figure out what
it means.
As a follow-up, maybe we could add a DWARF verifier check that a
DW_TAG_subprogram at least has a DW_AT_name attribute.
Update:
Reland with a fix to create a declaration DIE when the declaration is
missing from the CU's retainedTypes list. The declaration is left out
of the retainedTypes list in two cases:
1) Re-compiling pre-r266445 bitcode (in which declarations weren't added
to the retainedTypes list), and
2) Doing LTO function importing (which doesn't update the retainedTypes
list).
It's possible to handle (1) and (2) by modifying the retainedTypes list
(in AutoUpgrade, or in the LTO importing logic resp.), but I don't see
an advantage to doing it this way, as it would cause more DWARF to be
emitted compared to creating the declaration DIEs lazily.
Tested with a stage2 ThinLTO+RelWithDebInfo build of clang, and with a
ReleaseLTO-g build of the test suite.
rdar://46577651, rdar://57855316, rdar://57840415
Differential Revision: https://reviews.llvm.org/D70350
Extends DWARF expression language to express locals/globals locations. (via
target-index operands atm) (possible variants are: non-virtual registers
or address spaces)
The WebAssemblyExplicitLocals can replace virtual registers to targertindex
operand type at the time when WebAssembly backend introduces
{get,set,tee}_local instead of corresponding virtual registers.
Reviewed By: aprantl, dschuff
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D52634
This is a purely cosmetic change that is NFC in terms of the binary
output. I bugs me that I called the attribute DW_AT_LLVM_isysroot
since the "i" is an artifact of GCC command line option syntax
(-isysroot is in the category of -i options) and doesn't carry any
useful information otherwise.
This attribute only appears in Clang module debug info.
Differential Revision: https://reviews.llvm.org/D71722
The calculator was considering instructions such as KILLs as clobbers
of a physical address. This is wrong as meta instructions such as KILLs
produce no output in the final program and thus don't clobber or change
any physical location's value. As a result they're safe to ignore whilst
calculating location list ranges.
reviewers: aprantl, vsk
diff revision: https://reviews.llvm.org/D70497
fixes: https://bugs.llvm.org/show_bug.cgi?id=38753
1) Fix an issue with the incorrect value being used for the number of
elements being passed to [d|w]lstp. We were trying to check that
the value was available at LoopStart, but this doesn't consider
that the last instruction in the block could also define the
register. Two helpers have been added to RDA for this.
2) Insert some code to now try to move the element count def or the
insertion point so that we can perform more tail predication.
3) Related to (1), the same off-by-one could prevent us from
generating a low-overhead loop when a mov lr could have been
the last instruction in the block.
4) Fix up some instruction attributes so that not all the
low-overhead loop instructions are labelled as branches and
terminators - as this is not true for dls/dlstp.
Differential Revision: https://reviews.llvm.org/D71609
Recommit after making the same API change in non-x86 targets. This has been build for all targets, and tested for effected ones. Why the difference? Because my disk filled up when I tried make check for all.
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
This is in advance of assembler padding directives support where we'll need to bundle the label w/the corresponding faulting instruction to avoid padding being inserted between.
Since the address pool doesn't get populated in this case (due to the
lack of inlining, no child DIEs are added to the CU - so no addresses
are needed for the DIEs themselves) until the range list is emitted - at
the time the attributes are added to the CU, the address pool is empty.
So check whether the address pool will be used for the range lists & add
an addr_base if that's the case.
Move these data structures closer together so their emission code can
eventually share more of its implementation.
Was an egregious bug (completely untested, evidently) where I hadn't
inverted a DWARFv5 test as needed, so it was doing the exact opposite of
what was required & thus tried to emit a DWARFv5 range list header in
DWARFv4.
Reapply 8e04896288 which was
reverted in a8154e5e0c.
Add new intrinsics
llvm.experimental.constrained.minimum
llvm.experimental.constrained.maximum
as strict versions of llvm.minimum and llvm.maximum.
Includes SystemZ back-end support.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71624
getTargetConstant prevents any optimizations from operating on the
value and basically says its already been iseled. But since we
want the index to be in a register, this isn't true.
Prior to this we were generating a vbroadcast with an immediate
argument which is illegal and was flagged by the expensive checks
bot.
This reverts commit 1f3dd83cc1, reapplying
commit bb1b0bc4e5.
The original commit failed on some builds seemingly due to the use of a
bracketed constructor with an std::array, i.e. `std::array<> arr({...})`.
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop
casts impossible. This patch enables the salvaging of casts by using the
DW_OP_LLVM_convert operator for SExt and Trunc instructions.
There is another issue which is exposed by this fix, in which fragment
DIExpressions (which are preserved more readily by this patch) for
values that must be split across registers in ISel trigger an assertion,
as the 'split' fragments extend beyond the bounds of the fragment
DIExpression causing an error. This patch also fixes this issue by
checking the fragment status of DIExpressions which are to be split, and
dropping fragments that are invalid.
Summary:
r347747 added support for clustering mem ops with FI base operands
including support for fixed stack objects in shouldClusterFI, but
apparently this was never tested.
This patch fixes shouldClusterFI to work with scaled as well as
unscaled load/store instructions, and fixes the ordering of memory ops
in MemOpInfo::operator< to ensure that memory addresses always
increase, regardless of which direction the stack grows.
Subscribers: MatzeB, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71334
Add an extra parameter so alignment can be taken under
consideration in gather/scatter legalization.
Differential Revision: https://reviews.llvm.org/D71610
Summary: Add calculation for elements in structures in getting uniform
base for the Gather/Scatter intrinsic.
Reviewers: craig.topper, c-rhodes, RKSimon
Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71442
The caller will assert for nodes with more than 2 results unless
we return a null SDValue.
I tried to test this by copying an AArch64 test for ScalarizeVecOp_FP_ROUND.
While it did hit the assert and this commited fixed that. It also
hit a later problem that couldn't be fixed without adding strict
FP support to AArch64.
This started with adding a test to support get code coverage on
ScalarizeVecOp_UnaryOp_StrictFP by copying an existing AArch64 test
and using constrained sitofp/uitofp intrinsics.
This found 3 separate issues:
-ScalarizeVecOp_UnaryOp_StrictFP needs to do its own replacement
because the caller can't handle replacing multiple results.
-Missing integer promotion support for sitofp/uitofp
-Chain result not always assigned in ExpandLegalINT_TO_FP.
Committing them together so I can add the test case.
added a test case for macinfo.dwo emission."
This was reverted in caa4120906,
since it was causing an assertion failure on Windows bots.
This revision is revised to fix that.
Original commit message -
[DebugInfo] Refactored macro related generation, added a test case for macinfo.dwo emission.
Reviewers: dblaikie, aprantl, jini.susan.george
Tags: #debug-info #llvm
Differential Revision: https://reviews.llvm.org/D71008
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets to show the problem more
generally.
We check the number of uses as a predicate for whether some value is free to negate,
but that use count can change as we rewrite the expression in getNegatedExpression().
So something that was marked free to negate during the cost evaluation phase becomes
not free to negate during the rewrite phase (or the inverse - something that was not
free becomes free). This can lead to a crash/assert because we expect that everything
in an expression that is negatible to be handled in the corresponding code within
getNegatedExpression().
This patch adds a hack to work-around the case where we probably no longer detect
that either multiply operand of an FMA isNegatibleForFree which is assumed to be
true when we started rewriting the expression.
Differential Revision: https://reviews.llvm.org/D70975
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets to show the problem more
generally.
We check the number of uses as a predicate for whether some value is free to negate,
but that use count can change as we rewrite the expression in getNegatedExpression().
So something that was marked free to negate during the cost evaluation phase becomes
not free to negate during the rewrite phase (or the inverse - something that was not
free becomes free). This can lead to a crash/assert because we expect that everything
in an expression that is negatible to be handled in the corresponding code within
getNegatedExpression().
This patch adds a hack to work-around the case where we probably no longer detect
that either multiply operand of an FMA isNegatibleForFree which is assumed to be
true when we started rewriting the expression.
Differential Revision: https://reviews.llvm.org/D70975
Summary:
Right now, DAGCombiner process the nodes in an iplementation defined order. This tends to be fragile as optimisation may or may not kick in depending on the traversal order.
This is part of a larger effort to get the DAGCombiner to process its node in topological order.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70921
of integers to floating point.
This includes some of Craig Topper's changes for promotion support from
D71130.
Differential Revision: https://reviews.llvm.org/D69275
Summary:
This is a resubmit of D71473.
This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients.
Functions will be deprecated one by one and as in tree code is cleaned up.
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: aaron.ballman, courbet
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71547
Summary:
With DWARF5 it is no longer possible to distinguish normal methods and methods with `__attribute__((objc_direct))` by just looking at the debug information
as they are both now children of the of the DW_TAG_structure_type that defines them (before only the `__attribute__((objc_direct))` methods were children).
This means that in LLDB we are no longer able to create a correct Clang AST of a module by just looking at the debug information. Instead we would
need to call the Objective-C runtime to see which of the methods have a `__attribute__((objc_direct))` and then add the attribute to our own Clang AST
depending on what the runtime returns. This would mean that we either let the module AST be dependent on the Objective-C runtime (which doesn't
seem right) or we retroactively add the missing attribute to the imported AST in our expressions.
A third option is to annotate methods with `__attribute__((objc_direct))` as `DW_AT_APPLE_objc_direct` which is what this patch implements. This way
LLDB doesn't have to call the runtime for any `__attribute__((objc_direct))` method and the AST in our module will already be correct when we create it.
Reviewers: aprantl, SouraVX
Reviewed By: aprantl
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D71201
It doesn't seem to do anything that SplitVecRes_StrictFPOp can't
do. SplitVecRes_StrictFPOp already handles nodes with a variable
number of arguments and a mix of scalar and vector arguments.
This patch makes it so that cases where multiple instructions that
differ only in their ConstantInt or ConstantFP MachineOperand values no
longer collide. For instance:
%0:_(s1) = G_CONSTANT i1 true
%1:_(s1) = G_CONSTANT i1 false
%2:_(s32) = G_FCONSTANT float 1.0
%3:_(s32) = G_FCONSTANT float 0.0
Prior to this patch the first two instructions would collide together.
Also, the last two G_FCONSTANT instructions would also collide. Now they
will no longer collide.
Differential Revision: https://reviews.llvm.org/D71558
Currently -fuse-init-array option is not effective when target triple
does not specify os, on x86,x86_64.
i.e.
// -fuse-init-array is not honored.
$ clang -target i386 -fuse-init-array test.c -S
// -fuse-init-array is honored.
$ clang -target i386-linux -fuse-init-array test.c -S
This patch fixes first case.
And does cleanup.
Reviewers: rnk, craig.topper, fhahn, echristo
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D71360