One of the cases identified in PR45116 - we don't need to limit load combines (in this case for fp->int load/store copies) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.
Differential Revision: https://reviews.llvm.org/D108318
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for ISD::BUILD_PAIR) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.
This helps in particular for 32-bit X86 cases loading 64-bit size data, reducing codegen diffs vs x86_64.
Differential Revision: https://reviews.llvm.org/D108307
D106408 enables rematerialization of instructions with virtual
register uses. That has uncovered the bug in the allUsesAvailableAt
implementation: https://bugs.llvm.org/show_bug.cgi?id=51516.
In the majority of cases canRematerializeAt() called to check if
an instruction can be rematerialized before the given UseIdx.
However, SplitEditor::enterIntvAtEnd() calls it to rematerialize
an instruction at the end of a block passing LIS.getMBBEndIdx()
into the check. In the testcase from the bug it has attempted to
rematerialize ADDXri after STRXui in bb.17. The use operand %55
of the ADD is killed by the STRX but that is undetected by the check
because it adjusts passed UseIdx to the reg slot, before the kill.
The value is dead at the index passed to the check however.
This change uses a later of passed UseIdx and its reg slot. This
shall be correct because if are checking an availability of operands
before an instruction that instruction cannot be the one defining
these operands. If we are checking for late rematerialization we
are really interested if operands live past the instruction.
The bug is not exploitable without D106408 but needed to reland
reverted D106408.
Differential Revision: https://reviews.llvm.org/D108475
Basically the same as G_LROUND. Handles the llvm.llround family of intrinsics.
Also add a helper function to the MachineVerifier for checking if all of the
(virtual register) operands of an instruction are scalars. Seems like a useful
thing to have.
Differential Revision: https://reviews.llvm.org/D108429
Issue Details:
The addresses for SEH tables for Windows are incorrect as 1 was unconditionally being added to all addresses. +1 is required for the SEH end address (as it is exclusive), but the SEH start addresses is inclusive and so should be used as-is.
In the IP2State tables, the addresses are +1 for AMD64 to handle the return address for a call being after the actual call instruction but are as-is for ARM and ARM64 as the `StateFromIp` function in the VC runtime automatically takes this into account and adjusts the address that is it looking up.
Fix Details:
* Split the `getLabel` function into two: `getLabel` (used for the SEH start address and ARM+ARM64 IP2State addresses) and `getLabelPlusOne` (for the SEH end address, and AMD64 IP2State addresses).
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D107784
This patch makes InstrRefBasedLDV "safe" to work with DBG_VALUE_LISTs. It
doesn't actually interpret them, but it recognises that they specify
variable locations and avoids propagating false locations, which is better
than the current state. Observe the attached tes
* We avoid propagating DBG_VALUE_LISTs into successor blocks, as they're
not "currently" supported,
* We don't propagate other variable locations across DBG_VALUE_LISTs,
because we know that the variable location is terminated by the
DBG_VALUE_LIST.
Differential Revision: https://reviews.llvm.org/D108143
This patch removes an assertion, and adds a regression test showing why the
assertion is broken.
For context, LocIdx is a key/index number for machine locations, so that we
can describe locations as a single integer and ignore whether they're on
the stack, in registers or otherwise. Back when InstrRefBasedLDV was added,
I happened to bake in a "special" zero number for various reasons, which
Vedant identified as undesirable in this review comment:
https://reviews.llvm.org/D83047#inline-765495 . I subsequently removed that
special zero number, but it looks like I didn't delete this assertion at
the time, which assumes that a zero LocIdx is invalid.
The attached test shows that this assertion is reachable on valid code --
on x86 $rsp always gets the LocIdx number zero, and if you transfer a
variable value into it, InstrRefBasedLDV crashes on that assertion. The
code might be a bit wild to be storing variables to $rsp like that, however
we shouldn't crash on it.
Differential Revision: https://reviews.llvm.org/D108134
Produce remarks when atomic instructions are expanded into hardware instructions
in SIISelLowering.cpp. Currently, these remarks are only emitted for atomic fadd
instructions.
Differential Revision: https://reviews.llvm.org/D108150
Translate the `@llvm.lround.*` family to G_LROUND via
`IRTranslator::translateSimpleIntrinsic`.
Differential Revision: https://reviews.llvm.org/D108418
For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize
completely if the source is <= 64b. This change adds support for that in
the legalizer. If the source has a pow-2 num elements, then we can do
a tree reduction using the scalar operation in the individual elements.
Otherwise, we just create a sequential chain of operations.
For AArch64, we only need to scalarize if the input is <64b. If it's great than
64b then we can first do a fewElements step to 64b, taking advantage of vector
instructions until we reach the point of scalarization.
I also had to relax the verifier checks for reductions because the intrinsics
support <1 x EltTy> types, which we lower to scalars for GlobalISel.
Differential Revision: https://reviews.llvm.org/D108276
This changes the lowering of saddsat and ssubsat so that instead of
using:
r,o = saddo x, y
c = setcc r < 0
s = c ? INTMAX : INTMIN
ret o ? s : r
into using asr and xor to materialize the INTMAX/INTMIN constants:
r,o = saddo x, y
s = ashr r, BW-1
x = xor s, INTMIN
ret o ? x : r
https://alive2.llvm.org/ce/z/TYufgD
This seems to reduce the instruction count in most testcases across most
architectures. X86 has some custom lowering added to compensate for
cases where it can increase instruction count.
Differential Revision: https://reviews.llvm.org/D105853
Previously we pre-calculated this and cached it for every
instruction in the function. Most of the calculated results will
never be used. So instead calculate it only on the first use, and
then cache it.
The cache was originally added to fix a compile time issue which
caused r216066 to be reverted.
This change exposed that we weren't pre-computing the Value for
Arguments. I've explicitly disabled that for now as it seemed to
regress some tests on AArch64 which has sext built into its compare
instructions.
Spotted while investigating how to improve heuristics to work better
with RISCV preferring sign extend for unsigned compares for i32 on RV64.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D107976
This patch adds the beginnings of more thorough support in the
legalizers for vector-predicated (VP) operations.
The first step is the ability to widen illegal vectors. The more
complicated scenario in which the result/operands need widening but the
mask doesn't has not been handled here. That would require a lot of code
without an in-tree target on which to test it.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D107904
This patch implements Flow Sensitive Sample FDO (FSAFDO) profile
loader. We have two profile loaders for FS profile,
one before RegAlloc and one before BlockPlacement.
To enable it, when -fprofile-sample-use=<profile> is specified,
add "-enable-fs-discriminator=true \
-disable-ra-fsprofile-loader=false \
-disable-layout-fsprofile-loader=false"
to turn on the FS profile loaders.
Differential Revision: https://reviews.llvm.org/D107878
Translate the `@llvm.isnan` intrinsic to G_ISNAN when we see it.
This is pretty much the same as the associated SelectionDAGBuilder code. Main
difference is that we don't expand it here. It makes more sense to do that
during legalization in GlobalISel. GlobalISel will just legalize the generated
illegal types.
Differential Revision: https://reviews.llvm.org/D108226
Add a generic opcode equivalent to the `llvm.isnan` intrinsic +
MachineVerifier support for it.
We need an opcode here because we may want target-specific lowering later on.
Differential Revision: https://reviews.llvm.org/D108222
It was introduced in 1a6dc92 and only enabled on PowerPC/AMDGPU. That
should be enabled for all targets.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D108010
This patch adds vector-predicated ("VP") reduction intrinsics corresponding to
each of the existing unpredicated `llvm.vector.reduce.*` versions. Unlike the
unpredicated reductions, all VP reductions have a start value. This start value
is returned when the no vector element is active.
Support for expansion on targets without native vector-predication support is
included.
This patch is based on the ["reduction
slice"](https://reviews.llvm.org/D57504#1732277) of the LLVM-VP reference patch
(https://reviews.llvm.org/D57504).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D104308
Combine two G_PTR_ADDs, but keep the register bank of the constant.
That way, the combine can be used in post-regbank-select combines.
Introduce two helper methods in CombinerHelper, getRegBank and
setRegBank that get and set an optional register bank to a register.
That way, they can be used before and after register bank selection.
Differential Revision: https://reviews.llvm.org/D103326
In current implementation, the instruction to be sunk will be inserted before the target instruction without considering the def-use tree,
which may case Instruction does not dominate all uses error. We need to choose a suitable location to insert according to the use chain
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D107262
This reapplies 54a61c94f9, its follow up in 547b712500, which were
reverted 95fe61e639. Original commit message:
VarLoc based LiveDebugValues will abandon variable location propagation if
there are too many blocks and variable assignments in the function. If it
didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end
up with 1 million DBG_VALUEs just at the start of blocks.
Instruction-referencing LiveDebugValues should honour this limitation too
(because the same limitation applies to it). Hoist the relevant command
line options into LiveDebugValues.cpp and pass it down into the
implementation classes as an argument to ExtendRanges. I've duplicated all
the run-lines in live-debug-values-cutoffs.mir to have an
instruction-referencing flavour.
Differential Revision: https://reviews.llvm.org/D107823
Basic block pointer is dereferenced unconditionally for MBBs with
hasAddressTaken property.
MBBs might have hasAddressTaken property without reference to BB.
Backend developers must assign fake BB to MBB to workaround this issue
and it should be fixed.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D108092
Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that is not a trivial rematerialization and that we do not want to
extend liveranges.
It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().
The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and not rematerializable.
The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.
The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.
Differential Revision: https://reviews.llvm.org/D106408
Check if a remateralizable nstruction does not have any virtual
register uses. Even though rematerializable RA might not actually
rematerialize it in this scenario. In that case we do not want to
hoist such instruction out of the loop in a believe RA will sink
it back if needed.
This already has impact on AMDGPU target which does not check for
this condition in its isTriviallyReMaterializable implementation
and have instructions with virtual register uses enabled. The
other targets are not impacted at this point although will be when
D106408 lands.
Differential Revision: https://reviews.llvm.org/D107677
SwitchInst should have a void result type.
Add a check to the verifier to catch this error.
Reviewed By: samparker
Differential Revision: https://reviews.llvm.org/D108084
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal.
This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands.
Differential Revision: https://reviews.llvm.org/D107597
VarLoc based LiveDebugValues will abandon variable location propagation if
there are too many blocks and variable assignments in the function. If it
didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end
up with 1 million DBG_VALUEs just at the start of blocks.
Instruction-referencing LiveDebugValues should honour this limitation too
(because the same limitation applies to it). Hoist the relevant command
line options into LiveDebugValues.cpp and pass it down into the
implementation classes as an argument to ExtendRanges. I've duplicated all
the run-lines in live-debug-values-cutoffs.mir to have an
instruction-referencing flavour.
Differential Revision: https://reviews.llvm.org/D107823
visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when
removing "redundant" INSERT_SUBVECTOR operations. This patch adds
an extra check to ensure such combines only occur after operation
legalisation if any resulting BITBAST is itself legal.
Differential Revision: https://reviews.llvm.org/D108086
This is a fairly common pattern:
```
%mask = G_CONSTANT iN <mask val>
%add = G_ADD %lhs, %rhs
%and = G_AND %add, %mask
```
We have combines to eliminate G_AND with a mask that does nothing.
If we combined the above to this:
```
%mask = G_CONSTANT iN <mask val>
%narrow_lhs = G_TRUNC %lhs
%narrow_rhs = G_TRUNC %rhs
%narrow_add = G_ADD %narrow_lhs, %narrow_rhs
%ext = G_ZEXT %narrow_add
%and = G_AND %ext, %mask
```
We'd be able to take advantage of those combines using the trunc + zext.
For this to work (or be beneficial in the best case)
- The operation we want to narrow then widen must only be used by the G_AND
- The G_TRUNC + G_ZEXT must be free
- Performing the operation at a narrower width must not produce a different
value than performing it at the original width *after masking.*
Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj
At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign.
Differential Revision: https://reviews.llvm.org/D107929
AttributeList::hasAttribute() is confusing, use clearer methods like
hasParamAttr()/hasRetAttr().
Add hasRetAttr() since it was missing from AttributeList.
We may use several COPY instructions to copy the needed sub-registers
during split. But the way we split the lanes during the COPYs may be
different from the subranges of the old register. This would fail when we
extend the subranges of the new register because the LaneMasks do not
match exactly between subranges of new register and old register.
Since we are bundling the COPYs, I think there is no need to further refine the
subranges of the new register based on the set of LaneMasks of the inserted COPYs.
I am not sure if there will be further breaking cases. But as the subranges of
new register are created based on the LaneMasks of the subranges of old register,
it will be highly possible we will always find an exact LaneMask match.
We can think about how to make the extendPHIKillRanges() work for
subrange mask mismatch case if we meet more such cases in the future.
The test case was from D105065 by @arsenm.
Differential Revision: https://reviews.llvm.org/D107829
This patch adds Pass1 of MIRADDFSDiscriminatorsPass before register
allocation, and Pass2 of MIRAddFSDiscriminatorsPass before
Block-Placement. This is still under --enable-fs-discrmininator
option (default false).
This would reduce the turn-around time for FSAFDO transition.
Differential Revision: https://reviews.llvm.org/D104579
The introduction of `SHF_GNU_RETAIN` has caused massive problems on Solaris.
Initially, as reported in Bug 49437, it caused dozens of testsuite failures
on both sparc and x86. The objects were marked as `ELFOSABI_NONE`, but
`SHF_GNU_RETAIN` is a GNU extension. In the native Solaris ABI, that flag
(in the range for OS-specific values) is `SHF_SUNW_ABSENT` with a
completely different semantics, which confuses Solaris `ld` very much.
Later, the objects became (correctly) marked `ELFOSABI_GNU`, which Solaris
`ld` doesn't support, causing it to SEGV and break the build. The linker
is currently being hardened to not accept non-native OS ABIs to avoid this.
The need for linker support is already documented in
`clang/include/clang/Basic/AttrDocs.td`, but not currently checked.
This patch avoids all this by not emitting `SHF_GNU_RETAIN` on Solaris at all.
Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.
Differential Revision: https://reviews.llvm.org/D107747
We were calling find and then using operator[]. Instead keep the
iterator from find and use it to get the value.
Just happened to notice while investigating how we decide what extends
to use between basic blocks.
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.
Differential Revision: https://reviews.llvm.org/D107528
This patch refactors / simplifies salvageDebugInfoImpl(). The goal
here is to simplify the implementation of coro::salvageDebugInfo() in
a followup patch.
1. Change the return value to I.getOperand(0). Currently users of
salvageDebugInfoImpl() assume that the first operand is
I.getOperand(0). This patch makes this information explicit. A
nice side-effect of this change is that it allows us to salvage
expressions such as add i8 1, %a in the future.
2. Factor out the creation of a DIExpression and return an array of
DIExpression operations instead. This change allows users that
call salvageDebugInfoImpl() in a loop to avoid the costly
creation of temporary DIExpressions and to defer the creation of
a DIExpression until the end.
This patch does not change any functionality.
rdar://80227769
Differential Revision: https://reviews.llvm.org/D107383
We may call lowerRelativeReference in MC to determine whether target
supports this lowering. We should return nullptr instead of crashing
when we haven't implemented the real lowering.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D107830
If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be
shifted individually by the constant shift amount.
However in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization
code was used, producing intermediate shifts that are potentially illegal on some targets.
This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D89100
This patch is a revert of e08f205f5c. In that patch, DW_TAG_subprograms
were permitted to be referenced across CU boundaries, to improve stack
trace construction using call site information. Unfortunately, as
documented in PR48790, the way that subprograms are "owned" by dwarf units
is sufficiently complicated that subprograms end up in unexpected units,
invalidating cross-unit references.
There's no obvious way to easily fix this, and several attempts have
failed. Revert this to ensure correct DWARF is always emitted.
Three tests change in addition to the reversion, but they're all very
light alterations.
Differential Revision: https://reviews.llvm.org/D107076
We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
instead of eraseFromParent().
We should probably use that in other places too but fix this issue which
affects clang bootstrap builds for now.
This commit adds the isnan intrinsic and provides a default expansion
for it in the SDAG. However, it makes the assumption that types
it operates on are IEEE-compliant types. This is not always the case.
An example of that is PPC "double double" which has a representation
that
- Does not need to conform to IEEE requirements for isnan as it is
not an IEEE-compliant type
- Does not have a representation that allows for straightforward
reinterpreting as an integer and use of integer operations
The result was that this commit broke __builtin_isnan for ppc_fp128
making many valid numeric values report a NaN.
This patch simply changes the expansion to always expand to unordered
comparison (regardless of whether FP exceptions are tracked). This
is inline with previous semantics.
This isn't optimal, but prevents crashing when the libcall isn't
available. It just calculates the full product and makes sure the high bits
match the sign of the low half. Each of the pieces should go through their own
type legalization.
This can make D107420 unnecessary.
Needs tests, but I wanted to start discussion about D107420.
Reviewed By: FreddyYe
Differential Revision: https://reviews.llvm.org/D107581
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.
* The most common mechanism is using an unordered comparison made by
instruction 'fcmp uno'. This simple solution is target-independent
and works well in most cases. It however is not suitable if floating
point exceptions are tracked. Corresponding IEEE 754 operation and C
function must never raise FP exception, even if the argument is a
signaling NaN. Compare instructions usually does not have such
property, they raise 'invalid' exception in such case. So this
mechanism is unsuitable when exception behavior is strict. In
particular it could result in unexpected trapping if argument is SNaN.
* Another solution was implemented in https://reviews.llvm.org/D95948.
It is used in the cases when raising FP exceptions by 'isnan' is not
allowed. This solution implements 'isnan' using integer operations.
It solves the problem of exceptions, but offers one solution for all
targets, however some can do the check in more efficient way.
* Solution implemented by https://reviews.llvm.org/D96568 introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
specific code into IR. Now only SystemZ implements this hook and it
generates a call to target specific intrinsic function.
Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:
* The operation 'isnan' is hidden behind generic integer operations or
target-specific intrinsics. It complicates analysis and can prevent
some optimizations.
* IR can be created by tools other than clang, in this case treatment
of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':
std::isnan(std::numeric_limits<float>::quiet_NaN())
The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.
To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.
Differential Revision: https://reviews.llvm.org/D104854
Fixes issue where late materialized constants can be more strictly
aligned then their containing csect.
Differential Revision: https://reviews.llvm.org/D103103
We don't have real demanded bits support for MULHU, but we can
still use the known bits based constant folding support at the end
of SimplifyDemandedBits to simplify a MULHU. This helps with cases
where we know the LHS and RHS have enough leading zeros so that
the high multiply result is always 0.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D106471
IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.
This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector.
This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.
Fixes PR50053
Differential Revision: https://reviews.llvm.org/D107068
It's entirely possible (because it actually happened) for a bool
variable to end up with a 256-bit DW_AT_const_value. This came about
when a local bool variable was initialized from a bitfield in a
32-byte struct of bitfields, and after inlining and constant
propagation, the variable did have a constant value. The sequence of
optimizations had it carrying "i256" values around, but once the
constant made it into the llvm.dbg.value, no further IR changes could
affect it.
Technically the llvm.dbg.value did have a DIExpression to reduce it
back down to 8 bits, but the compiler is in no way ready to emit an
oversized constant *and* a DWARF expression to manipulate it.
Depending on the circumstances, we had either just the very fat bool
value, or an expression with no starting value.
The sequence of optimizations that led to this state did seem pretty
reasonable, so the solution I came up with was to invent a DWARF
constant expression folder. Currently it only does convert ops, but
there's no reason it couldn't do other ops if that became useful.
This broke three tests that depended on having convert ops survive
into the DWARF, so I added an operator that would abort the folder to
each of those tests.
Differential Revision: https://reviews.llvm.org/D106915
Instructions that produceSameValue produce same values for operands with
same index. matchEqualDefs used to return true for any two values from
different instructions that produce same values. Fix this by checking if
values are defined by operands with the same index.
Differential Revision: https://reviews.llvm.org/D107362
The LegalizeAction for this node should follow the logic for
`VECREDUCE_SEQ_FADD` and be determined using the vector operand's type.
here isn't an in-tree target that makes use of this, but I think it's safe to
say this is how it should behave, should a target want to customize the action
for this node.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D107478
to `lib/CodeGen/CommandFlags.cpp`. It can replace
-x86-experimental-pref-loop-alignment=.
The loop alignment is only used by MachineBlockPlacement.
The implementation uses a new `llvm::TargetOptions` for now, as
an IR function attribute/module flags metadata may be overkill.
This is the llvm part of D106701.
This allows special constants like to 0 to be recognized. It's also
expected by isel patterns if a target had a mulh with immediate instructions.
The commuting done by tablegen won't commute patterns with immediates since it
expects DAGCombine to have done it.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D107486
This attempts to make more of RDA aware of potentially overlapping
subregisters. Some of this was already in place, with it iterating
through MCRegUnitIterators. This also replaces calls to
LiveRegs.contains(..) with !LiveRegs.available(..), and updates the
isValidRegUseOf and isValidRegDefOf to search subregs.
Differential Revision: https://reviews.llvm.org/D107351
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.
* The most common mechanism is using an unordered comparison made by
instruction 'fcmp uno'. This simple solution is target-independent
and works well in most cases. It however is not suitable if floating
point exceptions are tracked. Corresponding IEEE 754 operation and C
function must never raise FP exception, even if the argument is a
signaling NaN. Compare instructions usually does not have such
property, they raise 'invalid' exception in such case. So this
mechanism is unsuitable when exception behavior is strict. In
particular it could result in unexpected trapping if argument is SNaN.
* Another solution was implemented in https://reviews.llvm.org/D95948.
It is used in the cases when raising FP exceptions by 'isnan' is not
allowed. This solution implements 'isnan' using integer operations.
It solves the problem of exceptions, but offers one solution for all
targets, however some can do the check in more efficient way.
* Solution implemented by https://reviews.llvm.org/D96568 introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
specific code into IR. Now only SystemZ implements this hook and it
generates a call to target specific intrinsic function.
Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:
* The operation 'isnan' is hidden behind generic integer operations or
target-specific intrinsics. It complicates analysis and can prevent
some optimizations.
* IR can be created by tools other than clang, in this case treatment
of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':
std::isnan(std::numeric_limits<float>::quiet_NaN())
The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.
To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.
Differential Revision: https://reviews.llvm.org/D104854
- Rename `wasm.catch` intrinsic to `wasm.catch.exn`, because we are
planning to add a separate `wasm.catch.longjmp` intrinsic which
returns two values.
- Rename several variables
- Remove an unnecessary parameter from `canLongjmp` and `isEmAsmCall`
from LowerEmscriptenEHSjLj pass
- Add `-verify-machineinstrs` in a test for a safety measure
- Add more comments + fix some errors in comments
- Replace `std::vector` with `SmallVector` for cases likely with small
number of elements
- Renamed `EnableEH`/`EnableSjLj` to `EnableEmEH`/`EnableEmSjLj`: We are
soon going to add `EnableWasmSjLj`, so this makes the distincion
clearer
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D107405
Previously we would emit constant pool entries for ldr inline asm at the
very end of AsmPrinter::doFinalization(). However, if we're emitting
dwarf aranges, that would end all sections with aranges. Then if we have
constant pool entries to be emitted in those same sections, we'd hit an
assert that the section has already been ended.
We want to emit constant pool entries before emitting dwarf aranges.
This patch splits out arm32/64's constant pool entry emission into its
own MCTargetStreamer virtual method.
Fixes PR51208
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107314
We had some similar hasOneUse/isNON_EXTLoad early-outs spread out over different parts of the method - we should pull them all together.
Noticed while triaging PR45116