This patch introduces a new tool, llvm-tapi-diff, that compares and returns the diff of two TBD files.
Reviewed By: ributzka, JDevlieghere
Differential Revision: https://reviews.llvm.org/D101835
If we're not emitting separate fences for the success/failure cases, we
need to pass the merged ordering to the target so it can emit the
correct instructions.
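For illustration, a minimal C++ sketch (not from the patch) of a cmpxchg whose failure ordering is stronger than what the success ordering alone would imply:

```
#include <atomic>

// Distinct success/failure orderings (arbitrary combinations are valid
// since C++17). If the target only sees the success ordering (release),
// the acquire semantics required on the failure path can be lost, which
// is what passing the merged ordering prevents.
bool tryPublish(std::atomic<int> &Flag, int &Expected) {
  return Flag.compare_exchange_strong(Expected, 1,
                                      std::memory_order_release,
                                      std::memory_order_acquire);
}
```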
For the PowerPC testcase, we end up with extra fences, but that seems
like an improvement over missing fences. If someone wants to improve
that, the PowerPC backend could be taught to emit the fences after isel,
instead of depending on fences emitted by AtomicExpand.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33332 .
Differential Revision: https://reviews.llvm.org/D103342
Restructure handling of cfg-hide-unreachable-paths and
cfg-hide-deoptimize-paths options so as to make it easier
to introduce new types of hidden blocks.
This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since it's cached) and we can just check for properties related to branch folding in the transform method instead of relying on the heuristic part to pass everything in correctly.
Differential Revision: https://reviews.llvm.org/D103584
This is a followup to D103422. The DenseMapInfo implementations for
ArrayRef and StringRef are moved into the ArrayRef.h and StringRef.h
headers, which means that these two headers no longer need to be
included by DenseMapInfo.h.
This required adding a few additional includes, as many files were
relying on various things pulled in by ArrayRef.h.
Differential Revision: https://reviews.llvm.org/D103491
`TargetFrameLowering::emitCalleeSavedFrameMoves` with 4 arguments is not
used anywhere in CodeGen. Thus it shouldn't be exposed as a virtual
function. NFC.
Differential Revision: https://reviews.llvm.org/D103328
The buildbot ppc64le-lld-multistage-test has been failing because the variable
Tag in Waymarking.h is set but not used. This patch removes that variable.
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is mainly the ProfileData part of the change. It will load
an FS profile when such a profile is detected. For an extbinary format profile,
the create_llvm_prof tool will add a flag to the profile summary section.
For other format profiles, the users need to use an internal option
(-profile-isfs) to tell the compiler that the profile uses FS discriminators.
This patch also simplifies the bit API used by FS discriminators.
Differential Revision: https://reviews.llvm.org/D103041
Add a getDemandedBits method for uses so we can query demanded bits for each use. This can help get better use information. For example, for the code below:
define i32 @test_use(i32 %a) {
  %1 = and i32 %a, -256
  %2 = or i32 %1, 1
  ; not constant-folded to 1, for illustration purposes
  %3 = trunc i32 %2 to i8
  ; ... some use of %3 ...
  ret i32 %2
}
If we look at the demanded bits of %2 (all 32 bits, because of the return), we would conclude that %a is used regardless of how the return value is used. However, if we look at each use separately, we see that the trunc use of %2 only demands the lower 8 bits, which were overwritten by the and/or above; so whether %a is actually needed depends on how the function's return value is used.
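A rough usage sketch of the per-use query (assuming the new overload takes a Use, as described above; the surrounding function is illustrative):

```
#include "llvm/Analysis/DemandedBits.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Query demanded bits separately for each use of I. For the trunc use in
// the example above this reports only the low 8 bits, whereas the
// per-instruction query reports all 32 bits because of the return.
void inspectUses(Instruction &I, DemandedBits &DB) {
  for (Use &U : I.uses()) {
    APInt DemandedForUse = DB.getDemandedBits(&U);
    (void)DemandedForUse;
  }
}
```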
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97074
This patch uses the calculated maximum scalable VFs to build VPlans,
cost them and select a suitable scalable VF.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D98722
A recent patch:
https://reviews.llvm.org/rGe0921655b1ff8d4ba7c14be59252fe05b705920e
changed clang's AIX bitfield handling to use 4-byte bitfield containers,
matching XL's behavior. This change triggers static assert failures when
bootstrapping. Change the macro we check to enable bitfield packing on
AIX to `__clang__` which is defined by both xlclang and clang.
Differential Revision: https://reviews.llvm.org/D103474
Use RuntimeLibcalls to get a common way to pick correct RTLIB::POWI_*
libcall for a given value type.
This includes a small refactoring of ExpandFPLibCall and
ExpandArgFPLibCall in SelectionDAGLegalize to share a bit of code,
plus adding an ExpandFPLibCall version that can be called directly
when expanding FPOWI/STRICT_FPOWI to ensure that we actually use
the same RTLIB::Libcall when expanding the libcall as we used when
checking the legality of such a call by doing a getLibcallName check.
Differential Revision: https://reviews.llvm.org/D103050
When rewriting
powf(2.0, itofp(x)) -> ldexpf(1.0, x)
exp2(sitofp(x)) -> ldexp(1.0, sext(x))
exp2(uitofp(x)) -> ldexp(1.0, zext(x))
the wrong type was used for the second argument in the ldexp/ldexpf
libc call, for target architectures with a 16-bit "int" type.
The transform incorrectly used a bitcasted function pointer with
a 32-bit argument when emitting the ldexp/ldexpf call for such
targets.
The fault is solved by using the correct function prototype
in the call, by asking TargetLibraryInfo about the size of "int".
TargetLibraryInfo by default derives the size of the int type by
assuming that it is 16 bits for 16-bit architectures, and
32 bits otherwise. If this isn't true for a target it should be
possible to override that default in the TargetLibraryInfo
initializer.
Differential Revision: https://reviews.llvm.org/D99438
Some existing places use getPointerElementType() to create a copy of a
pointer type with some new address space.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D103429
It's still in use in a few places so we can't delete it yet, but there aren't
many left at this point.
Differential Revision: https://reviews.llvm.org/D103352
WindowsSupport.h is a public header; however, if it gets included, it will cause a compile error indicating that llvm/Config/config.h cannot be found, because config.h is a private header. Since there is no actual dependency on anything private in this header, it can be changed to include the public config header instead.
Reviewed By: amccarth
Differential Revision: https://reviews.llvm.org/D103370
This patch transforms the sequence
lea (reg1, reg2), reg3
sub reg3, reg4
to two sub instructions
sub reg1, reg4
sub reg2, reg4
Similar optimization can also be applied to LEA/ADD sequence.
The modifications to TwoAddressInstructionPass are to ensure the operands of the ADD
instruction have the expected order (the dest register of the LEA should be a src register of the ADD).
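The rewrite rests on a simple arithmetic identity; a C++ sketch (AT&T operand order in the comments, where `sub a, b` computes b -= a):

```
// reg4 - (reg1 + reg2) == (reg4 - reg1) - reg2
long beforeTransform(long reg1, long reg2, long reg4) {
  long reg3 = reg1 + reg2; // lea (reg1, reg2), reg3
  return reg4 - reg3;      // sub reg3, reg4
}

long afterTransform(long reg1, long reg2, long reg4) {
  long t = reg4 - reg1;    // sub reg1, reg4
  return t - reg2;         // sub reg2, reg4
}
```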
Differential Revision: https://reviews.llvm.org/D101970
When we're remapping an AddRec, the AddRec constructed by a partial
rewrite might not make sense. This triggers an assertion complaining
it's not loop-invariant.
Instead of constructing the partially rewritten AddRec, just skip
straight to calling evaluateAtIteration.
Testcase was automatically reduced using llvm-reduce, so it's a little
messy, but hopefully makes sense.
Differential Revision: https://reviews.llvm.org/D102959
As suggested in https://bugs.llvm.org/show_bug.cgi?id=50527, this
moves the DenseMapInfo for APInt and APSInt into the respective
headers, removing the need to include APInt.h and APSInt.h from
DenseMapInfo.h.
We could probably do the same for StringRef and ArrayRef as well.
Differential Revision: https://reviews.llvm.org/D103422
InstCombine didn't perform the transformations when fmul's operands were
the same instruction, because it required each of them to have exactly one
use, which is not the case there. This patch fixes this, adds tests for
these cases, and introduces a new function, isOnlyUserOfAnyOperand, to
check them in a single place.
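A minimal sketch of what such a helper could look like (the name comes from this commit; the body here is an assumption, not the in-tree code):

```
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// True if BO is the sole user of at least one of its operands. This also
// covers the case where both operands are the same single-user
// instruction, which a per-operand one-use check rejects.
static bool isOnlyUserOfAnyOperand(BinaryOperator &BO) {
  return any_of(BO.operands(), [](Value *Op) { return Op->hasOneUser(); });
}
```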
This patch is a result of discussion in D102574.
Differential Revision: https://reviews.llvm.org/D102698
This patch adds TargetStackID::WasmLocal. This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.
For the WebAssembly target, IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there. Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140
Update isFirstOrderRecurrence to explore all uses of a recurrence phi
and check if we can sink them. If there are multiple users to sink, they
are all mapped to the previous instruction.
Fixes PR44286 (and another PR or two).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D84951
This is based on the assumption that most simulated instructions don't define
more than one or two registers. This is true for example on x86, where
most instruction definitions don't declare more than one register write.
The default code region size has been increased from 8 to 16. This is based on
the assumption that, for small microbenchmarks, the typical code snippet size is
often less than 16 instructions.
mca::Instruction now uses bitfields to pack flags.
No functional change intended.
1. Removed redundant includes.
2. Removed `releaseMemory()`, which was never defined nor used.
3. Fixed the first-letter case of member function names.
4. Renamed the duplicate name `NonLocalDeps` (in nested struct
`NonLocalPointerInfo`) to `NonLocalDepsMap`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D102358
ExprValueMap is a map from SCEV * to a set-vector of (Value *, ConstantInt *) pairs,
and while the map itself will likely be big-ish (have many keys),
it is a reasonable assumption that each key will refer to a small-ish
number of pairs.
In particular, looking at the n=512 case from
https://bugs.llvm.org/show_bug.cgi?id=50384,
a small size of 4 appears to be the sweet spot;
it results in the fewest allocations while minimizing memory footprint.
```
$ for i in $(ls heaptrack.opt.*.gz); do echo $i; heaptrack_print $i | tail -n 6; echo ""; done
heaptrack.opt.0-orig.gz
total runtime: 14.32s.
calls to allocation functions: 8222442 (574192/s)
temporary memory allocations: 2419000 (168924/s)
peak heap memory consumption: 190.98MB
peak RSS (including heaptrack overhead): 239.65MB
total memory leaked: 67.58KB
heaptrack.opt.1-n1.gz
total runtime: 13.72s.
calls to allocation functions: 7184188 (523705/s)
temporary memory allocations: 2419017 (176338/s)
peak heap memory consumption: 191.38MB
peak RSS (including heaptrack overhead): 239.64MB
total memory leaked: 67.58KB
heaptrack.opt.2-n2.gz
total runtime: 12.24s.
calls to allocation functions: 6146827 (502355/s)
temporary memory allocations: 2418997 (197695/s)
peak heap memory consumption: 163.31MB
peak RSS (including heaptrack overhead): 211.01MB
total memory leaked: 67.58KB
heaptrack.opt.3-n4.gz
total runtime: 12.28s.
calls to allocation functions: 6068532 (494260/s)
temporary memory allocations: 2418985 (197017/s)
peak heap memory consumption: 155.43MB
peak RSS (including heaptrack overhead): 201.77MB
total memory leaked: 67.58KB
heaptrack.opt.4-n8.gz
total runtime: 12.06s.
calls to allocation functions: 6068042 (503321/s)
temporary memory allocations: 2418992 (200646/s)
peak heap memory consumption: 166.03MB
peak RSS (including heaptrack overhead): 213.55MB
total memory leaked: 67.58KB
heaptrack.opt.5-n16.gz
total runtime: 12.14s.
calls to allocation functions: 6067993 (499958/s)
temporary memory allocations: 2418999 (199307/s)
peak heap memory consumption: 187.24MB
peak RSS (including heaptrack overhead): 233.69MB
total memory leaked: 67.58KB
```
While that test may be an extreme worst-case scenario,
https://llvm-compile-time-tracker.com/compare.php?from=dee85d47d9f15fc268f7b18f279dac2774836615&to=98a57e31b1947d5bcdf4a5605ac2ab32b4bd5f63&stat=instructions
agrees that this also results in improvements in the usual situations.
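A sketch of the shape of the change (type names assumed from the description above, not copied from the tree):

```
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/Analysis/ScalarEvolution.h"
using namespace llvm;

// Give each per-SCEV set-vector an inline capacity of 4 so that the
// common small case never touches the heap.
using ValueOffsetPair = std::pair<Value *, ConstantInt *>;
using ValueOffsetPairSetVector = SmallSetVector<ValueOffsetPair, 4>;
DenseMap<const SCEV *, ValueOffsetPairSetVector> ExprToValues;
```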
This is similar to the fix in c590a9880d (PR49832), but
we missed handling the pattern for select of bools (no compare
inst).
We can't substitute a vector value because the equality condition
replacement that we are attempting requires that the condition
is true/false for the entire value. Vector select can be partly
true/false.
I added an assert for vector types, so we shouldn't hit this again.
Fixed formatting while auditing the callers.
https://llvm.org/PR50500
If a cmpxchg specifies acquire or seq_cst on failure, make sure we
generate code consistent with that ordering even if the success ordering
is not acquire/seq_cst.
At one point, it was ambiguous whether this sort of construct was valid,
but the C++ standard and LLVM now accept arbitrary combinations of
success/failure orderings.
This doesn't address the corresponding issue in AtomicExpand. (This was
reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .)
Fixes https://bugs.llvm.org/show_bug.cgi?id=50512.
Differential Revision: https://reviews.llvm.org/D103284
Parameter positions seem like they should be unsigned.
While there, make function names lowercase per coding standards.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D103224
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.
Utilize LoopNest and let function 'Flatten' generate information from it.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D102904
The constructor of InOrderIssueStage no longer takes as input a reference to the
target scheduling model. The stage can always query the subtarget to obtain a
reference to the scheduling model.
The ResourceManager is no longer stored internally as a unique_ptr.
Moved a couple of method definitions to the .cpp file.
Moved the logic that checks for RAW hazards from the InOrderIssueStage to the
RegisterFile.
Changed how the InOrderIssueStage keeps track of backend stalls. Stall events
are now generated from method notifyStallEvent().
No functional change intended.
This is the first in a series of patches to provide builtins for
compatibility with the XL compiler. Most of the builtins already had
intrinsics and only needed to be implemented in the front end.
Intrinsics were created for the three iospace builtins, eieio, and icbt.
Pseudo instructions were created for eieio and iospace_eieio to
ensure that nops were inserted before the eieio instruction.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D102443
This patch fixes an oversight in https://reviews.llvm.org/D96181
in stripDebugInfo(), and also takes into account loop
metadata pointing to other MDNodes that point into the debug info.
rdar://78487175
Differential Revision: https://reviews.llvm.org/D103220
Mark the `ELFRelocationEntry::dump` method as `LLVM_DUMP_METHOD` to
annotate it properly as used to prevent the function being dead stripped
away. This allows use of `dump` in the debugger. This is purely to
improve the developer experience.
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D99149
This patch adds a way for the target to configure the type it uses for
the explicit vector length operands of VP SDNodes. The type must be a
legal integer type (there is still no target-independent legalization of
this operand) and must currently be at least as big as i32, the type
used by the IR intrinsics. An implicit zero-extension takes place on
targets which choose a larger type. All VP nodes should be created with
this type used for the EVL operand.
This allows 64-bit RISC-V to avoid custom legalization of all VP nodes,
keeping them in their target-independent form for a bit longer.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D103027
When lowering the dynamic, guided, auto and runtime types of scheduling,
there is an optional monotonic or non-monotonic modifier. This patch
adds support in the OMP IR Builder to pass this down to the runtime
functions.
Also implements tests for the variants.
Differential Revision: https://reviews.llvm.org/D102008
This is a port from the DAG legalization. We're still missing some of the
canonicalizations of shuffles but it's a start.
Differential Revision: https://reviews.llvm.org/D102828
`non-global-value-max-name-size` is used by `Value` to cap the length of local value names. However, this flag is not considered by `LLParser`, which leads to an unexpected `use of undefined value` error. The fix is to move the responsibility of capping the length to `ValueSymbolTable`.
The test is the one provided by [[ https://bugs.llvm.org/show_bug.cgi?id=45899 | Mikael in the bug report ]].
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D102707
There can be a need for some optimizations to get (base, offset)
for any GC pointer. The base can be calculated by generating
needed instructions as it is done by the
RewriteStatepointsForGC::findBasePointer() function. The offset
can be calculated in the same way. Though to not expose the base
calculation and to make the offset calculation as simple as
ptrtoint(derived_ptr) - ptrtoint(base_ptr), which is illegal
outside RS4GC, this patch introduces 2 intrinsics:
@llvm.experimental.gc.get.pointer.base(%derived_ptr)
@llvm.experimental.gc.get.pointer.offset(%derived_ptr)
These intrinsics are inlined by RS4GC along with generation of
statepoint sequences.
With these new intrinsics the GC parseable lowering for atomic
memcpy intrinsics (6ec2c5e402)
could be implemented as a separate pass.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100445
There were a bunch of lost debug location remarks that show up when legalizing
tail calls on AArch64.
This would happen because we drop the return in the block where we emit the
tail call. So, we end up dropping the debug location, which makes the
LostDebugLocObserver report a missing debug location.
Although it's *true* that we lose these debug locations, this isn't
a particularly useful remark. We expect to drop these debug locations when
emitting tail calls. Suppressing remarks in this case is preferable, since the
amount of noise could hide actual debug location related bugs.
To do this, I just plumbed the LostDebugLocObserver through the relevant
LegalizerHelper functions. This is the only case I can think of where we need
the LostDebugLocObserver in the LegalizerHelper. So, rather than storing it
in the LegalizerHelper proper and mucking around with the constructors, I
figured it'd be cleanest to take the simplest path for now.
This clears up ~20 noisy lost debug location remarks on CTMark in AArch64 at
-Os.
Differential Revision: https://reviews.llvm.org/D103128
This patch introduces "DBG_PHI" instructions, a marker of where a PHI
instruction used to be, before PHI elimination. Under the instruction
referencing model, we want to know where every value in the function is
defined -- and a PHI, even if implicit, is such a place.
Just like instruction numbers, we can use this to identify a value to be
used as a variable value, but we don't need to know what instruction
defines that value, for example:
bb1:
DBG_PHI $rax, 1
[... more insts ... ]
bb2:
DBG_INSTR_REF 1, 0, !1234, !DIExpression()
This specifies that on entry to bb1, whatever value is in $rax is known
as value number one -- and the later DBG_INSTR_REF marks the position
where variable !1234 should take on value number one.
PHI locations are stored in MachineFunction for the duration of the
regalloc phase in the DebugPHIPositions map. The map is populated by
PHIElimination, and then flushed back into the instruction stream by
virtregrewriter. A small amount of maintenance is needed in
LiveDebugVariables to account for registers being split, but only for
individual positions, not for entire ranges of blocks.
Differential Revision: https://reviews.llvm.org/D86812
This patch implements getSmallConstantTripMultiple(L) correctly for multiple exit loops. The previous implementation was both imprecise, and violated the specified behavior of the method. This was fine in practice, because it turns out the function was both dead in real code, and not tested for the multiple exit case.
Differential Revision: https://reviews.llvm.org/D103189
DwarfDebug unconditionally assumes for all call instructions the 0th
operand is the callee operand, which seems to be true for other targets,
but not for WebAssembly. This adds `TargetInstrInfo::getCallOperand`
method whose default implementation returns `getOperand(0)`, and makes
WebAssembly override it to use its own utility method to get the callee
operand.
This also fixes an existing bug in `WebAssembly::getCalleeOp`, which was
uncovered by this CL.
Reviewed By: dschuff, djtodoro
Differential Revision: https://reviews.llvm.org/D102978
This came up in review for another patch, see https://reviews.llvm.org/D102982#2782407 for full context.
I've reviewed the callers to make sure they can handle multiple exit loops w/non-zero returns. There are two cases in target cost models where results might change (Hexagon and PowerPC), but the results looked legal and reasonable. If a target maintainer wishes to back out the effect of the costing change, they should explicitly check for multiple exit loops and handle them as desired.
Differential Revision: https://reviews.llvm.org/D103182
In objdump, many targets support `-M no-aliases`. Instead of having a
`-*-no-aliases` for each target when LLVM adds the support, it makes more sense
to introduce objdump style `-M`.
-riscv-arch-reg-names is removed. -riscv-no-aliases has too many uses and thus is retained for now.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D103004
The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost.
By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928.
Differential Revision: https://reviews.llvm.org/D102934
Support virtual, physical and tied i128 register operands in inline assembly.
i128 is on SystemZ not really supported and is not a legal type and generally
such a value will be split into two i64 parts. There are however some
instructions that require a pair of two GPR64 registers contained in the GR128
bit reg class, which is untyped.
For inline assembly operands, it proved to be very cumbersome to first follow
the general behavior of splitting an i128 operand into two parts and then
later rebuild the INLINEASM MI to have one GR128 register. Instead, some
minor common code changes were made to SelectionDAGBUilder to only create one
GR128 register part to begin with. In particular:
- getNumRegisters() now has an optional parameter "RegisterVT" which is
passed by AddInlineAsmOperands() and GetRegistersForValue().
- The bitcasting in GetRegistersForValue is not performed if RegVT is
Untyped.
- The RC for a tied use in AddInlineAsmOperands() is now computed either from
the tied def (virtual register), or by getMinimalPhysRegClass() (physical
register).
- InstrEmitter.cpp:EmitCopyFromReg() has been fixed so that the register
class (DstRC) can also be computed for an illegal type.
In the SystemZ backend getNumRegisters(), splitValueIntoRegisterParts() and
joinRegisterPartsIntoValue() have been implemented to handle i128 operands.
Differential Revision: https://reviews.llvm.org/D100788
Review: Ulrich Weigand
When loop hints are passed via metadata, the allowReordering function
in LoopVectorizationLegality will allow the order of floating point
operations to be changed:
bool allowReordering() const {
// When enabling loop hints are provided we allow the vectorizer to change
// the order of operations that is given by the scalar loop. This is not
// enabled by default because can be unsafe or inefficient.
The -enable-strict-reductions flag introduced in D98435 will currently only
vectorize reductions in-loop if hints are used, since canVectorizeFPMath()
will return false if reordering is not allowed.
This patch changes canVectorizeFPMath() to query whether it is safe to
vectorize the loop with ordered reductions if no hints are used. For
testing purposes, an additional flag (-hints-allow-reordering) has been
added to disable the reordering behaviour described above.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D101836
Global values imply flags such as readable, writable, executable for the
sections that they will be placed in. Currently MC places all such
entries into the same section, using the first set of flags seen. This
can lead to situations in LTO where a writable global is placed in the
same named section as a readable global from another file, and the
section may not be marked writable.
D72194 ensures that mergeable globals with explicit sections are placed
in separate sections with compatible entry size, by emitting the
`unique` assembly syntax where appropriate. This change extends that
approach to include section flags, so that globals with different
section flags are emitted in separate unique sections.
Differential revision: https://reviews.llvm.org/D100944
Summary: This is an NFC patch to change the input parameter of the method SectionRef::isDebugSection(), replacing the StringRef SectionName with DataRefImpl Sec. This allows us to determine whether a section is a debug section in more ways than just by section name.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D102601
Since the opaque pointer type won't contain the pointee type, we need to
separately encode the value type for an atomicrmw.
Emit this new code for atomicrmw.
Handle this new code and the old one in the bitcode reader.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D103123
Beside the `comdat any` deduplication feature, instrumentations use comdat to
establish dependencies among a group of sections, to prevent section based
linker garbage collection from discarding some members without discarding all.
LangRef acknowledges this usage with the following wording:
> All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key.
On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated
`__llvm_prf_data` section are placed in the same GRP_COMDAT group. A
`__llvm_prf_data` is usually not referenced and expects the liveness of its
associated `__llvm_prf_cnts` to retain it.
The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the
use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained).
The main goal of this patch is to fix the dependency relationship.
I think it makes sense for InternalizePass to internalize a comdat and thus
suppress the deduplication feature, e.g. a relocatable link of a regular LTO can
create an object file affected by InternalizePass.
If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the
a.o references to the comdat definitions will be non-resolvable (references
cannot bind to STB_LOCAL definitions in b.o).
On PE-COFF, for a non-external selection symbol, deduplication is naturally
suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend
to believe the spec creator has not thought about this use case (see D102973).
GNU ld and gold are still using the "signature is name based" interpretation.
So even if D102973 for ld.lld is accepted, for portability, a better approach is
to rename the comdat. A comdat with one single member is the common case,
leaving the comdat can waste (sizeof(Elf64_Shdr)+4*2) bytes, so we optimize by
deleting the comdat; otherwise we rename the comdat.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D103043
When memoized values for a SCEV expression are dropped, we also
drop all BECounts that make use of the SCEV expression. This is done
by iterating over all the ExitNotTaken counts and (recursively)
checking whether they use the SCEV expression. If there are many
exits, this will take a lot of time.
This patch improves the situation by pre-computing a set of all
used operands, so that we can determine whether a certain BEInfo
needs to be invalidated using a simple set lookup. We will still need
to loop over all BEInfos, though.
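A hedged sketch of the pre-computation built on the existing SCEV traversal helpers (the follow/isDone visitor interface is real; the surrounding names are illustrative):

```
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
using namespace llvm;

// Collect every sub-expression of a backedge-taken count into a set, so
// "does this BEInfo use expression S?" becomes a single set lookup.
struct SCEVUsedCollector {
  SmallPtrSetImpl<const SCEV *> &Used;
  SCEVUsedCollector(SmallPtrSetImpl<const SCEV *> &Used) : Used(Used) {}
  bool follow(const SCEV *S) {
    Used.insert(S);
    return true; // keep descending into operands
  }
  bool isDone() const { return false; }
};

static void collectUsedSCEVs(const SCEV *BECount,
                             SmallPtrSetImpl<const SCEV *> &Used) {
  SCEVUsedCollector C(Used);
  visitAll(BECount, C);
}
```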
This makes for a mild improvement on non-degenerate cases:
https://llvm-compile-time-tracker.com/compare.php?from=b661a55a253f4a1cf5a0fbcb86e5ba7b9fb1387b&to=be1393f450e594c53f0ad7e62339a6bc831b16f6&stat=instructions
For the degenerate case from https://bugs.llvm.org/show_bug.cgi?id=50384,
for n=128 I'm seeing run time drop from 1.6s to 1.1s.
Differential Revision: https://reviews.llvm.org/D102796
- When lowering memory intrinsics such as memcpy, the attached scoped AA
metadata is not passed down to the backend. As a result, the backend
cannot schedule relevant memory operations around them following that
hint. In this patch, SelectionDAG is enhanced to propagate that
metadata (scoped AA only) when they are lowered into loads and stores.
Differential Revision: https://reviews.llvm.org/D102215
Currently, BPF only contains three relocations:
R_BPF_NONE for no relocation
R_BPF_64_64 for LD_imm64 and normal 64-bit data relocation
R_BPF_64_32 for call insn and normal 32-bit data relocation
Also, the .BTF and .BTF.ext sections contain symbols in allocated
program and data sections. These two sections reserve 32-bit
space to hold the offset relative to the symbol's section.
When LLVM JIT is used, the LLVM ExecutionEngine RuntimeDyld
may attempt to resolve relocations for .BTF and .BTF.ext,
which we want to prevent. So we used R_BPF_NONE for such relocations.
This all works fine until when we try to do linking of
multiple objects.
- R_BPF_64_64 handling of LD_imm64 vs. normal 64-bit data
  is different, so lld target->relocate() needs more context
  to do a correct job.
- The same for R_BPF_64_32. More context is needed for
  lld target->relocate() to differentiate call insn vs.
  normal 32-bit data relocation.
- Since relocations in .BTF and .BTF.ext are set to R_BPF_NONE,
  they will not be relocated properly when multiple .BTF/.BTF.ext
  sections are merged by lld.
This patch intends to address this issue by adding additional
relocation kinds:
R_BPF_64_ABS64 for normal 64-bit data relocation
R_BPF_64_ABS32 for normal 32-bit data relocation
R_BPF_64_NODYLD32 for .BTF and .BTF.ext style relocations.
The old R_BPF_64_{64,32} semantics:
R_BPF_64_64 for LD_imm64 relocation
R_BPF_64_32 for call insn relocation
The existing R_BPF_64_64/R_BPF_64_32 mapping to numeric values
is maintained. They are the most common use cases for
bpf programs and we want to maintain backward compatibility
as much as possible.
ExecutionEngine RuntimeDyld BPF relocations are adjusted as well.
R_BPF_64_{ABS64,ABS32} relocations will be resolved properly and
other relocations will be ignored.
Two tests are added for RuntimeDyld. Not handling R_BPF_64_NODYLD32 in
RuntimeDyldELF.cpp will result in "Relocation type not implemented yet!"
fatal error.
FK_SecRel_4 usages in BPFAsmBackend.cpp and BPFELFObjectWriter.cpp
are removed as they are not triggered in BPF backend.
BPF backend used FK_SecRel_8 for LD_imm64 instruction operands.
Differential Revision: https://reviews.llvm.org/D102712
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue support coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.
Unfortunately, for SanitizeCoverage, it has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.
Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035
Reviewed By: vitalybuka, morehouse
Differential Revision: https://reviews.llvm.org/D102772
The change is currently NFC, but exploited by the depending D102954.
Code to handle constants is borrowed from the general implementation
of Value::doRAUW().
Differential Revision: https://reviews.llvm.org/D103051
I cannot find documentation on this CPU, and it
is not supported by the Arm Compiler 5 product either.
It was likely a mistake or a different name for the
"ep9312", which is an Arm based Cirrus Logic chip.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D103024
This patch introduces new operations on jitlink::Blocks: setMutableContent,
getMutableContent and getAlreadyMutableContent. The setMutableContent method
will set the block content data and size members and flag the content as
mutable. The getMutableContent method will return a mutable copy of the existing
content value, auto-allocating and populating a new mutable copy if the existing
content is marked immutable. The getAlreadyMutableContent method asserts that the
existing content is already mutable and returns it.
setMutableContent should be used when updating the block with totally new
content backed by mutable memory. It can be used to change the size of the
block. The argument value should *not* be shared with any other block.
getMutableContent should be used when clients want to modify the existing
content and are unsure whether it is mutable yet.
getAlreadyMutableContent should be used when clients want to modify the existing
content and know from context that it must already be mutable.
These operations reduce copy-modify-update boilerplate and unnecessary copies
introduced when clients couldn't be sure whether the existing content was
mutable or not.
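A hedged usage sketch (assuming getMutableContent takes the owning LinkGraph so it can allocate the mutable copy):

```
#include "llvm/ExecutionEngine/JITLink/JITLink.h"
using namespace llvm::jitlink;

// Patch the first byte of a block's content, copying the content into
// mutable memory first if it is still backed by immutable data.
void patchFirstByte(LinkGraph &G, Block &B, char NewByte) {
  llvm::MutableArrayRef<char> Content = B.getMutableContent(G);
  Content[0] = NewByte;
}
```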
Instrumentation callbacks are not made aware of LoopNest passes. From the loop pass manager, we can pass the outermost loop of the LoopNest to instrumentation in the case of LoopNest passes.
The current patch makes the change in two places in StandardInstrumentation.cpp. I will submit a proper patch where the OuterMostLoop is passed from the LoopPassManager to the callbacks. That way we will avoid making changes at multiple places in StandardInstrumentation.cpp.
A testcase also will be submitted.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D102463
The findLoopPreheader function will currently not find a preheader if it
branches to multiple different loop headers. This patch adds an option
to relax that, allowing ARMLowOverheadLoops to process more loops
successfully. This helps with WhileLoopStart setup instructions that can
branch/fallthrough to the low overhead loop and to branch to a separate
loop from the same preheader (but I don't believe it is possible for
both loops to be low overhead loops).
Differential Revision: https://reviews.llvm.org/D102747
If we simplify values we sometimes end up with type mismatches. If the
value is a constant, though, we can often cast it to still allow
propagation. The logic is now put into a helper and it replaces some
ad hoc things we did before.
This also introduces the AA namespace for abstract attribute related
functions and types.
The state of AAPotentialValues tracks if undef is contained. It should
fold undef into the first non-undef value. However we missed a case
before. There was also a shadowing definition of two variables that
caused trouble. The test exposes both problems.
This makes it possible for targets to define their own MCObjectFileInfo.
This MCObjectFileInfo is then used to determine things like section alignment.
This is a follow up to D101462 and prepares for the RISCV backend defining the
text section alignment depending on the enabled extensions.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101921
The prevailing style does not add the message. The directive name is not useful
because the next line replicates the error line which includes the directive.
If exiting using _Exit or ExitProcess, DLLs are still unloaded
cleanly before exiting, running destructors and other cleanup in those
DLLs. When the caller expects to exit without cleanup, running
destructors in some loaded DLLs (which can be either libLLVM.dll or
e.g. libc++.dll) can cause deadlocks occasionally.
This is an alternative to D102684.
Differential Revision: https://reviews.llvm.org/D102944
Moving these bitfields from Block to Addressable allows them to be packed with
the bitfields at the end of Addressable, reducing the size of Block by eight
bytes.
The implementation and intent behind freeing the triple string here are the same
as for LLVMGetDefaultTargetTriple (and any other owned C string returned from the C
API), so we should use LLVMDisposeMessage to free the string for
consistency.
Patch by Mats Larsen -- thanks Mats!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D102957
D88631 added initial support for:
- -mstack-protector-guard=
- -mstack-protector-guard-reg=
- -mstack-protector-guard-offset=
flags, and D100919 extended these to AArch64. Unfortunately, these flags
aren't retained for LTO. Make them module attributes rather than
TargetOptions.
Link: https://github.com/ClangBuiltLinux/linux/issues/1378
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102742
These checks already exist as asserts when creating the corresponding
instruction. Anybody creating these instructions already need to take
care to not break these checks.
Move the checks for success/failure ordering in cmpxchg from the
verifier to the LLParser and BitcodeReader plus an assert.
Add some tests for cmpxchg ordering. The .bc files are created from the
.ll files with an llvm-as with these checks disabled.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102803
This reapplies c0f3dfb9, which was reverted following the discovery of
crashes on linux kernel and chromium builds - these issues have since
been fixed, allowing this patch to re-land.
This reverts commit 4397b7095d.
[Debugify][Original DI] Test dbg var loc preservation
This is an improvement of [0]. This adds checking of
original llvm.dbg.values()/declares() instructions in
optimizations.
We have picked a real issue that has been found with
this (actually, picked one variable location missing
from [1] and resolved the issue), and the result is
the fix for that -- D100844.
Before applying the D100844, using the options from [0]
(but with this patch applied) on the compilation of GDB 7.11,
the final HTML report for the debug-info issues can be found
at [1] (please scroll down, and look for
"Summary of Variable Location Bugs"). After applying
the D100844, the numbers have improved a bit -- please take
a look into [2].
[0] https://llvm.org/docs/HowToUpdateDebugInfo.html#\
test-original-debug-info-preservation-in-optimizations
[1] https://djolertrk.github.io/di-check-before-adce-fix/
[2] https://djolertrk.github.io/di-check-after-adce-fix/
Differential Revision: https://reviews.llvm.org/D100845
The unit test was failing because the pass from the test that
modifies the IR didn't return 'true' from its runOnFunction(),
so the expensive-checks configuration triggered an assertion.
We can't declare a unique_function that has among its arguments a reference to
a template type with an incomplete type argument.
For instance, we can't declare unique_function<void(SmallVectorImpl<A>&)>
when A is forward declared.
This is because SFINAE will trigger a hard error in this case, when instantiating
IsSizeLessThanThresholdT with the incomplete type.
This patch specializes AdjustedParamT for references to remove this error.
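A minimal reproduction of the scenario from the description:

```
#include "llvm/ADT/FunctionExtras.h"
#include "llvm/ADT/SmallVector.h"

struct A; // forward-declared only; A is incomplete here

// Before this fix, naming this type hard-errored while instantiating
// IsSizeLessThanThresholdT with the incomplete argument type; with the
// AdjustedParamT specialization for references it compiles.
using Callback = llvm::unique_function<void(llvm::SmallVectorImpl<A> &)>;
```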
Committed on behalf of: @math-fehr (Fehr Mathieu)
Reviewed By: DaniilSuchkov, yrouban
Previously APFloat::convertToDouble could be called only for APFloats that
were built using double semantics. Other semantics like single precision
were not allowed although corresponding numbers could be converted to
double without loss of precision. The similar restriction applied to
APFloat::convertToFloat.
With this change any APFloat that can be precisely represented by double
can be handled with convertToDouble. Behavior of convertToFloat was
updated similarly. It makes the conversion operations more convenient and
adds support for formats like half and bfloat.
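For example (a small sketch against the public APFloat API):

```
#include "llvm/ADT/APFloat.h"
using namespace llvm;

// A half-precision value is exactly representable as a double, so after
// this change convertToDouble() handles it instead of being restricted
// to values built with double semantics.
double halfToDouble() {
  APFloat H(APFloat::IEEEhalf(), "2.5");
  return H.convertToDouble();
}
```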
Differential Revision: https://reviews.llvm.org/D102671
.byte supports strings, so if the whole byte list is printable,
we can print the string instead, for readability and lit test maintenance.
.byte 'H,'e,'l,'l,'o,',,' ,'w,'o,'r,'l,'d
->
.byte "Hello, world"
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D102814
Remove the `nosync` attribute from the memory intrinsic definitions
(i.e. memset, memcpy, memmove).
Like native memory accesses, memory intrinsics can be volatile. This is
indicated by an immarg in the intrinsic call. All else equal, a volatile
memory intrinsic is `sync`, so we cannot annotate the intrinsic functions
themselves as `nosync`. The attributor and function-attr passes know to
take the volatile bit into account.
Since `nosync` is a default attribute, this means we have to stop using
the DefaultAttrIntrinsic tablegen class for memory intrinsics, and
specify all default attributes other than `nosync` explicitly.
Most of the test changes are trivial churn, but one test case
(in nosync.ll) was in fact incorrect before this change.
Differential Revision: https://reviews.llvm.org/D102295
In D98289#inline-939112 @dblaikie said:
Perhaps this could be more informative about what makes the range list
index of 0 invalid? "index 0 out of range of range list table (with
range list base 0xXXX) with offset entry count of XX (valid indexes
0-(XX-1))" Maybe that's too verbose/not worth worrying about since
this'll only be relevant to DWARF producers trying to debug their
DWARFv5, maybe no one will ever see this message in practice. Just
a thought.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102851
EarlyCSE cannot distinguish between floating point instructions and
constrained floating point intrinsics that are marked as running in the
default FP environment. Said intrinsics are supposed to behave exactly the
same as the regular FP instructions. Teach EarlyCSE to handle them in that
case.
Differential Revision: https://reviews.llvm.org/D99962
The use of `SelectionDAG::getSplatValue` isn't guaranteed to return a
type-legal splat value as it may implicitly extract a vector element
from another shuffle. It is not permitted to introduce an illegal type
when lowering shuffles.
This patch addresses the crash by adding a boolean flag to
`getSplatValue`, defaulting to false, which when set will ensure a
type-legal return value. If it is unable to do that it will fail to
return a splat value.
I've been through the existing uses of `getSplatValue` in other targets
and was unable to find a need or test cases showing a need to update
their uses. In some cases, the call is made during `LegalizeVectorOps`
which may still produce illegal scalar types. In other situations, the
illegally-typed splat value may be quickly patched up to a legal type
(such as any-extending the returned `extract_vector_elt` up to a legal
type) before `LegalizeDAG` notices.
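A sketch of how a caller uses the new flag (the boolean parameter is the one described above; the surrounding function is illustrative):

```
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Request only a type-legal splat value during lowering; bail out rather
// than introduce an illegally typed extract_vector_elt.
static SDValue lowerWithLegalSplat(SelectionDAG &DAG, SDValue Op) {
  SDValue Splat = DAG.getSplatValue(Op, /*LegalTypes=*/true);
  if (!Splat)
    return SDValue(); // no legal splat available; keep the shuffle form
  return Splat;
}
```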
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D102687
__table_base is now 64-bit, since in LLVM it represents a function pointer offset.
__table_base32 is a copy in wasm32 for use in elem init expr, since no truncation may be used there.
New reloc R_WASM_TABLE_INDEX_REL_SLEB64 added
Differential Revision: https://reviews.llvm.org/D101784
This is a follow-up of D102201. After some discussion, it is a better idea
to upgrade all invalid uses of alignment attributes on function return
values and parameters, not just limited to void function return types.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102726
Linker scripts might not handle COMDAT sections. SLSHardening adds a
new section for each __llvm_slsblr_thunk_xN. This new option allows
the generation of the thunks into the normal text section to handle these
exceptional cases.
,comdat or ,nocomdat can be added to harden-sls to control the codegen.
-mharden-sls=[all|retbr|blr],nocomdat.
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D100546
llvm::Any::TypeId::Id relies on the uniqueness of the address of a static
variable defined in a template function. hidden visibility implies vague linkage
for that variable, which does not guarantee the uniqueness of the address across
a binary and a shared library. This totally breaks the implementation of
llvm::Any.
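The mechanism at stake, in simplified form (a sketch of the pattern, not the exact llvm::Any code):

```
// One static per instantiation; its *address* serves as the type id.
// Under vague linkage with hidden visibility, a binary and a shared
// library can each get their own copy of Id, so addresses compared
// across the boundary no longer match and any_isa<T>-style checks fail.
template <typename T> struct TypeId {
  static const char Id;
};
template <typename T> const char TypeId<T>::Id = 0;

template <typename T> const void *typeIdAddress() { return &TypeId<T>::Id; }
```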
Ideally, setting the visibility of llvm::Any::TypeId::Id should be enough;
unfortunately this doesn't work as expected, and we lack time (before the 12.0.1
release) to understand why setting the visibility of llvm::Any does work.
See https://gcc.gnu.org/wiki/Visibility and
https://gcc.gnu.org/onlinedocs/gcc/Vague-Linkage.html
for more information on that topic.
Differential Revision: https://reviews.llvm.org/D101972
No verifier changes needed, the verifier currently doesn't check that
the pointer operand's pointee type matches the GEP type. There is a
similar check in GetElementPtrInst::Create() though.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102744
Summary:
Currently, only `OptimizationRemarks` can be emitted using a Function.
Add constructors to allow this for `OptimizationRemarksAnalysis` and
`OptimizationRemarkMissed` as well.
Reviewed By: jdoerfert, thegameg
Differential Revision: https://reviews.llvm.org/D102784
For source-based coverage, the frontend sets the counter IDs, and the
constraints on counter IDs are not defined. For example, the Rust frontend
until recently had a reserved counter #0
(https://github.com/rust-lang/rust/pull/83774). Rust coverage
instrumentation also creates counters on edges in addition to basic
blocks. Some functions may have more counters than regions.
This breaks an assumption in CoverageMapping.cpp where the number of
counters in a function is assumed to be bounded by the number of
regions:
Counts.assign(Record.MappingRegions.size(), 0);
This assumption causes CounterMappingContext::evaluate() to fail since
there are not enough counter values created in the above call to
`Counts.assign`. Consequently, some uncovered functions are not
reported in coverage reports.
This change walks a Function's CoverageMappingRecord to find the maximum
counter ID, and uses it to initialize the counter array when instrprof
records are missing for a function in sparse profiles.
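A hedged sketch of the fix's shape (the in-tree version also walks expression operands; the helper name is illustrative):

```
#include "llvm/ProfileData/Coverage/CoverageMapping.h"
#include <algorithm>
using namespace llvm::coverage;

// Size the counter array from the largest counter ID referenced by the
// regions, rather than from the number of regions.
static unsigned maxDirectCounterID(const CoverageMappingRecord &Record) {
  unsigned MaxID = 0;
  for (const CounterMappingRegion &R : Record.MappingRegions)
    if (!R.Count.isExpression() && !R.Count.isZero())
      MaxID = std::max(MaxID, R.Count.getCounterID());
  return MaxID;
}
// Then: Counts.assign(maxDirectCounterID(Record) + 1, 0);
```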
Differential Revision: https://reviews.llvm.org/D101780
lld/MachO/Driver.cpp and lld/MachO/SyntheticSections.cpp include
llvm/Config/config.h which doesn't exist when building standalone lld.
This patch replaces the llvm/Config/config.h include with llvm/Config/llvm-config.h,
just like in lld/ELF/Driver.cpp, replaces HAVE_LIBXAR with LLVM_HAVE_LIBXAR, and
moves LLVM_HAVE_LIBXAR from config.h to llvm-config.h.
Also it adds LLVM_HAVE_LIBXAR to LLVMConfig.cmake and links liblldMachO2.so
with XAR_LIB if LLVM_HAVE_LIBXAR is set.
Differential Revision: https://reviews.llvm.org/D102084
The operations of some VP intrinsics do not (or will not) map to regular
instruction opcodes. Returning 'None' seems more intuitive here than
'Instruction::Call'.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D102778
- This patch (is one in a series of patches) which introduces HLASM Parser support (for the first parameter of inline asm statements) to LLVM ([[ https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html | main RFC here ]])
- This patch in particular introduces HLASM Parser support for Z machine instructions.
- The approach taken here was to subclass `AsmParser`, and mark various functions and variables as "protected" wherever appropriate.
- The `HLASMAsmParser` class overrides the `parseStatement` function. Two new private functions `parseAsHLASMLabel` and `parseAsMachineInstruction` are introduced as well.
The general syntax is laid out as follows (more information available in [[ https://www.ibm.com/support/knowledgecenter/SSENW6_1.6.0/com.ibm.hlasm.v1r6.asm/asmr1023.pdf | HLASM V1R6 Language Reference Manual ]] - Chapter 2 - Instruction Statement Format):
```
<TokA><spaces.*><TokB><spaces.*><TokC><spaces.*><TokD>
```
1. TokA is referred to as the Name Entry. This token is optional.
2. TokB is referred to as the Operation Entry. This token is mandatory.
3. TokC is referred to as the Operand Entry. This token is mandatory.
4. TokD is referred to as the Remarks Entry. This token is optional.
- If TokA is provided, then we either parse TokA as a possible comment or as a label (Name Entry), TokB as the Operation Entry, and so on.
- If TokA is not provided (i.e. we have one or more spaces and then the first token), then we will parse the first token (i.e. TokB) as a possible Z machine instruction, TokC as the operands to the Z machine instruction, and TokD as a possible Remark field.
- For TokC (the Operand Entry), no spaces are allowed between operand entries. If a space occurs, it is classified as an error.
- TokD if provided is taken as is, and emitted as a comment.
The following additional approach was examined, but not taken:
- Adding custom private only functions to base AsmParser class, and only invoking them for z/OS. While this would eliminate the need for another child class, these private functions would be of non-use to every other target. Similarly, adding any pure virtual functions to the base MCAsmParser class and overriding them in AsmParser would also have the same disadvantage.
Testing:
- This patch doesn't have tests added with it, for the sole reason that MCStreamer support and object file support haven't been added for the z/OS target (yet). Hence, it's not possible to generate code outright for the z/OS target. They are in the process of being committed / being worked on.
- Any comments / feedback on how to combat this "lack of testing" due to other missing required features is appreciated.
Reviewed By: Kai, uweigand
Differential Revision: https://reviews.llvm.org/D98276
Intrinsics reading or writing the FFR register need to model the fact that
there is additional state being read/written.
Model this state as inaccessible memory.
* setffr => write inaccessiblememonly
* rdffr => read inaccessiblememonly
* ldff* => read arg memory, write inaccessiblemem
* ldnf => read arg memory, write inaccessiblemem
There doesn't seem to be a need to support recursive locking,
and a recursive mutex is unnecessarily inefficient.
Differential Revision: https://reviews.llvm.org/D102486
This patch adds a new option to the LoopVectorizer to control how
scalable vectors can be used.
Initially, this suggests three levels to control scalable
vectorization, although other more aggressive options can be added in
the future.
The possible options are:
- Disabled: Disables vectorization with scalable vectors.
- Enabled: Vectorize loops using scalable vectors or fixed-width
vectors, but favors fixed-width vectors when the cost
is a tie.
- Preferred: Like 'Enabled', but favoring scalable vectors when the
cost-model is inconclusive.
Reviewed By: paulwalker-arm, vkmr
Differential Revision: https://reviews.llvm.org/D101945
To bring D99599's implementation in line with the existing
PrintPassInstrumentation, and to fix a FIXME, add more customizability
to PrintPassInstrumentation.
Introduce three new options. The first takes over the existing
"-debug-pass-manager-verbose" cl::opt.
The second and third option are specific to -fdebug-pass-structure. They
allow indentation, and also don't print analysis queries.
To avoid more golden file tests than necessary, prune down the
-fdebug-pass-structure tests.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D102196
This patch transforms the sequence
lea (reg1, reg2), reg3
sub reg3, reg4
to two sub instructions
sub reg1, reg4
sub reg2, reg4
Similar optimization can also be applied to LEA/ADD sequence.
The modifications to TwoAddressInstructionPass ensure that the operands of the ADD
instruction have the expected order (the dest register of the LEA should be the src
register of the ADD).
Differential Revision: https://reviews.llvm.org/D101970
This patch implements first part of Flow Sensitive SampleFDO (FSAFDO).
It has the following changes:
(1) disable current discriminator encoding scheme,
(2) new hierarchical discriminator for FSAFDO.
For this patch, option "-enable-fs-discriminator=true" turns on the new
functionality. Option "-enable-fs-discriminator=false" (the default)
keeps the current SampleFDO behavior. When the fs-discriminator is
enabled, we insert a flag variable, namely, llvm_fs_discriminator, to
the object. This symbol will be checked by the create_llvm_prof tool and used
to generate a profile with FS-AFDO discriminators enabled. If this
happens, for an extbinary format profile, create_llvm_prof tool
will add a flag to profile summary section.
Differential Revision: https://reviews.llvm.org/D102246
In many cases it is helpful to know at what address the resolved function starts.
This patch adds a new StartAddress member to the DILineInfo structure.
Reviewed By: jhenderson, dblaikie
Differential Revision: https://reviews.llvm.org/D102316
For opaque pointers, we're trying to avoid uses of
PointerType::getElementType().
A couple of ISel places use PointerType::getElementType(). Some of these
are easy to fix by using ArgListEntry's indirect types.
The inalloca type wasn't stored there, as opposed to preallocated and
byval which have their indirect types available, so add it and use it.
This is a reland after an MSan fix in D102667.
Differential Revision: https://reviews.llvm.org/D101713
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.
SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D102032
Don't check that types match when the pointer operand is an opaque
pointer.
I would separate the Assembler and Verifier changes, but
verify-uselistorder in the Assembler test ends up running the verifier.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102450
Handle PDB writing errors like any other error in LLD: emit an error and
continue. This allows the linker to print timing data and summary data
after linking, which can be helpful for finding PDB size problems. Also
report how large the file would have been.
Example output:
lld-link: error: Output data is larger than 4 GiB. File size would have been 6,937,108,480
lld-link: error: failed to write PDB file ./chrome.dll.pdb
Summary
--------------------------------------------------------------------------------
33282 Input OBJ files (expanded from all cmd-line inputs)
4 PDB type server dependencies
0 Precomp OBJ dependencies
33396931 Input type records
... snip ...
Input File Reading: 59756 ms ( 45.5%)
GC: 7500 ms ( 5.7%)
ICF: 3336 ms ( 2.5%)
Code Layout: 6329 ms ( 4.8%)
PDB Emission (Cumulative): 46192 ms ( 35.2%)
Add Objects: 27609 ms ( 21.0%)
Type Merging: 16740 ms ( 12.8%)
Symbol Merging: 10761 ms ( 8.2%)
Publics Stream Layout: 9383 ms ( 7.1%)
TPI Stream Layout: 1678 ms ( 1.3%)
Commit to Disk: 3461 ms ( 2.6%)
--------------------------------------------------
Total Link Time: 131244 ms (100.0%)
Differential Revision: https://reviews.llvm.org/D102713
This patch introduces functionality used by BOLT when
re-linking the final binary. It adds to MemoryManager a new member
function allowStubAllocation to control whether this MemoryManager
supports increasing code size with stubs or not. Since BOLT can
rewrite some files in-place, it needs to avoid stub insertion done
by the linker. This patch also introduces allowsZeroSymbols to the
JITSymbolResolver class, enabling us to finish a link successfully
even when some symbols resolve to the value zero. When rewriting a
binary, sometimes we do need to resolve a target to zero in case
the input binary calls address zero and we want to be bug
compatible. We also expose reassignSectionAddress as it is used by
BOLT.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97898
Use existing KnownBits helpers from KnownBits.h to simplify G_ICMPs.
E.g.
x == x -> true
x != x -> false
load(x) > 1 -> true (when the load is known to be greater than 1)
And so on.
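For illustration, a sketch of the kind of query such a combine can make with the existing helpers (combiner plumbing omitted; the wrapper name is hypothetical):
```
#include "llvm/ADT/Optional.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// If the known bits of both G_ICMP operands already decide the predicate,
// the combine can fold the compare to a constant; None means "not provable".
// For example, a load known to be at least 2 compared against 1 with ugt
// folds to true.
Optional<bool> tryFoldICmpUgt(const KnownBits &LHS, const KnownBits &RHS) {
  return KnownBits::ugt(LHS, RHS);
}
```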
Differential Revision: https://reviews.llvm.org/D102542
A long time ago LLDB wanted to start using StringRef instead of
C-Strings/ConstString but was blocked by the StringRef(const char *) ctor
asserting that the C-string isn't a nullptr. To workaround this, D24697
introduced a special function called withNullAsEmpty and that's what LLDB (and
only LLDB) started to use to build StringRefs from C-strings.
A bit later it seems that withNullAsEmpty was declared too awkward to use and
instead the assert in the StringRef constructor got removed (see D24904). The
rest of LLDB was then converted to StringRef by just calling the now perfectly
usable implicit constructor.
However, it seems that the original approach with withNullAsEmpty was never
touched again since then and now just exists as a function in StringRef that
is only used in a few places in LLDB.
I removed the few uses of withNullAsEmpty in D102597 and this patch removes
the function itself. Calling the implicit StringRef(const char *) constructor
is the preferred way of doing this today.
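A small before/after sketch of the two spellings:
```
#include "llvm/ADT/StringRef.h"
using namespace llvm;

StringRef fromCString(const char *Str) {
  // Old spelling, removed by this patch:
  //   return StringRef::withNullAsEmpty(Str);
  // Preferred spelling: the implicit constructor handles nullptr and
  // yields an empty StringRef, since D24904 removed the assert.
  return StringRef(Str);
}
```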
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D102599
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.
This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.
Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.
NOTE: Without -EHa or -fasync-exceptions, this patch is a NO-DIFF change.
The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of a _try region, i.e., no potentially
faulting instruction can be moved across a _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
outside of _try region must be updated in memory (not just in register)
before the subsequent exception occurs.
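A hypothetical example illustrating the third rule (written as C++ with MSVC SEH extensions, compiled with clang-cl -EHa; the function and values are made up):
```
#include <windows.h>
#include <cstdio>

int State = 0;

// State must be committed to memory before the potentially faulting load,
// so the handler is guaranteed to observe the updated value.
int readThrough(int *P) {
  __try {
    State = 1;   // must hit memory before the next instruction can fault
    return *P;   // may raise a hardware access violation
  } __except (EXCEPTION_EXECUTE_HANDLER) {
    std::printf("faulted, State=%d\n", State); // prints State=1 under -EHa
    return -1;
  }
}
```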
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on the C++
side. When a C++ function (in the same compilation unit with option -EHa) is
called by a SEH C function, a hardware exception that occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as for a C++ synchronous exception during the
unwinding process.
Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception paths are modeled in the flow graph precisely. However, tracking every
single memory instruction and potentially faulting instruction can create many
Invokes, complicate the flow graph, and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.
This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:
A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block inherits different states from its predecessors, the lowest
State wins. If some exits flow to unreachable, propagation on those
paths terminates without affecting the remaining blocks.
For CPP code, an object's lifetime region is usually a SEME, like a SEH _try.
However there is one rare exception: jumping into a lifetime scope that has a
dtor but no ctor is warned about, but allowed:
Warning: jump bypasses variable with a non-trivial destructor
In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject an eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation described below.
Two intrinsics are created to track CPP object scopes: eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. This also guarantees that an
EH-cleanup-pad is created regardless of whether there is a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.
Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to mark
the _try boundary and to prevent exceptions from being moved across it.
All memory instructions inside a _try are considered 'volatile' to assure the
2nd and 3rd rules for C code above. This is slightly suboptimal, but it's
acceptable as the amount of code directly under a _try is very small.
Part-2 (will be in Part-2 patch): LLVM implementation described below.
For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
relies on the existing State tracking code (UnwindMap and InvokeStateMap).
For both C++ & C-code, the state of each block with a potential trap instruction
is marked and reported in the DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is handled.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(Note: the purpose of that is to ensure the return address of a call is in the
same scope as the call address.)
The handler for catch(...) under -EHa must handle HW exceptions, so its
'adjective' flag is reset (it cannot be IsStdDotDot (0x40), which only catches
C++ exceptions).
Suppress the push/popTerminate() scope (from noexcept/nothrow) so that HW
exceptions can be passed through.
Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html
Differential Revision: https://reviews.llvm.org/D80344/new/
Similar versions of these already exist; this effectively just factors them
out into STLExtras. I plan to use these in future patches.
Differential Revision: https://reviews.llvm.org/D100672
Follow up to D88631 but for aarch64; the Linux kernel uses the command
line flags:
1. -mstack-protector-guard=sysreg
2. -mstack-protector-guard-reg=sp_el0
3. -mstack-protector-guard-offset=0
to use the system register sp_el0 for the stack canary, enabling the
kernel to have a unique stack canary per task (like a thread, but not
limited to userspace as the kernel can preempt itself).
Address pr/47341 for aarch64.
Fixes: https://github.com/ClangBuiltLinux/linux/issues/289
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen
Differential Revision: https://reviews.llvm.org/D100919
This patch contains the bare minimum to run the new Pass Manager from the LLVM-C APIs. It does not feature PGOOptions, PassPlugins or Debugify in its current state. Bugzilla: PR48499
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D102136
Adding lowering support for bitreverse.
Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it so this produces a single rbit instruction instead.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D102397
This fixes https://bugs.llvm.org/show_bug.cgi?id=50370,
which reports a yet another endless combine loop,
this one regressed from 554b1bced3,
which fixed yet another endless combine loop (PR50308)
This code had fallen into the very typical pitfall of forgetting
that constant expressions exist, and they aren't free to invert,
because the `not` won't be absorbed by the "constant",
but will remain a (constant) expression...
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc" so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".
Support is added for AArch64 and X86 for now.
With prelink inlining, pseudo probes with same ID can come from different inline contexts. Such probes should not share samples and their factors should be fixed up separately.
I'm seeing 0.3% speedup for SPEC2017 overall. Benchmark 631.deepsjeng_s benefits the most, about 4%.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D102429
ScheduleDAGFast.cpp is compiled into an object file, but that object file
isn't linked into the clang executable because no symbol in it is
referenced from outside. Add calls to the createXxx functions of
ScheduleDAGFast.cpp so that the object file is linked into the clang
executable. The static RegisterScheduler will then register the 'fast'
and 'linearize' schedulers at clang startup.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D101601
This adds a simple fold into codegenprepare that converts comparison of
branches towards comparison with zero if possible. For example:
%c = icmp ult %x, 8
br %c, bla, blb
%tc = lshr %x, 3
becomes
%tc = lshr %x, 3
%c = icmp eq %tc, 0
br %c, bla, blb
As a first order approximation, this can reduce the number of
instructions needed to perform the branch as the shift is (often) needed
anyway. At the moment this does not affect very much, as llvm tends to
prefer the opposite form. But it can protect against regressions from
commits like rG9423f78240a2.
Simple cases of Add and Sub are added along with Shift, equally as the
comparison to zero can often be folded with cpsr flags.
Differential Revision: https://reviews.llvm.org/D101778
This patch introduces source loading and pruning functions.
It will allow using the DWARF embedded source and sharing the same code for JSON printout.
No functional changes.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102539
This patch adds support for GCC's -fstack-usage flag. With this flag, a stack
usage file (i.e., .su file) is generated for each input source file. The format
of the stack usage file is also similar to what is used by GCC. For each
function defined in the source file, a line with the following information is
produced in the .su file.
<source_file>:<line_number>:<function_name> <size_in_byte> <static/dynamic>
"Static" means that the function's frame size is static and the size info is an
accurate reflection of the frame size. While "dynamic" means the function's
frame size can only be determined at run-time because the function manipulates
the stack dynamically (e.g., due to variable size objects). The size info only
reflects the size of the fixed size frame objects in this case and therefore is
not a reliable measure of the total frame size.
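As an illustration (the file name, line numbers, and sizes in the comments are hypothetical), the two kinds of entries could look like this:
```
// Hypothetical .su entries for the two reporting modes described above.
int fixed_frame() {            // foo.cpp:2:fixed_frame 32 static
  int Buf[4] = {0, 1, 2, 3};   // fixed-size frame objects only
  return Buf[3];
}

int dynamic_frame(int N) {     // foo.cpp:7:dynamic_frame 16 dynamic
  char *P = static_cast<char *>(__builtin_alloca(N)); // variable-size object
  P[0] = 1;
  return P[0];
}
```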
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D100509
These checks are not specific to the instruction based variant of
isPotentiallyReachable(), they are equally valid for the basic
block based variant. Move them there, to make sure that switching
between the instruction and basic block variants cannot introduce
regressions.
Adds the ability to pass MCRegisterInfo to dump_pretty and to the print functions,
so that, if present, target-specific enum names are printed instead of enum values.
GlobalVariables are Constants, yet should not unconditionally be
considered true for __builtin_constant_p.
Via the LangRef
https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic:
This intrinsic generates no code. If its argument is known to be a
manifest compile-time constant value, then the intrinsic will be
converted to a constant true value. Otherwise, it will be converted
to a constant false value.
In particular, note that if the argument is a constant expression
which refers to a global (the address of which _is_ a constant, but
not manifest during the compile), then the intrinsic evaluates to
false.
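A small example of the distinction, following the LangRef text above (the function names are illustrative):
```
int G;

bool literal_is_constant() {
  return __builtin_constant_p(42);  // manifest constant: folds to true
}

bool global_address_is_constant() {
  // &G is a Constant in IR, but its value is only known at link time,
  // so this now folds to false instead of true.
  return __builtin_constant_p(&G);
}
```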
Move isManifestConstant from ConstantFolding to be a method of
Constant so that we can reuse the same logic in
LowerConstantIntrinsics.
pr/41459
Reviewed By: rsmith, george.burgess.iv
Differential Revision: https://reviews.llvm.org/D102367
Previously, we already used BatchAA for individual simple pointer
dependency queries. This extends BatchAA usage for the non-local
case, so that only one BatchAA instance is used for all blocks,
instead of one instance per block.
Use of BatchAA is safe as IR cannot be modified during a MemDep
query.
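A sketch of the usage pattern this extends (the wrapper function is illustrative):
```
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
using namespace llvm;

// One BatchAAResults instance can now serve an entire non-local MemDep
// query: results may be cached across blocks because the IR is known not
// to change for the duration of the query.
AliasResult queryPair(AAResults &AA, const MemoryLocation &LocA,
                      const MemoryLocation &LocB) {
  BatchAAResults BatchAA(AA);
  return BatchAA.alias(LocA, LocB);
}
```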
This is not expected to have any practical compile-time effect,
as the alias() calls inside callCapturesBefore() are rare. This
should still be supported for API completeness, and might be
useful for reachability caching.
This extends any frame record created in the function to include that
parameter, passed in X22.
The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:
* 0b0000 => a normal, non-extended record with just [FP, LR]
* 0b0001 => the extended record [X22, FP, LR]
* 0b1111 => kernel space, and a non-extended record.
All other values are currently reserved.
If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.
There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
This is separate from (but builds on) the support added in ec6b71df70 for
emitting LinkGraphs in the context of an active materialization. This commit
makes LinkGraphs a first-class data structure with features equivalent to
object files within ObjectLinkingLayer.
The opaque pointer type is essentially just a normal pointer type with a
null pointee type.
This also adds support for the opaque pointer type to the bitcode
reader/writer, as well as to textual IR.
To avoid confusion with existing pointer types, we disallow creating a
pointer to an opaque pointer.
Opaque pointer types should not be widely used at this point since many
parts of LLVM still do not support them. The next steps are to add some
very simple use cases of opaque pointers to make sure they work, then
start pretending that all pointers are opaque pointers and see what
breaks.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150359.html
Reviewed By: dblaikie, dexonsmith, pcc
Differential Revision: https://reviews.llvm.org/D101704
I've taken the following steps to add unwinding support from inline assembly:
1) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax:
```
invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"()
to label %exit unwind label %uexit
```
2.) Add Bitcode writing/reading support + LLVM-IR parsing.
3.) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled.
4.) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind.
5.) Add clang support by introducing a new clobber: "unwind", which lower to the `canThrow` being enabled.
6.) Don't allow unwinding callbr.
Reviewed By: Amanieu
Differential Revision: https://reviews.llvm.org/D95745
We want it to be available in analyses so that we can use the
CodeGen notion in middle-end passes (for example, to check if
a GC may free some particular pointer).
This is a preparatory patch that simply moves the files around.
Note: if this causes some build issues, this patch must just be reverted.
Differential Revision: https://reviews.llvm.org/D100557
Reviewed By: reames
The transferDefinedSymbol operation updates a Symbol's target block, offset,
and size. This can be convenient when you want to redefine the content of some
symbol(s) pointing at a block, while retaining the original block in the graph.
Add new type of tree node for `InsertElementInst` chain forming vector.
These instructions could be either removed, or replaced by shuffles during
vectorization and we can add this node to cost model, so naturally estimating
their cost, getting rid of `CompensateCost` tricks and reducing further work
for InstCombine. This fixes PR40522 and PR35732 in a natural way. Also this
patch is the first step towards revectorization of partially vectorization
(to fix PR42022 completely). After adding inserts to tree the next step is
to add vector instructions there (for instance, to merge `store <2 x float>`
and `store <2 x float>` to `store <4 x float>`).
Fixes PR40522 and PR35732.
Differential Revision: https://reviews.llvm.org/D98714
getVectorNumElements() returns a value for scalable vectors
without any warning so it is effectively getVectorMinNumElements().
By renaming it and making getVectorNumElements() forward to
it, we can insert a check for scalable vectors into getVectorNumElements()
similar to EVT. I didn't do that in this patch because there are still more
fixes needed, but I was able to temporarily do it and passed the RISCV
lit tests with these changes.
The changes to isPow2VectorType and getPow2VectorType are copied from EVT.
The change to TypeInfer::EnforceSameNumElts reduces the size of AArch64's isel table.
We're now considering SameNumElts to require the scalable property to match which
removes some unneeded type checks.
This was motivated by the bug I fixed yesterday in 80b9510806
Reviewed By: frasercrmck, sdesmalen
Differential Revision: https://reviews.llvm.org/D102262
The bug (PR50227, affecting COFF) that caused the revert in
6f5670a4c3 has been fixed in
382c505d9c now, so it should be safe
to reenable the pass for that target (and ELF).
In PR50227 it's also mentioned that the same pass seems to cause
problems on aarch64 on darwin, so leaving it disabled there for now.
Previous crashes caused by this patch were the result of machine
subregisters being incorrectly handled in updateDbgUsersToReg; this has
been fixed by using RegUnits to determine overlapping registers, instead
of using the register values directly.
Differential Revision: https://reviews.llvm.org/D101523
This reverts commit 7ca26c5fa2.
Currently the ValueHandler handles both selecting the type and
location for arguments, as well as inserting instructions needed to
handle them. Split this so that the determination of the argument
handling is independent of the function state. Currently the checks
for tail call compatibility do not follow the full assignment logic,
so it misses cases where arguments require nontrivial legalization.
This should help avoid targets ending up in a buggy state where the
argument evaluation may change in different contexts.
We can handle the distinction easily enough in the generic code, and
this makes it easier to abstract the selection of type/location from
the code to insert code.
When making compilation relocatable, for example in distributed
compilation scenarios, we want to set compilation dir to a relative
value like `.` but this presents a problem when generating reports
because if the file path is relative as well, for example `..`, you
may end up writing files outside of the output directory.
This change introduces a flag that allows overriding the compilation
directory that's stored inside the profile with a different value that
is absolute.
Differential Revision: https://reviews.llvm.org/D100232
These can be used to create eh-frame section fixing passes outside the usual
linker pipeline, which can be useful for tests and tools that just want to
verify or dump graphs.
This patch adds JSON output style to llvm-symbolizer to better support CLI automation by providing a machine readable output.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D96883
This change was originally landed in: 5000a1b4b9
It was reverted in: 061e071d8c
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.
Unlike the ELF linker, this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)
Like the ELF linker, merging is only performed at `-O1` and above.
This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it does not currently support debug sections,
because they are not represented by data segments (they are custom
sections).
Differential Revision: https://reviews.llvm.org/D97657
This patch adds support for Darwin's libsystem math vector functions to
TLI. Darwin's libsystem provides a range of vector functions for libm
functions.
This initial patch only adds the 2 x double and 4 x float versions,
which are available on both X86 and ARM64. On X86, wider vector versions
are supported as well.
Reviewed By: jroelofs
Differential Revision: https://reviews.llvm.org/D101856
For opaque pointers, we're trying to avoid uses of
PointerType::getElementType().
A couple of ISel places use PointerType::getElementType(). Some of these
are easy to fix by using ArgListEntry's indirect types.
The inalloca type wasn't stored there, as opposed to preallocated and
byval which have their indirect types available, so add it and use it.
Differential Revision: https://reviews.llvm.org/D101713
This is better no-functional-change-intended than the 1st attempt.
As noted in D102002, there were at least 2 diffs that went
unchecked in pass manager regressions tests: different pass
parameters (SimplifyCFG) and an extension point/callback.
Those should be lifted from the original code blocks correctly
now.
This reverts commit fefcb1f878.
It was supposed to be NFC, but as noted in the post-commit
comments in D102002, that was not true: SimplifyCFG uses
different parameters and there's a difference in an
extension point / callback.
A ConstantAggregateZero may be created from a scalable vector type.
However, it still assumed fixed number of elements when queried for
them. This patch changes ConstantAggregateZero to correctly report its
element count.
This change fixes a couple of issues. Firstly, it fixes a crash in
Constant::getUniqueValue when called on a scalable-vector
zeroinitializer constant.
Secondly, it fixes a latent bug in GlobalISel's IRTranslator in which
translating a scalable-vector zeroinitializer would hit the assertion in
ConstantAggregateZero::getNumElements when casting to a FixedVectorType,
rather than reporting an error more gracefully. This is currently
hypothetical as the IRTranslator has deeper issues preventing the use of
scalable vector types.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D102082
When using the parallel loop construct, the OpenMP specification allows for
guided, auto and runtime as scheduling variants (as well as static and
dynamic which are already supported).
This adds the translation from MLIR to LLVM-IR for these scheduling
variants.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101435
Generalizing this API allows work to be distributed more evenly. In particular,
query callbacks can now be dispatched (rather than running immediately on the
thread that satisfied the query). This avoids the pathological case where an
operation on one thread satisfies many queries simultaneously, causing large
amounts of work to be run on that thread while other threads potentially sit
idle.
This patch lifts the restriction on the number of read/write registers for a
move elimination candidate. With this patch, move elimination candidates with
exactly two reads and two writes are treated like register swap operations for
the purpose of move elimination.
This patch currently doesn't affect any upstream model. However, it should help
unblock the progress on PR50258.
Printing pass manager invocations is fairly verbose and not super
useful.
This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.
This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D101797
We never bothered to have a separate set of combines for -O0 in the prelegalizer
before. This results in some minor performance hits for a mode where performance
isn't a concern (although not regressing code size significantly is still preferable).
This also removes the CSE option since we don't need it for -O0.
Through experiments, I've arrived at a set of combines that gets the most code
size improvement at -O0, while reducing the amount of time spent in the combiner
by around 35% give or take.
Differential Revision: https://reviews.llvm.org/D102038
We're trying to move DebugLogging into instrumentation, rather than
being part of PassManagers/AnalysisManagers.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D102093
induction variable to be perfect
This patch allows more conditional branches to be considered as loop
guards, and so more loop nests can be considered perfect.
Reviewed By: bmahjour, sidbav
Differential Revision: https://reviews.llvm.org/D94717
Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts.
Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication.
I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to).
NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly.
Differential Revision: https://reviews.llvm.org/D101987
This patch modifies updateDbgUsersToReg to properly handle
DBG_VALUE_LIST instructions, by replacing the hard-coded operand indices
(i.e. getOperand(0)) with the more general getDebugOperandsForReg(), and
updating the register for all matching operands.
Differential Revision: https://reviews.llvm.org/D101523
Serialize ScavengeFI from SIMachineFunctionInfo into yaml.
ScavengeFI is not used outside of the PrologEpilogInserter,
so this shouldn't change anything.
Differential Revision: https://reviews.llvm.org/D101367
The CGSCC pass manager interplay with the FunctionAnalysisManagerCGSCCProxy is 'special' in the sense that the former will rerun the latter if there are changes to a SCC structure; that being said, some of the functions in the SCC may be unchanged. In that case, the function simplification pipeline will be re-run, which impacts compile time[1].
This patch allows the function simplification pipeline be skipped if it was already run and the function was not modified since.
The behavior is currently disabled by default. This is because, currently, the rerunning of the function simplification pipeline on an unchanged function may still result in changes. The patch simplifies investigating and fixing those cases where repeated function pass runs do actually positively impact code quality, while offering an easy workaround for those impacted negatively by compile time regressions, and not impacting mainline scenarios.
[1] A [[ http://llvm-compile-time-tracker.com/compare.php?from=eb37d3546cd0c6e67798496634c45e501f7806f1&to=ac722d1190dc7bbdd17e977ef7ec95e69eefc91e&stat=instructions | compile time tracker ]] run with the option enabled.
Differential Revision: https://reviews.llvm.org/D98103
Adds support for scalable vectorization of loops containing first-order recurrences, e.g:
```
for(int i = 0; i < n; i++)
b[i] = a[i] + a[i - 1]
```
This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into
account when inserting into and extracting from the last lane of a vector.
CreateVectorSplice has been added to construct a vector for the recurrence, which
returns a splice intrinsic for scalable types. For fixed-width the behaviour
remains unchanged as CreateVectorSplice will return a shufflevector instead.
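A sketch of the builder call described above; the wrapper and value names are illustrative:
```
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Concatenates the last lane of the previous iteration's vector with all
// but the last lane of the current one, giving a[i - 1] per lane. For
// scalable vectors this emits the vector splice intrinsic; for fixed-width
// vectors it returns a shufflevector.
Value *buildRecurrence(IRBuilder<> &B, Value *Prev, Value *Cur) {
  return B.CreateVectorSplice(Prev, Cur, /*Imm=*/-1, "recur");
}
```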
The tests included here are the same as test/Transform/LoopVectorize/first-order-recurrence.ll
Reviewed By: david-arm, fhahn
Differential Revision: https://reviews.llvm.org/D101076
This patch fixes various issues with our prior `declare target` handling
and extends it to support `omp begin declare target` as well.
This started with PR49649 in mind, trying to provide a way for users to
avoid the "ref" global use introduced for globals with internal linkage.
From there it went down the rabbit hole, e.g., all variables, even
`nohost` ones, were emitted into the device code so it was impossible to
determine if "ref" was needed late in the game (based on the name only).
To make it really useful, `begin declare target` was needed as it can
carry the `device_type`. Not emitting variables eagerly had a ripple
effect. Finally, the precedence of the (explicit) declare target list
items needed to be taken into account, that meant we cannot just look
for any declare target attribute to make a decision. This caused the
handling of functions to require fixup as well.
I tried to clean up things while I was at it, e.g., we should not "parse
declarations and definitions" as part of OpenMP parsing; this will always
break at some point. Instead, we keep track what region we are in and
act on definitions and declarations instead, this is what we do for
declare variant and other begin/end directives already.
Highlights:
- new diagnostics for restrictions specified in the standard,
- delayed emission of globals not mentioned in an explicit
list of a declare target,
- omission of `nohost` globals on the host and `host` globals on the
device,
- no explicit parsing of declarations in-between `omp [begin] declare
variant` and the corresponding end anymore, regular parsing instead,
- precedence for explicit mentions in `declare target` lists over
implicit mentions in the declaration-definition-seq, and
- `omp allocate` declarations will now replace an earlier emitted
global, if necessary.
---
Notes:
The patch is larger than I hoped but it turns out that most changes do
on their own lead to "inconsistent states", which seem less desirable
overall.
After working through this I feel the standard should remove the
explicit declare target forms as the delayed emission is horrible.
That said, while we delay things anyway, it seems to me we check too
often for the current status even though that is often not sufficient to
act upon. There seems to be a lot of duplication that can probably be
trimmed down. Eagerly emitting some things seems pretty weak as an
argument to keep so much logic around.
---
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D101030
This can be useful for clients constructing custom JIT stacks: If the C API
for your custom stack exposes API to obtain a reference to an object layer
(e.g. LLVMOrcLLJITGetObjLinkingLayer) then the newly added
LLVMOrcObjectLayerAddObjectFile and LLVMOrcObjectLayerAddObjectFileWithRT
functions can be used to add objects directly to that layer.
This change enables emitting CFI unwind information for debugging purpose
for targets with MCAsmInfo::ExceptionsType == ExceptionHandling::None.
Currently generating CFI unwind information is entangled with supporting
the exceptions, even when AsmPrinter explicitly recognizes that the unwind
tables are being generated as debug information.
In fact, the unwind information is not generated even if we specify
--force-dwarf-frame-section, unless exceptions are enabled. The LIT test
llvm/test/CodeGen/AMDGPU/debug_frame.ll demonstrates this behavior.
Enable this option for AMDGPU to prepare for future patches which add
complete CFI support.
Reviewed By: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D78778
Unfortunately the current call lowering code is built on top of the
legacy MVT/DAG based code. However, GlobalISel was not using it the
same way. In short, the DAG passes legalized types to the assignment
function, and GlobalISel was passing the original raw type if it was
simple.
I do believe the DAG lowering is conceptually broken since it requires
picking a type up front before knowing how/where the value will be
passed. This ends up being a problem for AArch64, which wants to pass
i1/i8/i16 values as a different size if passed on the stack or in
registers.
The argument type decision is split across 3 different places which is
hard to follow. SelectionDAG builder uses
getRegisterTypeForCallingConv to pick a legal type, tablegen gives the
illusion of controlling the type, and the target may have additional
hacks in the C++ part of the call lowering. AArch64 hacks around this
by not using the standard AnalyzeFormalArguments and special casing
i1/i8/i16 by looking at the underlying type of the original IR
argument.
I believe people have generally assumed the calling convention code is
processing the original types, and I've discovered a number of dead
paths in several targets.
x86 actually relies on the opposite behavior from AArch64, and relies
on x86_32 and x86_64 sharing calling convention code where the 64-bit
cases implicitly do not work on x86_32 due to using the pre-legalized
types.
AMDGPU targets without legal i16/f16 have always used a broken ABI
that promotes to i32/f32. GlobalISel accidentally fixed this to be the
ABI we should have, but this fixes it so we're using the worse ABI
that is compatible with the DAG. Ideally we would fix the DAG to match
the old GlobalISel behavior, but I don't wish to fight that battle.
A new native GlobalISel call lowering framework should let the target
process the incoming types directly.
CCValAssigns select a "ValVT" and "LocVT" but the meanings of these
aren't entirely clear. Different targets don't use them consistently,
even within their own call lowering code. My current belief is the
intent was "ValVT" is supposed to be the legalized value type to use
in the end, and LocVT was supposed to be the ABI passed type
(which is also legalized).
With the default CCState::Analyze functions always passing the same
type for these arguments, these only differ when the TableGen part of
the lowering decide to promote the type from one legal type to
another. AArch64's i1/i8/i16 hack ends up inverting the meanings of
these values, so I had to add an additional hack to let the target
interpret how large the argument memory is.
Since targets don't consistently interpret ValVT and LocVT, this
doesn't produce quite equivalent code to the initial DAG
lowerings. I've opted to consistently interpret LocVT as the in-memory
size for stack passed values, and ValVT as the register type to assign
from that memory. We therefore produce extending loads directly out of
the IRTranslator, whereas the DAG would emit regular loads of smaller
values. This will also produce loads/stores that are wider than the
argument value if the allocated stack slot is larger (and there will
be undef padding bytes). If we had the optimizations to reduce
load/stores based on truncated values, this wouldn't produce a
different end result.
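Restated as data under the interpretation adopted here (the concrete MVTs are illustrative, not tied to a particular target):
```
#include "llvm/Support/MachineValueType.h"
using namespace llvm;

// For a stack-passed argument, LocVT describes the in-memory slot and
// ValVT the register type assigned from it; when ValVT is wider, an
// extending load is emitted directly out of the IRTranslator.
struct StackArg {
  MVT LocVT; // in-memory size of the stack slot, e.g. MVT::i16
  MVT ValVT; // register type to assign from that memory, e.g. MVT::i32
};

inline bool needsExtendingLoad(const StackArg &A) {
  return A.ValVT.getSizeInBits().getFixedSize() >
         A.LocVT.getSizeInBits().getFixedSize();
}
```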
Since ValVT/LocVT are more consistently interpreted, we now will emit
more G_BITCASTS as requested by the CCAssignFn. For example AArch64
was directly assigning types to some physical vector registers which
according to the tablegen spec should have been casted to a vector
with a different element type.
This also moves the responsibility for inserting
G_ASSERT_SEXT/G_ASSERT_ZEXT from the target ValueHandlers into the
generic code, which is closer to how SelectionDAGBuilder works.
I had to xfail an x86 test since I don't see a quick way to fix it
right now (I filed bug 50035 for this). It's broken independently of
this change, and only triggers since now we end up with more ands
which hit the improperly handled selection pattern.
I also observed that FP arguments that need promotion (e.g. f16 passed
as f32) are broken, and use regular G_TRUNC and G_ANYEXT.
TLDR; the current call lowering infrastructure is bad and nobody has
ever understood how it chooses types.
This untangles the MCContext and the MCObjectFileInfo. There is a circular
dependency between MCContext and MCObjectFileInfo. Currently this dependency
also exists during construction: You can't construct a MOFI without a MCContext
without constructing the MCContext with a dummy version of that MOFI first.
This removes this dependency during construction. In a perfect world,
MCObjectFileInfo wouldn't depend on MCContext at all, but only be stored in the
MCContext, like other MC information. This is future work.
This also shifts/adds more information to the MCContext making it more
available to the different targets. Namely:
- TargetTriple
- ObjectFileType
- SubtargetInfo
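A rough sketch of the resulting construction order (argument lists abbreviated and approximate, not the exact post-patch signatures):
```
#include "llvm/ADT/Triple.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCObjectFileInfo.h"
using namespace llvm;

// The context is constructed first, with the triple and subtarget info it
// now carries; the MOFI is then initialized from the context and registered
// with it, so no dummy MOFI is needed up front.
void buildMC(const Triple &TT, const MCAsmInfo *MAI,
             const MCRegisterInfo *MRI, const MCSubtargetInfo *STI) {
  MCContext Ctx(TT, MAI, MRI, STI);
  MCObjectFileInfo MOFI;
  MOFI.initMCObjectFileInfo(Ctx, /*PIC=*/false);
  Ctx.setObjectFileInfo(&MOFI);
}
```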
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101462
- As per the HLASM support we are providing, i.e. support only for the first parameter of the inline asm block, only pertaining to Z machine instructions defined in LLVM, character literals and string literals are not supported (see Figure 4 - https://www-01.ibm.com/servers/resourcelink/svc00100.nsf/pages/zOSV2R3sc264940/$file/asmr1023.pdf for more information)
- This patch explicitly rejects the usage of char literals and string literals (for example "abc 'a'") when the relevant field is set
- This is achieved by introducing a field called `LexHLASMStrings` in MCAsmLexer similar to `LexMasmStrings`
Reviewed By: abhina.sreeskantharajan, Kai
Differential Revision: https://reviews.llvm.org/D101660
This reverts commit 57b259a852.
The relative lookup table converter pass seems to cause problems
for chromium on Windows/ARM64, see https://crbug.com/1204788.
This patch adds the two MVTs to fix a legalizer crash when using vector
shuffles of <256 x i16> and <128 x i16> on RISC-V. The legalizer can't
promote the operand of `v256i32 = any_extend_vector_inreg v128i16`.
Reviewed By: craig.topper, RKSimon
Differential Revision: https://reviews.llvm.org/D101769
Root node in ProfiledCallGraph.
In ProfiledCallGraph::addProfiledFunction, to add a function symbol into the
ProfiledCallGraph, currently an uninitialized ProfiledCallGraphNode node is
created by ProfiledFunctions[Name] and inserted into Callees set of Root node
before the node is initialized. The Callees set use
ProfiledCallGraphNodeComparer as its comparator so the uninitialized
ProfiledCallGraphNode may fail to be inserted into Callees set if it happens
to contain a name in memory which has been inserted into the Callees set
before. This problem can prevent some function symbols from being annotated
with profiles and cause a performance regression. The patch fixes the problem.
Differential Revision: https://reviews.llvm.org/D101815
This reverts the revert 02c5ba8679
Fix:
Pass was registered as DUMMY_FUNCTION_PASS causing the newpm-pass
functions to be doubly defined. Triggered in -DLLVM_ENABLE_MODULE=1
builds.
Original commit:
This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).
VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D78203
As pointed out in D101726, this function already exists in MathExtras.
It uses different types, but with the values used here I believe that
should not make a functional difference.
Add a demangling support for a small subset of a new Rust mangling
scheme, with complete support planned as a follow up work.
Integrate Rust demangling into llvm-cxxfilt and use llvm-cxxfilt for
end-to-end testing. The new Rust mangling scheme uses "_R" as a prefix,
which makes it easy to disambiguate it from other mangling schemes.
The public API is modeled after __cxa_demangle / llvm::itaniumDemangle,
since potential candidates for further integration use those.
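A usage sketch under that assumption; the exact signature is inferred from the itaniumDemangle model, and the sample symbol is a Rust RFC 2603 test vector:
```
#include "llvm/Demangle/Demangle.h"
#include <cstdio>
#include <cstdlib>

void demo() {
  int Status = 0;
  // The "_R" prefix unambiguously selects the Rust scheme. The sample
  // symbol demangles to "123foo::bar".
  char *Demangled =
      llvm::rustDemangle("_RNvC6_123foo3bar", nullptr, nullptr, &Status);
  if (Demangled) {
    std::printf("%s\n", Demangled);
    std::free(Demangled); // caller owns the returned buffer
  }
}
```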
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D101444
GlobalsAA is only created at the beginning of the inliner pipeline. If
an AAManager is cached from previous passes, it won't get rebuilt to
include the newly created GlobalsAA.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D101379
Add function to create the offload_maptypes and the offload_mapnames globals. These two functions
are used in clang. They will be used in the Flang/MLIR lowering as well.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D101503
- This patch attempts to implement the location counter syntax (*) for the HLASM variant for PC-relative instructions.
- In the HLASM variant, for purely constant relocatable values, we expect a * token preceding it, with special support for " *" which is parsed as "<pc-rel-insn 0>"
- For combinations of absolute values and relocatable values, we don't expect the "*" preceding the token.
When you have a " * " what’s accepted is:
```
*<space>.*{.*} -> <pc-rel-insn> 0
*[+|-][constant-value] -> <pc-rel-insn> [+|-]constant-value
```
When you don’t have a " * " what’s accepted is:
```
brasl 1,func is allowed (MCSymbolRef type)
brasl 1,func+4 is allowed (MCBinary type)
brasl 1,4+func is allowed (MCBinary type)
brasl 1,-4+func is allowed (MCBinary type)
brasl 1,func-4 is allowed (MCBinary type)
brasl 1,*func is not allowed (* cannot be used for non-MCConstantExprs)
brasl 1,*+func is not allowed (* cannot be used for non-MCConstantExprs)
brasl 1,*+func+4 is not allowed (* cannot be used for non-MCConstantExprs)
brasl 1,*+4+func is not allowed (* cannot be used for non-MCConstantExprs)
brasl 1,*-4+8+func is not allowed (* cannot be used for non-MCConstantExprs)
```
Reviewed By: Kai
Differential Revision: https://reviews.llvm.org/D100987
The comment about how to make use of debugger tuning within DwarfDebug
really belongs inside the DwarfDebug declaration, where it will be
easier to find.
This avoids the non-trivial overhead of creating a TaskGroup in these degenerate
cases, but also exposes parallelism. It turns out that the default executor
underlying TaskGroup prevents recursive parallelism - so an instance of a task
group being alive will make nested ones become serial.
This is a big issue in MLIR in some dialects, if they have a single instance of
an outer op (e.g. a firrtl.circuit) that has many parallel ops within it (e.g.
a firrtl.module). This patch side-steps the problem by avoiding creating the
TaskGroup in the unneeded case. See this issue for more details:
https://github.com/llvm/circt/issues/993
Note that this isn't a really great solution for the general case of nested
parallelism. A redesign of the TaskGroup stuff would be better, but would be
a much more invasive change.
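A sketch of the side-step (the helper name is hypothetical):
```
#include <functional>
#include <vector>

// Degenerate inputs run inline, avoiding both the TaskGroup's setup cost
// and the nested-serialization behavior described above.
void parallelForEachSketch(std::vector<std::function<void()>> &Tasks) {
  if (Tasks.size() <= 1) {
    for (auto &T : Tasks)
      T();
    return;
  }
  // ... otherwise create a TaskGroup and dispatch Tasks to it as before ...
}
```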
Differential Revision: https://reviews.llvm.org/D101699
This patch adds the basic functions needed for controlling auto conversion on z/OS.
Auto conversion to ASCII is enabled on untagged input files by making the assumption that all untagged files are EBCDIC encoded. Output files are auto converted to EBCDIC IBM-1047.
This change also enables conversion for stdin/stdout/stderr.
For more information on how fcntl controls the codepage, see https://www.ibm.com/docs/en/zos/2.4.0?topic=descriptions-fcntl-bpx1fct-bpx4fct-control-open-file-descriptors
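A hedged sketch of that mechanism; the structure fields and constants should be checked against the IBM documentation linked above:
```
#include <fcntl.h>

// Tags a file descriptor for automatic codepage conversion on z/OS.
// SETCVTON, F_CONTROL_CVT, and the f_cnvrt fields are taken from the IBM
// docs; 1047 is the EBCDIC IBM-1047 CCSID, 0 keeps the program's CCSID.
static int enableAutoConversion(int FD) {
  struct f_cnvrt Req;
  Req.cvtcmd = SETCVTON;
  Req.pccsid = 0;
  Req.fccsid = 1047;
  return fcntl(FD, F_CONTROL_CVT, &Req);
}
```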
Reviewed By: anirudhp
Differential Revision: https://reviews.llvm.org/D100483
Similarly to D101096, this makes sure that MMO operands get propagated
through from MVE gathers/scatters to the Machine Instructions. This
allows extra scheduling freedom, not forcing the instructions to act as
scheduling barriers. We create MMO's with an unknown size, specifying
that they can load from anywhere in memory, similar to the masked_gather
or X86 intrinsics.
Differential Revision: https://reviews.llvm.org/D101219
We create MMO's for the VLDn/VSTn intrinsics in ARMTargetLowering::
getTgtMemIntrinsic, but they do not currently make it all the way through
ISel. This changes that in the various places it needs changing, making
sure that the MMO is propagated through to the final instruction. This
can help in scheduling, not treating the VLD2/VST2 as a scheduling
barrier.
Differential Revision: https://reviews.llvm.org/D101096
This allows for a much more efficient encoding for small negative
numbers by storing the sign bit first and negating the rest of
the bits. This was already being used for OPC_CheckInteger.
For every in tree target this affects, the table got smaller.
R600GenDAGISel.inc saw the largest reduction of 7K.
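A sketch of the transform (the matcher-table framing is omitted; only the sign-bit-first encoding described above is shown):
```
#include <cstdint>

// Negative values store the sign in the low bit and bitwise-negate the
// remaining bits, so small negatives become small unsigned values that
// fit in a single VBR byte.
uint64_t encodeSigned(int64_t V) {
  return V < 0 ? (~uint64_t(V) << 1) | 1 : uint64_t(V) << 1;
}

int64_t decodeSigned(uint64_t E) {
  return (E & 1) ? int64_t(~(E >> 1)) : int64_t(E >> 1);
}
```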
I did have to add a new opcode for StringIntegers used for
register class ids and subregister indices since we don't have the
integer value to encode. The enum name is emitted directly into
the table. Previously we assumed the enum would expand to a positive
7-bit number. We might be able to just shift that right by 1 and
assume it is a positive 6 bit number, but that will need more
investigation.
This seems to be a leftover from when the BackedgeTakenInfo
stored multiple exit counts with manual memory management. At
some point this was switched to a simple vector, and there should
be no need to micro-manage the clearing anymore. We can simply
drop the loop from the map and let the destructor do its job.
Move some types in STLExtras.h which are named and behave identically to
STL types from future standards into a dedicated header. This keeps them
organized (they are not "extras" in the same sense as most types in
STLExtras.h are) and fixes circular dependencies in future patches.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D100668
This patch fixes two bugs that arise when a 'defm' inherits from a multiclass
and also from a class with assertions.
Differential Revision: https://reviews.llvm.org/D101626
In a future change it will be possible to run register
allocation with a specific set of register classes,
so some of the remaining virtual registers will still
be meaningful.
Values only used by metadata can be removed from the .addrsig table.
This solves the undefined symbol error when enabling the addrsig table with COFF LTO.
Differential Revision: https://reviews.llvm.org/D101512
Added an extra analysis for better choice of the shuffle kind in the
getShuffleCost functions, improving cost estimation when a mask is
provided.
Differential Revision: https://reviews.llvm.org/D100865
- Add new variantKinds for the symbol's variable offset and region handle
- Print the proper relocation specifier @gd in the asm streamer when emitting
the TC Entry for the variable offset for the symbol
- Fix the switch section failure between the TC Entry of variable offset and
region handle
- Put .__tls_get_addr symbol in the ProgramCodeSects with XTY_ER property
Reviewed by: sfertile
Differential Revision: https://reviews.llvm.org/D100956
This reverts commit 3b8ec86fd5.
Revert "[X86] Refine AMX fast register allocation"
This reverts commit c3f95e9197.
This pass breaks using LLVM in a multi-threaded environment by
introducing global state.
- Currently, the "." (Dot) character, when not identifying an Identifier or a Constant, refers to the current PC (Program Counter)
- However, in z/OS, for the HLASM dialect, it strictly accepts only the "*" as the current PC (Support for this will be put up in a follow-up patch)
- The changes in this patch allow individual platforms to choose whether they would like to use the "." (Dot) character as a marker for the current PC or not.
- It is achieved by introducing a new field in MCAsmInfo.h called `DotIsPC` (similar to `DollarIsPC`)
Reviewed By: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D100975
This patch adds section support in the OpenMP IRBuilder module, along with a test for the same.
Reviewed By: fghanim
Differential Revision: https://reviews.llvm.org/D89671
We spend some time during sqlite3 compilation regrowing this vector,
bump it up to avoid this.
Gives around 1-2% improvement in codegen-only time for sqlite3 at -O0.
This patch causes the loop vectorizer to not interleave loops that have
nounroll loop hints (llvm.loop.unroll.disable and llvm.loop.unroll_count(1)).
Note that if a particular interleave count is being requested
(through llvm.loop.interleave_count), it will still be honoured, regardless
of the presence of nounroll hints.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D101374
- Previously, https://reviews.llvm.org/D99889 changed the framework in the AsmLexer to treat special tokens, if they occur at the start of the string, as Identifiers.
- These are used by the MASM Parser implementation in LLVM, and we can extend some of the changes made in the previous patch to SystemZ.
- In SystemZ, the special "tokens" referred to here are "_", "$", "@", "#". [_|$|@|#] are already supported as "part" of an Identifier.
- The changes in this patch ensure that these special tokens, when they occur at the start of the Identifier, are treated as Identifiers.
Reviewed By: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D100959
Add a flag to change dsymutil's behavior and force a static variable to
keep its enclosing function. The test shows a situation where that could
be useful. I'm not convinced this behavior makes sense as a default,
which is why it's behind a flag.
rdar://74918374
Differential revision: https://reviews.llvm.org/D101337
This patch changes the AArch32 crypto instructions (sha2 and aes) to
require the specific sha2 or aes features. These features have
already been implemented and can be controlled through the command
line, but do not have the expected result (i.e. `+noaes` will not
disable aes instructions). The crypto feature retains its existing
meaning of both sha2 and aes.
Several small changes are included due to the knock-on effect this has:
- The AArch32 driver has been modified to ensure sha2/aes is correctly
set based on arch/cpu/fpu selection and feature ordering.
- Crypto extensions are permitted for AArch32 v8-R profile, but not
enabled by default.
- ACLE feature macros have been updated with the fine grained crypto
algorithms. These are also used by AArch64.
- Various tests updated due to the change in feature lists and macros.
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D99079
This was picking a concrete size for a physical register, and
enforcing an exact match on the virtual register's type size. Some
targets add multiple types to a register class, and some are smaller
than the full bit width. For example x86 adds f32 to 128-bit xmm
registers, and AMDGPU adds i16/f16 to 32-bit registers.
It might be better to represent these cases as a copy of the full
register and an extraction of the subpart, but a lot of code assumes
you can directly copy. This will help fix the current usage of the DAG
calling convention infrastructure which is incompatible with how
GlobalISel is now using it.
The API is somewhat cumbersome here, but I just mirrored the existing
functions, except now with LLTs (and allow returning null on failure,
unlike the MVT version). I think the concept of selecting register
classes based on type is flawed to begin with, but I'm trying to keep
this compatible with the existing handling.
This patch also refactors the way the feasible max VF is calculated,
although this is NFC for fixed-width vectors.
After this change, scalable VF hints are no longer truncated/clamped
to a shorter scalable VF, nor is the 'scalable' flag dropped from
the suggested VF in order to vectorize with a similar fixed-width VF.
Instead, the hint is ignored, which means the vectorizer is free
to find a more suitable VF, using the CostModel to determine the
best possible VF.
Reviewed By: c-rhodes, fhahn
Differential Revision: https://reviews.llvm.org/D98509
In terms of readability, the `enum CFIMoveType` didn't clearly convey what it's
meant to, i.e. the type of CFI section that gets emitted.
Reviewed By: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D76519
Previously we used an i32 constant to store the saturation width, but i32 isn't
legal on RISCV64. This wasn't a big deal to fix, but it is extra work for the
type legalizer.
This patch uses a VTSDNode to store the type similar to SEXT_INREG. This makes
it opaque to the type legalizer.
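A minimal sketch of the new node shape, assuming this applies to the FP_TO_SINT_SAT/FP_TO_UINT_SAT node family (the lowering context and helper are illustrative):
```
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

static SDValue buildFPToSIntSat(SelectionDAG &DAG, const SDLoc &DL,
                                SDValue Src) {
  // The saturation width rides along as a VTSDNode operand, like
  // SIGN_EXTEND_INREG's, so the type legalizer can treat it as opaque.
  // Before: DAG.getConstant(32, DL, MVT::i32) -- an i32 value that
  // RV64 would otherwise have to legalize.
  return DAG.getNode(ISD::FP_TO_SINT_SAT, DL, MVT::i64, Src,
                     DAG.getValueType(MVT::i32));
}
```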
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101262
The Verifier will complain, but by then it may be too late; we might
never reach it because we already crashed on some bogus bug.
It is best to catch this the moment it happens.
GCC supports negative values for -mstack-protector-guard-offset=; this
should be a signed value. Pre-req to D100919.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101325
This patch adds support to restrict prefixed instructions from
crossing a 64-byte boundary:
- Add the infrastructure to register a custom XCOFF streamer
- Add a custom XCOFF streamer for PowerPC to allow us to
intercept instructions as they are being emitted and align all 8-byte
instructions to a 64-byte boundary, if required, by adding a 4-byte nop.
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D101107
This utility allows a pattern match to start more efficiently.
Often a MachineInstr (MI) is already available; instead of using
mi_match(MI.getOperand(0).getReg(), MRI, ...), which goes through
MRI.getVRegDef(Reg) only to arrive back at the same MI, we can now
use mi_match(MI, MRI, ...) directly, as in the sketch below.
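A minimal sketch of the new entry point (the G_ADD pattern and helper name are illustrative; `m_GAdd` and `m_Reg` come from MIPatternMatch.h):
```
#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;
using namespace MIPatternMatch;

static bool matchAddOfRegs(MachineInstr &MI, MachineRegisterInfo &MRI,
                           Register &LHS, Register &RHS) {
  // Before: mi_match(MI.getOperand(0).getReg(), MRI, ...), which walks
  // back through MRI.getVRegDef() to recover the same MI.
  // After: start the match directly from the instruction in hand.
  return mi_match(MI, MRI, m_GAdd(m_Reg(LHS), m_Reg(RHS)));
}
```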
Differential Revision: https://reviews.llvm.org/D99735
The .file directive was changed to only have the basename in D36018 for
ELF.
But on AIX, we require the .file directive to also contain the
directory info. This aligns with other AIX compilers like XLC and is
required by some AIX tools like DBX.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D99785
This reapplies 8740360093, which was reverted in bbddadd46e due to buildbot
errors.
This version checks that a JIT instance can be safely constructed, skipping
tests if it cannot be. To enable this it introduces a new C API to retrieve and
set the target triple for a JITTargetMachineBuilder.
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.
Differential Revision: https://reviews.llvm.org/D98650
In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic`
but has 3 mostly unused pointers. GlobalOpt considers that the pointers can
potentially retain allocated objects, so GlobalOpt cannot optimize out the
`NoopStatistic` variables (see D69428 for more context), wasting 23KiB for stage
2 clang.
This patch makes `NoopStatistic` empty and thus reclaims the wasted space. The
clang size is even smaller than applying D69428 (slightly smaller in both .bss and
.text).
```
# This means the D69428 optimization on clang is mostly nullified by this patch.
HEAD+D69428: size(.bss) = 0x0725a8
HEAD+D101211: size(.bss) = 0x072238
# bloaty - HEAD+D69428 vs HEAD+D101211
# With D101211, we also save a lot of string table space (.rodata).
FILE SIZE VM SIZE
-------------- --------------
-0.0% -32 -0.0% -24 .eh_frame
-0.0% -336 [ = ] 0 .symtab
-0.0% -360 [ = ] 0 .strtab
[ = ] 0 -0.2% -880 .bss
-0.0% -2.11Ki -0.0% -2.11Ki .rodata
-0.0% -2.89Ki -0.0% -2.89Ki .text
-0.0% -5.71Ki -0.0% -5.88Ki TOTAL
```
Note: LoopFuse is a disabled pass. For now this patch adds
`#if LLVM_ENABLE_STATS` so `OptimizationRemarkMissed` is skipped in
LLVM_ENABLE_STATS==0 builds. If these `OptimizationRemarkMissed` are useful in
LLVM_ENABLE_STATS==0 builds, we can replace `llvm::Statistic` with
`llvm::TrackingStatistic`, or use a different abstraction to keep track of the strings.
Similarly, skip the code in `mlir/lib/Pass/PassStatistics.cpp` which
calls `getName`/`getDesc`/`getValue`.
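A minimal sketch of the shape of the change (member and operator names are illustrative, not the exact Statistic.h contents):
```
// Before: NoopStatistic carried the same three string pointers as
// Statistic, which GlobalOpt had to assume might retain allocated
// objects. After: no members at all, so the globals fold away.
class NoopStatisticSketch {
public:
  NoopStatisticSketch(const char *, const char *, const char *) {}
  NoopStatisticSketch &operator++() { return *this; }         // no-op
  NoopStatisticSketch &operator+=(unsigned) { return *this; } // no-op
};
```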
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D101211
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.
Differential Revision: https://reviews.llvm.org/D98650
This reverts commit 0ce723cb22.
D76519 was not quite NFC. If we see a CFISection::Debug function before a
CFISection::EH one (-fexceptions -fno-asynchronous-unwind-tables), we may
incorrectly pick CFISection::Debug and emit a `.cfi_sections .debug_frame`.
We should use .eh_frame instead.
This scenario is untested.
Adds support for creating custom MaterializationUnits in the C API with the new
LLVMOrcCreateCustomMaterializationUnit function.
Modifies ownership rules for LLVMOrcAbsoluteSymbols to make it consistent with
LLVMOrcCreateCustomMaterializationUnit. This is an ABI breaking change for any
clients of the LLVMOrcAbsoluteSymbols API.
Adds LLVMOrcLLJITGetObjLinkingLayer and LLVMOrcObjectLayerEmit functions to
allow clients to get a reference to an LLJIT instance's linking layer, then
emit an object file using it. This can be used to support construction of
custom materialization units in the common case where those units will
generate an object file that needs to be emitted to complete the
materialization.
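A minimal sketch of the emit path using the functions named above, inside a custom unit's materialize callback (the context handling and buffer creation are illustrative; error handling omitted):
```
#include "llvm-c/LLJIT.h"
#include "llvm-c/Orc.h"

// Assumes an LLVMOrcLLJITRef captured in Ctx and an object buffer Obj
// produced by the unit (elided here).
static void materializeSketch(void *Ctx,
                              LLVMOrcMaterializationResponsibilityRef MR) {
  LLVMOrcLLJITRef J = (LLVMOrcLLJITRef)Ctx;
  LLVMMemoryBufferRef Obj = /* ...generate the object file... */ nullptr;
  LLVMOrcObjectLayerRef OL = LLVMOrcLLJITGetObjLinkingLayer(J);
  LLVMOrcObjectLayerEmit(OL, MR, Obj); // completes the materialization
}
```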
In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic`
but has 3 unused pointers. GlobalOpt considers that the pointers can potentially
retain allocated objects, so GlobalOpt cannot optimize out the `NoopStatistic`
variables (see D69428 for more context), wasting 23KiB for stage 2 clang.
This patch makes `NoopStatistic` empty and thus reclaims the wasted space. The
clang size is even smaller than applying D69428 (slightly smaller in both .bss and
.text).
```
# This means the D69428 optimization on clang is mostly nullified by this patch.
HEAD+D69428: size(.bss) = 0x0725a8
HEAD+D101211: size(.bss) = 0x072238
# bloaty - HEAD+D69428 vs HEAD+D101211
# With D101211, we also save a lot of string table space (.rodata).
FILE SIZE VM SIZE
-------------- --------------
-0.0% -32 -0.0% -24 .eh_frame
-0.0% -336 [ = ] 0 .symtab
-0.0% -360 [ = ] 0 .strtab
[ = ] 0 -0.2% -880 .bss
-0.0% -2.11Ki -0.0% -2.11Ki .rodata
-0.0% -2.89Ki -0.0% -2.89Ki .text
-0.0% -5.71Ki -0.0% -5.88Ki TOTAL
```
Note: LoopFuse is a disabled pass. This patch adds `#if LLVM_ENABLE_STATS` so
`OptimizationRemarkMissed` is skipped in LLVM_ENABLE_STATS==0 builds. If these
`OptimizationRemarkMissed` are useful and not noisy, we can replace
`llvm::Statistic` with `llvm::TrackingStatistic` in the future.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D101211
1. Add an accessor function to MCSymbolizer to retrieve addresses
referenced by a symbolizable operand, but not resolved to a symbol.
That way, the caller can synthesize labels at those addresses and
then retry disassembling the section.
2. Implement that in AMDGPU -- a failed symbol lookup results in the
address being added to a vector returned by the new function.
3. Use that in llvm-objdump when using MCSymbolizer (which only happens
on AMDGPU) and SymbolizeOperands is on.
Differential Revision: https://reviews.llvm.org/D101145
When vectorising for AArch64 targets, if you specify the SVE attribute
we automatically treat masked loads and stores as legal. Also,
since we have no cost model for masked memory ops, we believe it's
cheap to use the masked load/store intrinsics even for fixed width
vectors. This can lead to poor code quality as the intrinsics will
currently be scalarised in the backend. This patch adds a basic
cost model that marks fixed-width masked memory ops as significantly
more expensive than for scalable vectors.
Tests for the cost model are added here:
Transforms/LoopVectorize/AArch64/masked-op-cost.ll
Differential Revision: https://reviews.llvm.org/D100745
StringView::substr now accepts a substring's starting position and
length, instead of the previous non-standard `from` & `to` positions.
All uses of the two-argument StringView::substr are in MicrosoftDemangler
and pass 0 as the starting position, so no changes are necessary.
This also fixes a bug where attempting to extract a suffix with substr
(a `to` position equal to size) would return a substring without the
last character.
Fixing the issue should not introduce observable changes in the
demangler, since as currently used, a second argument to
StringView::substr is either: 1) a result of a successful call to
StringView::find and so necessarily smaller than size, or 2) in the
case of Demangler::demangleCharLiteral potentially equal to size, but
with the demangler expecting more data to follow later on and failing
either way.
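For illustration, the new semantics follow std::string_view::substr's (position, length) convention; a minimal sketch, with std::string_view standing in for the demangler's StringView:
```
#include <cassert>
#include <string_view>

int main() {
  std::string_view S = "abcdef";
  assert(S.substr(2, 3) == "cde"); // (pos, len), like std::string::substr
  // Suffix extraction now keeps the last character; under the old
  // (from, to) semantics, to == size() lost the final character.
  assert(S.substr(1) == "bcdef");
}
```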
Reviewed By: #libc_abi, ldionne, erik.pilkington
Differential Revision: https://reviews.llvm.org/D100246
m_Deferred() has nothing to do with commutative matchers; it needs
to be used whenever the value to match is determined as part of
the same match expression.
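For example, a minimal sketch using the IR-level PatternMatch (the fold itself is illustrative): matching `X & (X | Y)`, where the second occurrence of X is only determined while the match runs:
```
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace PatternMatch;

static bool isAndOfSelfOr(Value *V) {
  Value *X, *Y;
  // m_Deferred(X) compares against whatever m_Value(X) bound earlier in
  // this same match() call. m_Specific(X) would require X to be known
  // before the match starts; commutativity is a separate concern
  // (m_c_And would handle operand order).
  return match(V, m_And(m_Value(X), m_Or(m_Deferred(X), m_Value(Y))));
}
```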
In terms of readability, the `enum CFIMoveType` didn't clearly convey what it's
meant to, i.e. the type of CFI section that gets emitted.
Reviewed By: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D76519
There is already code in InlineCost.cpp to identify and ignore ephemeral
values (llvm.assume intrinsics and other side-effect free instructions
only feeding the assumes). However, because llvm.type.test intrinsics
were not marked speculatable, they and any instructions specifically
feeding the type test (typically a bitcast) were being counted towards
the instruction cost when inlining. This was causing profile matching
issues in some cases when enabling -fwhole-program-vtables for whole
program devirtualization.
According to the language reference, the speculatable attribute means:
"the function does not have any effects besides calculating its result
and does not have undefined behavior". I see no reason why type tests
cannot be marked with this attribute.
There are 2 test changes:
llvm/test/Transforms/Inline/ephemeral.ll: I added a type test intrinsic
here to verify the fix. Also, I found the test was not actually testing
what it originally intended. Many of the existing instructions were
optimized away by -Oz, and the cost of inlining was negative due to the
benefit of removing the call. So I changed the test to simply invoke the
inline pass and check the number of instructions computed by InlineCost.
I also fixed an instruction that was not actually used anywhere.
llvm/test/Transforms/SimplifyCFG/no-md-sink.ll needed to be made more
robust to code changes that reordered the metadata.
Differential Revision: https://reviews.llvm.org/D101180
These are added for compatibility with XLC. They are similar to
vec_cts and vec_ctu except that the result is a doubleword vector
regardless of the parameter type.
In cases when ScalarizationCostPassed has no value, UINT_MAX is actually used
for cost estimation in `return ScalarCalls * ScalarCost + ScalarizationCost`.
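A minimal sketch of the hazard, with illustrative names standing in for the TTI helper (the real code uses LLVM's Optional rather than std::optional):
```
#include <climits>
#include <optional>

// When no scalarization cost is passed, the sentinel UINT_MAX used to
// flow straight into the arithmetic and wrap the result.
unsigned intrinsicCostSketch(unsigned ScalarCalls, unsigned ScalarCost,
                             std::optional<unsigned> ScalarizationCostPassed) {
  unsigned ScalarizationCost = ScalarizationCostPassed.value_or(UINT_MAX);
  return ScalarCalls * ScalarCost + ScalarizationCost; // wraps on sentinel
}
```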
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D101099
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.
This reverts commit dad5caa59e.
ConstantFoldingMIRBuilder was an experiment which is not used for
anything. The constant folding functionality is now part of
CSEMIRBuilder.
Differential Revision: https://reviews.llvm.org/D101050
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arises in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.
Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)
Also document the function attribute "frame-pointer" which is long overdue.
Differential Revision: https://reviews.llvm.org/D101016
These instructions don't really exist, but we have ways we can
emulate them.
.vv will swap operands and use vmsle(u).vv. .vi will adjust the
immediate and use vmsgt(u).vi when possible. For .vx we need to
use some of the multiple-instruction sequences from the V extension
spec.
For unmasked vmsge(u).vx we use:
vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd
For cases where mask and maskedoff are the same value, we have
vmsge{u}.vx v0, va, x, v0.t, which is the vd==v0 case that
requires a temporary, so we use:
vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt
For other masked cases we use this sequence:
vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
We trust that register allocation will prevent vd in vmslt{u}.vx
from being v0 since v0 is still needed by the vmxor.
Differential Revision: https://reviews.llvm.org/D100925
Intrinsics for the following instructions are added. The intrinsic
name is "int_hexagon_<inst>[_128B]", e.g.
int_hexagon_V6_vL32b_pred_ai for the 64-byte version,
int_hexagon_V6_vL32b_pred_ai_128B for the 128-byte version.
V6_vL32b_pred_ai if (Pv4) Vd32 = vmem(Rt32+#s4)
V6_vL32b_pred_pi if (Pv4) Vd32 = vmem(Rx32++#s3)
V6_vL32b_pred_ppu if (Pv4) Vd32 = vmem(Rx32++Mu2)
V6_vL32b_npred_ai if (!Pv4) Vd32 = vmem(Rt32+#s4)
V6_vL32b_npred_pi if (!Pv4) Vd32 = vmem(Rx32++#s3)
V6_vL32b_npred_ppu if (!Pv4) Vd32 = vmem(Rx32++Mu2)
V6_vL32b_nt_pred_ai if (Pv4) Vd32 = vmem(Rt32+#s4):nt
V6_vL32b_nt_pred_pi if (Pv4) Vd32 = vmem(Rx32++#s3):nt
V6_vL32b_nt_pred_ppu if (Pv4) Vd32 = vmem(Rx32++Mu2):nt
V6_vL32b_nt_npred_ai if (!Pv4) Vd32 = vmem(Rt32+#s4):nt
V6_vL32b_nt_npred_pi if (!Pv4) Vd32 = vmem(Rx32++#s3):nt
V6_vL32b_nt_npred_ppu if (!Pv4) Vd32 = vmem(Rx32++Mu2):nt
V6_vS32b_pred_ai if (Pv4) vmem(Rt32+#s4) = Vs32
V6_vS32b_pred_pi if (Pv4) vmem(Rx32++#s3) = Vs32
V6_vS32b_pred_ppu if (Pv4) vmem(Rx32++Mu2) = Vs32
V6_vS32b_npred_ai if (!Pv4) vmem(Rt32+#s4) = Vs32
V6_vS32b_npred_pi if (!Pv4) vmem(Rx32++#s3) = Vs32
V6_vS32b_npred_ppu if (!Pv4) vmem(Rx32++Mu2) = Vs32
V6_vS32Ub_pred_ai if (Pv4) vmemu(Rt32+#s4) = Vs32
V6_vS32Ub_pred_pi if (Pv4) vmemu(Rx32++#s3) = Vs32
V6_vS32Ub_pred_ppu if (Pv4) vmemu(Rx32++Mu2) = Vs32
V6_vS32Ub_npred_ai if (!Pv4) vmemu(Rt32+#s4) = Vs32
V6_vS32Ub_npred_pi if (!Pv4) vmemu(Rx32++#s3) = Vs32
V6_vS32Ub_npred_ppu if (!Pv4) vmemu(Rx32++Mu2) = Vs32
V6_vS32b_nt_pred_ai if (Pv4) vmem(Rt32+#s4):nt = Vs32
V6_vS32b_nt_pred_pi if (Pv4) vmem(Rx32++#s3):nt = Vs32
V6_vS32b_nt_pred_ppu if (Pv4) vmem(Rx32++Mu2):nt = Vs32
V6_vS32b_nt_npred_ai if (!Pv4) vmem(Rt32+#s4):nt = Vs32
V6_vS32b_nt_npred_pi if (!Pv4) vmem(Rx32++#s3):nt = Vs32
V6_vS32b_nt_npred_ppu if (!Pv4) vmem(Rx32++Mu2):nt = Vs32
This patch adds semantic checks for the General Restrictions of the
Allocate Directive.
Since the requires directive is not yet implemented in Flang, the
restriction:
```
allocate directives that appear in a target region must
specify an allocator clause unless a requires directive with the
dynamic_allocators clause is present in the same compilation unit
```
will need to be updated at a later time.
A different patch will be made with the Fortran specific restrictions of
this directive.
I have used the code from https://reviews.llvm.org/D89395 for the
CheckObjectListStructure function.
Co-authored-by: Isaac Perry <isaac.perry@arm.com>
Reviewed By: clementval, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D91159
It is proper to relax the non-negative limitation of step_vector.
This patch also adds more combines for step_vector (a sketch of the
new fold follows):
(sub X, step_vector(C)) -> (add X, step_vector(-C))
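A minimal sketch of that fold written as a DAGCombiner-style helper (the function shape and surrounding context are illustrative):
```
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// (sub X, step_vector(C)) -> (add X, step_vector(-C))
static SDValue foldSubOfStepVector(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                                   SDValue N0, SDValue N1) {
  if (N1.getOpcode() != ISD::STEP_VECTOR)
    return SDValue();
  auto *C = dyn_cast<ConstantSDNode>(N1.getOperand(0));
  if (!C)
    return SDValue();
  // Negate the step constant and rebuild as an add.
  SDValue NegStep = DAG.getConstant(-C->getAPIntValue(), DL,
                                    N1.getOperand(0).getValueType());
  SDValue NegSV = DAG.getNode(ISD::STEP_VECTOR, DL, VT, NegStep);
  return DAG.getNode(ISD::ADD, DL, VT, N0, NegSV);
}
```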
Differential Revision: https://reviews.llvm.org/D100812
The change adds support for trimming and merging cold context when merging CSSPGO profiles using llvm-profdata. This is similar to the context profile trimming in llvm-profgen; however, the flexibility to trim cold context after the profile is generated can be useful.
Differential Revision: https://reviews.llvm.org/D100528
This patch allows PRE of the following type of loads:
```
preheader:
br label %loop
loop:
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
br label %merge
merge:
...
br i1 ..., label %loop, label %exit
```
Into
```
preheader:
%x0 = load %p
br label %loop
loop:
%x.pre = phi(x0, x2)
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
%x1 = load %p
br label %merge
merge:
x2 = phi(x.pre, x1)
...
br i1 ..., label %loop, label %exit
```
So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern it is trying to address is a hot loop, with all code inlined
and provably having no side effects, and some side-effecting calls on a cold path.
The worst overhead from it: if we always take the clobber block, we make 1 more load
overall (in the preheader). That only matters if the loop has very few iterations. If the
clobber block is skipped on at least one iteration, the transform is neutral or profitable.
There are several prospective improvements this opens up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
we don't know if their sum is colder than the header.
Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading, with corresponding changes in libomptarget: SPMD and non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target and host-side parallel regions), and data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. The revision also contains changes to OpenMPOpt for remark creation on target offloading regions.
Reviewed By: jdoerfert, Meinersbur
Differential Revision: https://reviews.llvm.org/D95976
Report dangling probes for frames that have real samples collected. Dangling probes are the probes associated with an empty block. When reported, the sample count on a dangling probe will not be trusted by the compiler; we will instead rely on the counts inference algorithm to give the probe a reasonable count. This actually fixes a bug where previously only those dangling probes with samples collected were reported.
This patch also fixes two existing issues. Pseudo probes are stored in `Address2ProbesMap` and their pointers are used in `PseudoProbeInlineTree`. Previously `std::vector` was used to store probes, and pointers to probes could become stale as the vector grows. I'm changing `std::vector` to `std::list` instead.
The other issue is that all outlined functions previously shared the same inline frame, because the unchanged `Index` value was used as the dummy inlineSite identifier.
Good results seen for SPEC2017 in general regarding profile quality.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D100235
On ELF targets, if a function has uwtable or personality, or does not have
nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module.
Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified.
(i.e. If -g[123], every function gets `.eh_frame`.
This behavior is strange but that is the status quo on GCC and Clang.)
Let's take asan as an example. Other sanitizers are similar.
`asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true,
so every function gets `.eh_frame` if `-g[123]` is specified.
This is the root cause that
`-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame
while
`-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.
This patch
* sets the nounwind attribute on sanitizer module ctor/dtor.
* let Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.
The "uwtable" mechanism is generic: synthesized functions not cloned/specialized
from existing ones should consider `Function::createWithDefaultAttr` instead of
`Function::create` if they want to get some default attributes which
have more of module semantics.
Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955, https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc.
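A minimal sketch of the intended usage for a synthesized sanitizer ctor (the function body is illustrative; `createWithDefaultAttr` is the generic entry point described above):
```
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Synthesized, not cloned, so it should pick up module-level defaults
// ("uwtable", "frame-pointer", ...) instead of Function::create's none.
static Function *makeSanitizerCtor(Module &M) {
  FunctionType *FTy =
      FunctionType::get(Type::getVoidTy(M.getContext()), /*isVarArg=*/false);
  return Function::createWithDefaultAttr(FTy, GlobalValue::InternalLinkage,
                                         /*AddrSpace=*/0, "asan.module_ctor",
                                         &M);
}
```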
Differential Revision: https://reviews.llvm.org/D100251
- Previously, https://reviews.llvm.org/D72680 introduced a new attribute called `AllowSymbolAtNameStart` (in relation to the MAsmParser changes) in `MCAsmInfo.h` which (according to the comment in the header) allows the following behaviour:
```
/// This is true if the assembler allows $ @ ? characters at the start of
/// symbol names. Defaults to false.
```
- However, the usage of this field in AsmLexer.cpp doesn't seem completely accurate* for a couple of reasons.
```
default:
if (MAI.doesAllowSymbolAtNameStart()) {
// Handle Microsoft-style identifier: [a-zA-Z_$.@?][a-zA-Z0-9_$.@#?]*
if (!isDigit(CurChar) &&
isIdentifierChar(CurChar, MAI.doesAllowAtInName(),
AllowHashInIdentifier))
return LexIdentifier();
}
```
1. The Dollar and At tokens, when occurring at the start of the string, are treated as separate tokens (AsmToken::Dollar and AsmToken::At respectively) and not lexed as an Identifier.
2. I'm not too sure why `MAI.doesAllowAtInName()` is used when `AllowAtInIdentifier` could be used. For X86 platforms, afaict, this shouldn't be an issue, since the `CommentString` attribute isn't "@" (alternatively, the call to the setter can be made anywhere else as needed). The `AllowAtInName` attribute does have an additional important meaning, but in the context of the AsmLexer it shouldn't mean anything different compared to `AllowAtInIdentifier`.
My proposal is the following:
- Introduce 3 new fields called `AllowQuestionTokenAtStartOfString`, `AllowDollarTokenAtStartOfString` and `AllowAtTokenAtStartOfString` in MCAsmInfo.h which will encapsulate the previously documented behaviour of "allowing $, @, ? characters at the start of symbol names".
- Introduce these fields where "$", "@" are lexed, and treat them as identifiers depending on whether `Allow[Dollar|At]TokenAtStartOfString` is set.
- For the sole case of "?", append it to the existing logic for treating a "default" token as an Identifier.
z/OS (HLASM) will also make use of some of these fields in follow up patches.
completely accurate* - This was based on the comments and the intended behaviour of the code. I might have completely misinterpreted it, and if that is the case my sincere apologies. We can close this patch if necessary, if there are no changes to be made :)
Depends on https://reviews.llvm.org/D99374
Reviewed By: Jonathan.Crowther
Differential Revision: https://reviews.llvm.org/D99889
CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.
(Already remarked by jhenderson on D70769.)
No behavior change.
Differential Revision: https://reviews.llvm.org/D100957
This is a follow-up to https://reviews.llvm.org/D100387.
std::vector is not the best storage container here. My local benchmark (counting
the number of instructions when compiling the sqlite3 amalgamation) yields the
following (a sketch of the variants follows the numbers):
- std::vector<BitVector> -> 5,860,885,896
- SmallVector<BitWord, 0> -> 5,858,991,997
- SmallVector<BitWord> -> 5,817,679,224
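A minimal sketch of the three storage variants being compared (the wrapper structs are illustrative):
```
#include "llvm/ADT/SmallVector.h"
#include <cstdint>
#include <vector>

using BitWord = uintptr_t;

// SmallVector<BitWord, 0> keeps no inline elements, so the object stays
// small while still using SmallVector's growth policy; SmallVector<BitWord>
// (default inline capacity) additionally avoids the first few heap
// allocations at the cost of a larger object.
struct StorageA { std::vector<BitWord> Bits; };
struct StorageB { llvm::SmallVector<BitWord, 0> Bits; };
struct StorageC { llvm::SmallVector<BitWord> Bits; };
```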
Differential Revision: https://reviews.llvm.org/D100744