Handles COFF import files of static archive. Changes static library genrator to build up object file map keyed by symbol name that excludes the symbols from dllimported symbols so that static generator will not be responsible for them. It exposes the list of dynamic libraries that need to be imported. Client should properly load the libraries in this list beforehand. Object file map is also an improvment from the past in terms of performance. Archive.findSym does a slow O(n) linear serach of symbol list to find the symbol. (we call findSym O(n) times, thus full time complexity is O(n^2); we were the only user of findSym function in fact)
There is a room for improvements in how to load the libraries in the list. We currently just hand the responsibility over to the clinet. A better way would be let ORC read this list and hand them over to JITLink side that would also help validation (e.g. not trying to generate stub for non dllimported targets) Nevertheless, we will have to exclude the symbols from COFF import object file list and need a way to access this list, which this patch offers.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D129952
Current DWARFLinker implementation does not support some debug sections
(mainly DWARF v5 sections). This patch adds diagnostic for such sections.
The warning would be displayed for critical(such that could not be removed)
sections and the source file would be skipped. Other unsupported sections
would be removed and warning message should be displayed. The zero exit
status would be returned for both cases.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D123623
This builtin allows the creation of custom scheduling pipelines on a per-region
basis. Like the sched_barrier builtin this is intended to be used either for
testing, in situations where the default scheduler heuristics cannot be
improved, or in critical kernels where users are trying to get performance that
is close to handwritten assembly. Obviously using these builtins will require
extra work from the kernel writer to maintain the desired behavior.
The builtin can be used to create groups of instructions called "scheduling
groups" where ordering between the groups is enforced by the scheduler.
__builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter
is a mask that determines the types of instructions that you would like to
synchronize around and add to a scheduling group. These instructions will be
selected from the bottom up starting from the sched_group_barrier's location
during instruction scheduling. The second parameter is the number of matching
instructions that will be associated with this sched_group_barrier. The third
parameter is an identifier which is used to describe what other
sched_group_barriers should be synchronized with. Note that multiple
sched_group_barriers must be added in order for them to be useful since they
only synchronize with other sched_group_barriers. Only "scheduling groups" with
a matching third parameter will have any enforced ordering between them.
As an example, the code below tries to create a pipeline of 1 VMEM_READ
instruction followed by 1 VALU instruction followed by 5 MFMA instructions...
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 1 VALU
__builtin_amdgcn_sched_group_barrier(2, 1, 0)
// 5 MFMA
__builtin_amdgcn_sched_group_barrier(8, 5, 0)
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 3 VALU
__builtin_amdgcn_sched_group_barrier(2, 3, 0)
// 2 VMEM_WRITE
__builtin_amdgcn_sched_group_barrier(64, 2, 0)
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D128158
GetDemandedBits is mainly a wrapper around SimplifyMultipleUseDemandedBits now, and is only used by DAGCombiner::visitSTORE so I've moved all remaining functionality there.
visitSTORE was making use of this to 'simplify' constants for a trunc-store. Just removing this code left to a mixture of regressions and gains - it came down to whether a target preferred a sign or zero extended constant for materialization/truncation. I've just moved the code over for now, but a next step would be to move this to targetShrinkDemandedConstant, but some targets that override the method expect a basic binop, and might react badly to a store node.....
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D130012
This is pretty straightforward, it just adds a builtin to return a
pointer to a resource handle. This maps to a dx intrinsic.
The shape of this builtin and the underlying intrinsic will likely
shift a bit as this implementation becomes more feature complete, but
this is a good basis to get started.
Depends on D128569.
Differential Revision: https://reviews.llvm.org/D130016
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D130012
At the moment, proveNoUnsignedWrapViaInduction may be called for the
same AddRec a large number of times via getZeroExtendExpr. This can have
a severe compile-time impact for very loop-heavy code. One one
particular workload, LSR takes ~51s without this patch, almost
exlusively in proveNoUnsignedWrapViaInduction. With this patch, the time
in LSR drops to ~0.4s.
If proveNoUnsignedWrapViaInduction failed to prove NUW the first time,
it is unlikely to succeed on subsequent tries and the cost doesn't seem
to be justified.
Besides drastically improving compile-time in some excessive cases, this
also has a slightly positive compile-time impact on CTMark:
NewPM-O3: -0.07%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.06
https://llvm-compile-time-tracker.com/compare.php?from=b435da027d7774c24cdb8c88d09f6b771e07fb14&to=f2729e33e8284b502f6c35a43345272252f35d12&stat=instructions
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D130648
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
Currently, there is significant code duplication for dealing with
MD_prof metadata throughout the compiler. These utility functions can
improve code reuse and simplify boilerplate code when dealing with
profiling metadata, such as branch weights. The inent is to provide a
uniform set of APIs that allow common tasks, such as identifying
specific types of MD_prof metadata and extracting branch weights.
Future patches can build on this initial implementation and clean up the
different implementations across the compiler.
Reviewed By: bogner
Differential Revision: https://reviews.llvm.org/D128858
Teach libDebugInfo (llvm-dwarfdump) and lldb about DWARF tags and
attributes for pointer authentication. These values have been emitted by
Apple clang for several releases. Although upstream LLVM doesn't emit
these values yet, we hope to upstream that part sometime soon.
Differential revision: https://reviews.llvm.org/D130215
This adds similar heuristics to G_GLOBAL_VALUE, querying the cost of
materializing a specific constant in code size. Doing so prevents us from
sinking constants which require multiple instructions to generate into
use blocks.
Code size savings on CTMark -Os:
Program size.__text
before after diff
ClamAV/clamscan 381940.00 382052.00 0.0%
lencod/lencod 428408.00 428428.00 0.0%
SPASS/SPASS 411868.00 411876.00 0.0%
kimwitu++/kc 449944.00 449944.00 0.0%
Bullet/bullet 463588.00 463556.00 -0.0%
sqlite3/sqlite3 284696.00 284668.00 -0.0%
consumer-typeset/consumer-typeset 414492.00 414424.00 -0.0%
7zip/7zip-benchmark 595244.00 594972.00 -0.0%
mafft/pairlocalalign 247512.00 247368.00 -0.1%
tramp3d-v4/tramp3d-v4 372884.00 372044.00 -0.2%
Geomean difference -0.0%
Differential Revision: https://reviews.llvm.org/D130554
I am playing with the LoopDataPrefetch pass and found out that it
bails to work with a pointer in a non-zero address space. This
patch adds the target callback to check if an address space is to
be considered for prefetching. Default implementation still only
allows address space 0, so this is NFCI.
This does not currently affect any known targets, but seems to be
generally useful for the future.
Differential Revision: https://reviews.llvm.org/D129795
While working on D118450 <https://reviews.llvm.org/D118450>, I noticed that
`sys::getHostCPUName` lacks SPARC support.
This patch implements it. The code is taken from/inspired by GCC's
`gcc/config/sparc/driver-sparc.cc`. There's one caveat: since LLVM, unlike
GCC, doesn't support the SPARC-M7, -S7, and -M8 CPUs, I map all those to
the latest supported one (UltraSparc T4/`niagara4`).
Tested on `sparcv9-sun-solaris2.11` and `sparc64-unknown-linux-gnu` by
running `savcov --version` on
- Netra SPARC S7-2 (SPARC-S7, Solaris 11.4)
- SPARC T5-2 (SPARC T5, Solaris 11.4)
- SPARC Enterprise T5220 (UltraSPARC T2, Solaris 11.3)
- SPARC T5 (UltraSPARC T5, Debian sid)
- SPARC T3 (UltraSPARC T3, Debian sid)
- SPARC Enterprise T5220 (Debian sid)
Differential Revision: https://reviews.llvm.org/D130272
-print-changed for new pass manager is handy beside -print-after-all.
Port it to MachineFunctionPass.
Note: lib/Passes/StandardInstrumentations.cpp implements a number of
misc features. If we want to use them for codegen, we may need to lift
some functionality to LLVMIR.
Reviewed By: aeubanks, jamieschmeiser
Differential Revision: https://reviews.llvm.org/D130434
WinEHPrepare marks any function call from EH funclets as unreachable, if it's not a nounwind intrinsic or has no proper funclet bundle operand. This
affects ARC intrinsics on Windows, because they are lowered to regular function calls in the PreISelIntrinsicLowering pass. It caused silent binary truncations and crashes during unwinding with the GNUstep ObjC runtime: https://github.com/gnustep/libobjc2/issues/222
This patch adds a new function `llvm::IntrinsicInst::mayLowerToFunctionCall()` that aims to collect all affected intrinsic IDs.
* Clang CodeGen uses it to determine whether or not it must emit a funclet bundle operand.
* PreISelIntrinsicLowering asserts that the function returns true for all ObjC runtime calls it lowers.
* LLVM uses it to determine whether or not a funclet bundle operand must be propagated to inlined call sites.
Reviewed By: theraven
Differential Revision: https://reviews.llvm.org/D128190
This patch starts small, only detecting sequences of the form
<a, a+n, a+2n, a+3n, ...> where a and n are ConstantSDNodes.
Differential Revision: https://reviews.llvm.org/D125194
Turning on opaque pointers has uncovered an issue with WPD where we currently pattern match away `assume(type.test)` in WPD so that a later LTT doesn't resolve the type test to undef and introduce an `assume(false)`. The pattern matching can fail in cases where we transform two `assume(type.test)`s into `assume(phi(type.test.1, type.test.2))`.
Currently we create `assume(type.test)` for all virtual calls that might be devirtualized. This is to support `-Wl,--lto-whole-program-visibility`.
To prevent this, all virtual calls that may not be in the same LTO module instead use a new `llvm.public.type.test` intrinsic in place of the `llvm.type.test`. Then when we know if `-Wl,--lto-whole-program-visibility` is passed or not, we can either replace all `llvm.public.type.test` with `llvm.type.test`, or replace all `llvm.public.type.test` with `true`. This prevents WPD from trying to pattern match away `assume(type.test)` for public virtual calls when failing the pattern matching will result in miscompiles.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D128955
Current DWARFLinker implementation does not support some debug sections
(mainly DWARF v5 sections). This patch adds diagnostic for such sections.
The warning would be displayed for critical(such that could not be removed)
sections and the source file would be skipped. Other unsupported sections
would be removed and warning message should be displayed. The zero exit
status would be returned for both cases.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D123623
Currently, when llvm-objdump is disassembling a code section and
encounters a point where no instruction can be decoded, it uses the
same policy on all targets: consume one byte of the section, emit it
as "<unknown>", and try disassembling from the next byte position.
On an architecture where instructions are always 4 bytes long and
4-byte aligned, this makes no sense at all. If a 4-byte word cannot be
decoded as an instruction, then the next place that a valid
instruction could //possibly// be found is 4 bytes further on.
Disassembling from a misaligned address can't possibly produce
anything that the code generator intended, or that the CPU would even
attempt to execute.
This patch introduces a new MCDisassembler virtual method called
`suggestBytesToSkip`, which allows each target to choose its own
resynchronization policy. For Arm (as opposed to Thumb) and AArch64,
I've filled in the new method to return a fixed width of 4.
Thumb is a more interesting case, because the criterion for
identifying 2-byte and 4-byte instruction encodings is very simple,
and doesn't require the particular instruction to be recognized. So
`suggestBytesToSkip` is also passed an ArrayRef of the bytes in
question, so that it can take that into account. The new test case
shows Thumb disassembly skipping over two unrecognized instructions,
and identifying one as 2-byte and one as 4-byte.
For targets other than Arm and AArch64, this is NFC: the base class
implementation of `suggestBytesToSkip` still returns 1, so that the
existing behavior is unchanged. Other targets can fill in their own
implementations as they see fit; I haven't attempted to choose a new
behavior for each one myself.
I've updated all the call sites of `MCDisassembler::getInstruction` in
llvm-objdump, and also one in sancov, which was the only other place I
spotted the same idiom of `if (Size == 0) Size = 1` after a call to
`getInstruction`.
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D130357
This teaches ProcessElfCore to recognise the MTE tag segments.
https://www.kernel.org/doc/html/latest/arm64/memory-tagging-extension.html#core-dump-support
These segments contain all the tags for a matching memory segment
which will have the same size in virtual address terms. In real terms
it's 2 tags per byte so the data in the segment is much smaller.
Since MTE is the only tag type supported I have hardcoded some
things to those values. We could and should support more formats
as they appear but doing so now would leave code untested until that
happens.
A few things to note:
* /proc/pid/smaps is not in the core file, only the details you have
in "maps". Meaning we mark a region tagged only if it has a tag segment.
* A core file supports memory tagging if it has at least 1 memory
tag segment, there is no other flag we can check to tell if memory
tagging was enabled. (unlike a live process that can support memory
tagging even if there are currently no tagged memory regions)
Tests have been added at the commands level for a core file with
mte and without.
There is a lot of overlap between the "memory tag read" tests here and the unit tests for
MemoryTagManagerAArch64MTE::UnpackTagsFromCoreFileSegment, but I think it's
worth keeping to check ProcessElfCore doesn't cause an assert.
Depends on D129487
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D129489
A new helper class DXILOpBuilder is added to create DXIL op function calls.
TableGen backend for DXILOperation will create table for DXIL op function parameter types.
When create DXIL op function, these parameter types will used to create the function type.
Reviewed By: bogner
Differential Revision: https://reviews.llvm.org/D130291
`SmallDenseMap` constructor with reserve gets an arbitrary `NumInitBuckets` value and passes it below to `init` method.
If `NumInitBuckets` is greater then `InlineBuckets`, then `SmallDenseMap` initializes to large representation passing `NumInitBuckets` below to `DenseMap` initialization. `DenseMap::initEmpty` method asserts that initial buckets count must be a power of 2.
Proposed solution is to update `NumInitBuckets` value in `SmallDenseMap` constructor till the next power of 2. It should satisfy both `DenseMap` preconditions and required minimum buckets count for reservation.
Reviewed By: atrick
Differential Revision: https://reviews.llvm.org/D129825
Reimplements ADDR32NB/REL32 relocations properly, out-of-reach targets will be dealt in the separate patch that will generate the stub for dllimport symbols.
Reviewed By: sgraenitz
Differential Revision: https://reviews.llvm.org/D129936
The code used to cause a warning:
llvm/include/llvm/Support/MathExtras.h:751:39: warning: suggest parentheses around ‘-’ in operand of ‘&’ [-Wparentheses]
751 | assert(Align != 0 && (Align & Align - 1) == 0 &&
|
This change adds a nop instruction if section starts with landing pad. This change is like [D73739](https://reviews.llvm.org/D73739) which avoids zero offset landing pad in basic block sections.
Detailed description:
The current machine functions splitter can create ˜sections which start with a landing pad themselves. This places landing pad at offset zero from LPStart.
```
.section .text.split.foo10,"ax",@progbits
foo10.cold: # %lpad
.cfi_startproc
.cfi_personality 3, __gxx_personality_v0
.cfi_lsda 3, .Lexception5
.cfi_def_cfa %rsp, 16
.Ltmp11: <--- This is a Landing pad and also LP Start as it is start of this section
movq %rax, %rdi <--- first instruction is at offest 0 from LPStart
callq _Unwind_Resume@PLT
```
This will cause landing pad entries to become zero (.Ltmp11-foo10.cold)
```
.Lcst_begin4:
.uleb128 .Ltmp9-.Lfunc_begin2 # >> Call Site 1 <<
.uleb128 .Ltmp10-.Ltmp9 # Call between .Ltmp9 and .Ltmp10
.uleb128 .Ltmp11-foo10.cold <---This is zero # jumps to .Ltmp11
.byte 3 # On action: 2
.uleb128 .Ltmp10-.Lfunc_begin2 # >> Call Site 2 <<
.uleb128 .Lfunc_end9-.Ltmp10 # Call between .Ltmp10 and .Lfunc_end9
.byte 0 # has no landing pad
.byte 0 # On action: cleanup
.p2align 2
```
The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. This change adds a nop instruction at start of such sections so that such a case could be avoided. Output:
```
.section .text.split.foo10,"ax",@progbits
foo10.cold: # %lpad
.cfi_startproc
.cfi_personality 3, __gxx_personality_v0
.cfi_lsda 3, .Lexception5
.cfi_def_cfa %rsp, 16
nop <--- new instruction that is added
.Ltmp11:
movq %rax, %rdi
callq _Unwind_Resume@PLT
```
Reviewed By: modimo, snehasish, rahmanl
Differential Revision: https://reviews.llvm.org/D130133
An async suspend models the split between two partial async functions.
`llvm.swift.async.context.addr ` will have a different value in the two
partial functions so it is not correct to generally CSE the instruction.
rdar://97336162
Differential Revision: https://reviews.llvm.org/D130201
Multiple calls to `omp_get_wtime` could be optimized out due to the function
is mistakenly marked as `readnone`. This patch fixes the issue, and also add the
support to run optimization on `libomptarget` tests.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D130179
The name `getEntrySamples` was misleading for 2 reasons. One, it's
close in name to `Function::getEntryCount`, but the equivalent here is
`getHeadSamples`; second, as opposed to the other get* APIs in
`FunctionSamples`, it performs an estimate/heuristic rather than just
retrieving raw data (or a non-heuristic derivate off that data, like
`getMaxCountInside`)
The new name should more clearly communicate its intent; and, being
close (in name) to `getHeadSamples`, it should allow the reader discover
the relation between them.
Also updated the doc comments for both `getHeadSamples[Estimate]` so a
reader may better understand the relation between them.
Differential Revision: https://reviews.llvm.org/D130281
Multiple calls to `omp_get_wtime` could be optimized out due to the function
is mistakenly marked as `readnone`. This patch fixes the issue, and also add the
support to run optimization on `libomptarget` tests.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D130179
This patch adds a command line flag to be able to test
the type based cost-model analysis for Intrinsics.
Differential Revision: https://reviews.llvm.org/D129109
If a function is non-recursive we only performed intra-procedural
reasoning for reachability (via AA::isPotentiallyReachable). However,
if it is re-entrant that doesn't mean we can't reach. Instead of this
problematic logic in the reachability reasoning we utilize logic in
AAPointerInfo. If a location is for sure written by a function it can
be re-entrant or recursive we know only intra-procedural reasoning is
sufficient.
If we have a dominating must-write access we do not need to know the
initial value of some object to perform reasoning about the potential
values. The dominating must-write has overwritten the initial value.
The patch adds SPIRVPrepareFunctions pass, which modifies function
signatures containing aggregate arguments and/or return values before
IR translation. Information about the original signatures is stored in
metadata. It is used during call lowering to restore correct SPIR-V types
of function arguments and return values. This pass also substitutes some
llvm intrinsic calls to function calls, generating the necessary functions
in the module, as the SPIRV translator does.
The patch also includes changes in other modules, fixing errors and
enabling many SPIR-V features that were omitted earlier. And 15 LIT tests
are also added to demonstrate the new functionality.
Differential Revision: https://reviews.llvm.org/D129730
Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
Adds a number of utilities that are used to help create and update
memprof related metadata. These will be used during profile matching
and annotation, as well as by the inliner when updating the metadata.
Also adds unit tests for the utilities.
See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]
(Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceeding patch adding the basic
metadata and verification support.)
Depends on D128141.
Differential Revision: https://reviews.llvm.org/D128854
The InstCombine test is reduced from issue #56601. Without the more
liberal match for ConstantExpr, we try to rearrange constants in
Negator forever.
Alternatively, we could adjust the definition of m_ImmConstant to be
more conservative, but that's probably a larger patch, and I don't
see any downside to changing m_ConstantExpr. We never capture and
modify a ConstantExpr; transforms just want to avoid it.
Differential Revision: https://reviews.llvm.org/D130286
This change implements the contextual symbolizer markup elements: reset,
module, and mmap. These provide information about the runtime context of
the binary necessary to resolve addresses to symbolic values.
Summary information is printed to the output about this context.
Multiple mmap elements for the same module line are coalesced together.
The standard requires that such elements occur on their own lines to
allow for this; accordingly, anything after a contextual element on a
line is silently discarded.
Implementing this cleanly requires that the filter drive the parser;
this allows skipped sections to avoid being parsed. This also makes the
filter quite a bit easier to use, at the cost of some unused
flexibility.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D129519
This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:
1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions,
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.
Currently the default option is "disabled", but this will be
changed in a later patch.
I've added new tests to show the options behave as expected here:
Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
Differential Revision: https://reviews.llvm.org/D129560
Summary:
We previously used `alignof` to get the necessary alignment of the
binary header. However this was different on 32-bit platforms and caused
a few tests to fail because of it. This patch just changes this to be a
hard-coded constant of 8.
Replace the value-accepting isReallocLikeFn() overload with a
getReallocatedOperand() function, which returns which operand is
the one being reallocated. Currently, this is always the first one,
but once allockind(realloc) is respected, the reallocated operand
will be determined by the allocptr parameter attribute.
Remove isFreeCall() in favor of getFreedOperand(). Replace the
two remaining uses with a getFreedOperand() != nullptr check, as
they only care that something is getting freed. (The usage in DSE
is correct as such. The allocator-related checks in CFLGraph look
rather questionable in general.)
DWARF files may contain overlapping address ranges. f.e. it can happen if the two
copies of the function have identical instruction sequences and they end up sharing.
That looks incorrect from the point of view of DWARF spec. Current implementation
of DWARFLinker does not combine overlapped address ranges. It would be good if such
ranges would be handled in some useful way. Thus, this patch allows DWARFLinker
to combine overlapped ranges in a single one.
Depends on D86539
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D123469
We currently assume in a number of places that free-like functions
free their first argument. This is true for all hardcoded free-like
functions, but with the new attribute-based design, the freed
argument is supposed to be indicated by the allocptr attribute.
To make sure we handle this correctly once allockind(free) is
respected, add a getFreedOperand() helper which returns the freed
argument, rather than just indicating whether the call frees *some*
argument.
This migrates most but not all users of isFreeCall() to the new
API. The remaining users are a bit more tricky.
DWARF files may contain overlapping address ranges. f.e. it can happen if the two
copies of the function have identical instruction sequences and they end up sharing.
That looks incorrect from the point of view of DWARF spec. Current implementation
of DWARFLinker does not combine overlapped address ranges. It would be good if such
ranges would be handled in some useful way. Thus, this patch allows DWARFLinker
to combine overlapped ranges in a single one.
Depends on D86539
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D123469
Default getAllocSize() to use the trivial mapper. Also switch
from using std::function to function_ref.
Furthermore, update the doc comment to point out a subtle difference
between getAllocSize() and getObjectSize(): The latter may also
return something for calls that return their argument (via "returned"
attribute or special intrinsics like invariant groups).
There is a problem in loop cache analysis that the types of SCEV variables
`Coeff` and `ElemSize` in function `isConsecutive()` may not match. The
mismatch would cause SCEV failures when `Coeff` is multiplied with `ElemSize`.
The fix in this patch is to extend the type of both `Coeff` and `ElemSize` to
whichever is wider in those two variables. As a clean-up, duplicate calculations
of `Stride` in `computeRefCost()` is then removed.
Reviewed By: Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D128877
MapperJITLinkMemoryManager supports executor memory management using any
implementation of MemoryMapper to do the transfer such as InProcessMapper or
SharedMemoryMapper.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D129495
Add basic support for the MemProf metadata (!memprof and !callsite)
which was initially described in "RFC: IR metadata format for MemProf"
(https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165).
The bulk of the patch is verification support, along with some tests.
There are a couple of changes to the format described in the original
RFC:
Initial measurements suggested that a tree format for the stack ids in
the contexts would be more efficient, but subsequent evaluation with
large applications showed that in fact the cost of the additional
metadata nodes required by this deduplication scheme overwhelmed the
benefit from sharing stack id nodes. Therefore, the implementation here
and in follow on patches utilizes a simpler scheme of lists of stack id
integers in the memprof profile contexts and callsite metadata. The
follow on matching patch employs context trimming optimizations to
reduce the cost.
Secondly, instead of verbosely listing all profiled fields in each
profiled context (memory info block or MIB), and deferring the
interpretation of the profile data, the profile data is evaluated and
converted into string tags specifying the behavior (e.g. "cold") during
profile matching. This reduces the verbosity of the profile metadata,
and allows additional context trimming optimizations. As a result, the
named metadata schema description is also no longer needed.
Differential Revision: https://reviews.llvm.org/D128141
This patch restores a call to has_value to make it clear that we are
checking the presence of an optional value, not the underlying value.
This patch partially reverts d08f34b592.
Differential Revision: https://reviews.llvm.org/D129453
This change improves ctags generation for tablegen files.
For the following example
```
class A;
class A {
int a;
}
```
Previously, tags were generated only for a forward declaration of class 'A'.
This patch allows generating tags for the forward declarations
and further definition of class 'A'.
Reviewed By: barannikov88
Original patch by: rusyaev-roman (Roman Rusyaev)
Some adjustments by: nhaehnle (Nicolai Hähnle)
Differential Revision: https://reviews.llvm.org/D129935
The n_type field in the symbol table entry has two interpretations in XCOFF32, and a single interpretation in XCOFF64.
The new interpretation is used in XCOFF32 if the value of the o_vstamp field in the auxiliary header is 2.
In XCOFF64 and the new XCOFF32 interpretation, the n_type field is used for the symbol type and visibility.
The patch writes the aux header with an o_vstamp field value of 2 when the visibility is specified in XCOFF32 to make the new XCOFF32 interpretation used.
Reviewed By: DiggerLin, jhenderson
Differential Revision: https://reviews.llvm.org/D128148
To solve the readnone problems in coroutines. See
https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015
for details.
According to the discussion, we decide to fix the problem by inserting
isPresplitCoroutine() checks in different passes instead of
wrapping/unwrapping readnone attributes in CoroEarly/CoroCleanup passes.
In this direction, we might not be able to cover every case at first.
Let's take a "find and fix" strategy.
Reviewed By: nikic, nhaehnle, jyknight
Differential Revision: https://reviews.llvm.org/D127383
There were two problems with the previous setup:
1. We weren't setting its size, which caused problems when `__llvm_addrsig`
wasn't the last section. In particular, `__debug_line` (if created) is
generated and placed after `__llvm_addrsig`, and would result in an
invalid object file w/ overlapping sections being emitted.
2. The symbol indices could be invalidated if e.g. `llvm-strip` ran on
the object file. See discussion [here][1].
To fix both these issues, we use symbol relocations instead of encoding
symbol indices directly in the section contents. The section itself
doesn't contain any data. That sidesteps the layout problem in addition
to solving the second issue.
The corresponding LLD change to read in this new format: {D128938}.
It will fix the icf-safe.ll test failure on this diff.
[1]: https://discourse.llvm.org/t/problems-with-mach-o-address-significance-table-generation/63392/
Reviewed By: #lld-macho, alx32
Differential Revision: https://reviews.llvm.org/D127637
...with more fixes.
The original patch was reverted in 3e9cc543f2 due to bot failures caused by
a missing dependence on librt. That issue was fixed in 32d8d23cd0, but that
commit also broke sanitizer bots due to a bug in SimplePackedSerialization:
empty ArrayRef<char>s triggered a zero-byte memcpy from a null source. The
ArrayRef<char> serialization issue was fixed in 67220c2ad7, and this patch has
also been updated with a new custom SharedMemorySegFinalizeRequest message that
should avoid serializing empty ArrayRefs in the first place.
https://reviews.llvm.org/D128544
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
only as an afterthought. `genericValueTraversal` did offer an option
but `AAValueSimplify` did not. Thus, we might end up with "too much"
simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
problems like the infinite recursion bug (#54981) as well as code
duplication.
This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.
Fixes: https://github.com/llvm/llvm-project/issues/54981
Note: A previous version was flawed and consequently reverted in
6555558a80.
- add zstd to `llvm::compression` namespace
- add a CMake option `LLVM_ENABLE_ZSTD` with behavior mirroring that of `LLVM_ENABLE_ZLIB`
- add tests for zstd to `llvm/unittests/Support/CompressionTest.cpp`
- debian users should install libzstd when using `LLVM_ENABLE_ZSTD=FORCE_ON` from source due to this bug https://bugs.launchpad.net/ubuntu/+source/libzstd/+bug/1941956
Reviewed By: leonardchan, MaskRay
Differential Revision: https://reviews.llvm.org/D128465
Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.
It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
This patch implements proposal https://lists.llvm.org/pipermail/llvm-dev/2020-August/144579.html
llvm-dwarfutil - is a tool that is used for processing debug info(DWARF) located in built binary files to improve debug info quality, reduce debug info size. The patch currently implements smaller set of command-line options(comparing to the proposal):
```
./llvm-dwarfutil [options] <input file> <output file>
--garbage-collection Do garbage collection for debug info(default)
-j <value> Alias for --num-threads
--no-garbage-collection Don`t do garbage collection for debug info
--no-odr-deduplication Don`t do ODR deduplication for debug types
--no-odr Alias for --no-odr-deduplication
--no-separate-debug-file
Create single output file, containing debug tables(default)
--num-threads <threads> Number of available threads for multi-threaded execution. Defaults to the number of cores on the current machine
--odr-deduplication Do ODR deduplication for debug types(default)
--odr Alias for --odr-deduplication
--separate-debug-file Create two output files: file w/o debug tables and file with debug tables
--tombstone [bfd,maxpc,exec,universal]
Tombstone value used as a marker of invalid address(default: universal)
=bfd - Zero for all addresses and [1,1] for DWARF v4 (or less) address ranges and exec
=maxpc - Minus 1 for all addresses and minus 2 for DWARF v4 (or less) address ranges
=exec - Match with address ranges of executable sections
=universal - Both: bfd and maxpc
```
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D86539
The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits.
This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115
Alive2: https://alive2.llvm.org/ce/z/fl7T7K
Differential Revision: https://reviews.llvm.org/D129933
Fixes https://github.com/llvm/llvm-project/issues/56484
H registers are 16 bit views of AArch64's Neon registers and
B are the 8 bit views.
msvc does not support 16 bit float (some mention in DirectX but I
couldn't find a way to get to it) so for lack of a better reference
I'm using:
85c9b41b33/server/references/dia/include/cvconst.h
(the other microsoft-pdb repo is no longer up to date)
Luckily clang does support fp16 so a test is added for that.
There is no 8 bit float type so I had to get creative with the
test case. We're not testing for correct debug info here just
that we can select the B register and not crash in the process.
For FPCR it's never going to be passed as an argument so I've
not added a test for it. It is included to keep our list looking
the same as the reference.
Reviewed By: majnemer
Differential Revision: https://reviews.llvm.org/D129774
This patch implements proposal https://lists.llvm.org/pipermail/llvm-dev/2020-August/144579.html
llvm-dwarfutil - is a tool that is used for processing debug info(DWARF) located in built binary files to improve debug info quality, reduce debug info size. The patch currently implements smaller set of command-line options(comparing to the proposal):
```
./llvm-dwarfutil [options] <input file> <output file>
--garbage-collection Do garbage collection for debug info(default)
-j <value> Alias for --num-threads
--no-garbage-collection Don`t do garbage collection for debug info
--no-odr-deduplication Don`t do ODR deduplication for debug types
--no-odr Alias for --no-odr-deduplication
--no-separate-debug-file
Create single output file, containing debug tables(default)
--num-threads <threads> Number of available threads for multi-threaded execution. Defaults to the number of cores on the current machine
--odr-deduplication Do ODR deduplication for debug types(default)
--odr Alias for --odr-deduplication
--separate-debug-file Create two output files: file w/o debug tables and file with debug tables
--tombstone [bfd,maxpc,exec,universal]
Tombstone value used as a marker of invalid address(default: universal)
=bfd - Zero for all addresses and [1,1] for DWARF v4 (or less) address ranges and exec
=maxpc - Minus 1 for all addresses and minus 2 for DWARF v4 (or less) address ranges
=exec - Match with address ranges of executable sections
=universal - Both: bfd and maxpc
```
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D86539
Some methods of json::Array require json::Value to be completely defined, so
they can't be defined in-class. Fix that by defining them out of class.
Fix#55780
Following discussion in PR56243, we need to somehow detect the situation
when token values penetrate LCSSA form for transforms that require that
it is maintained by all values (for example, to sustain use-def dominance
invarians). This patch introduces a parameter to LCSSA checkers to control
their ignorance about tokens.
Differential Revision: https://reviews.llvm.org/D129983
Reviewed By: efriedma
Avoids a zero-length memcpy from a null src, which caused errors on some of the
sanitizer bots. Also uses null when deserializing an empty ArrayRef (rather
than pointing to a zero length range in the middle of the input buffer).
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
Undef tokens may appear in unreached code as result of RAUW of some optimization,
and it should not be considered as bad IR.
Patch by Dmitry Bakunevich!
Differential Revision: https://reviews.llvm.org/D128904
Reviewed By: mkazantsev
Callbr is no longer an indirect terminator in the sense that is
relevant here (that it's successors cannot be updated). The primary
effect of this change is that callbr no longer prevents formation
of loop simplify form.
I decided to drop the isIndirectTerminator() method entirely and
replace it with isa<IndirectBrInst>() checks. I assume this method
was added to abstract over indirectbr and callbr, but it never
really caught on, and there is nothing left to abstract anymore
at this point.
Differential Revision: https://reviews.llvm.org/D129849
This patch introduce an automatic generation of the clause parser from the TableGen
information.
New information can be stored directly in the TableGen file:
- The different aliases that a clause support.
- prefix before a value.
- whether a prefix is optional or not.
Makes it easier to add new clauses and also avoid some error (`write` clause incorrect until now).
This patch is updating only the OpenACC part. A patch with a modification of the OpenMP clause parser will follow.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D106968
This change introduces the dynamic stack boolean field to code-object-v3
and above under the code properties of the kernel descriptor and under
the kernel metadata map of NT_AMDGPU_METADATA. This field corresponds to
the is_dynamic_callstack field of amd_kernel_code_t.
Differential Revision: https://reviews.llvm.org/D128344
This patch reports number of counts being dropped when a hash-mismatch
happens. This information will be helpful to the users -- if the dropped
counts are large, the user should redo the instrumentation build and
recollect the profile.
Differential Revision: https://reviews.llvm.org/D129001
When an issue exists in the main file (caller) instead of an included file
(callee), using a `src` pattern applying to the included file may be
inappropriate if it's the caller's responsibility. Add `mainfile` prefix to check
the main filename.
For the example below, the issue may reside in a.c (foo should not be called
with a misaligned pointer or foo should switch to an unaligned load), but with
`src` we can only apply to the innocent callee a.h. With this patch we can use
the more appropriate `mainfile:a.c`.
```
//--- a.h
// internal linkage
static inline int load(int *x) { return *x; }
//--- a.c, -fsanitize=alignment
#include "a.h"
int foo(void *x) { return load(x); }
```
See the updated clang/docs/SanitizerSpecialCaseList.rst for a caveat due
to C++ vague linkage functions.
Reviewed By: #sanitizers, kstoimenov, vitalybuka
Differential Revision: https://reviews.llvm.org/D129832
The original commit was reverted in 3e9cc543f2 due to buildbot failures, which
should be fixed by the addition of dependencies on librt.
Differential Revision: https://reviews.llvm.org/D128544
is out of range. Both intrinsics return a poison value.
Consequently, mark the intrinsics speculatable.
Differential Revision: https://reviews.llvm.org/D129656
This is similar to D125680, but for llvm.experimental.patchpoint
(instead of llvm.experimental.stackmap).
Differential review: https://reviews.llvm.org/D129268
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:
; Before:
%res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
to label %asm.fallthrough [label %foo]
; After:
%res = callbr i8* asm "", "=r,r,!i"(i8* %x)
to label %asm.fallthrough [label %foo]
The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:
* Allow unrolling/peeling/rotation of callbr, or any other
clone-based optimizations
(https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
(https://github.com/llvm/llvm-project/issues/45248)
This is just the IR representation change though, I will follow up
with patches to remove limtations in various transformation passes
that are no longer needed.
Differential Revision: https://reviews.llvm.org/D129288
This is a followup to D129630, which switches LSR to the member
isSafeToExpand() variant, and removes the freestanding function.
This is done by creating the SCEVExpander early (already during the
analysis phase). Because the SCEVExpander is now available for the
whole lifetime of LSRInstance, I've also made it into a member
variable, rather than passing it around in even more places.
Differential Revision: https://reviews.llvm.org/D129769
clang 14 removed -gz=zlib-gnu and ld.lld/llvm-objcopy removed .zdebug support
recently. llvm-dwp currently doesn't support SHF_COMPRESSED. Add support and
remove .zdebug support.
Simplify llvm::object::Decompressor which has no .zdebug user now.
While here, add tests for ELF32LE, ELF32BE, and ELF64BE.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D129728
As a followup to D129630, this switches a usage of the freestanding
function in LoopPredication to use the member variant instead. This
was the last use of the freestanding function, so drop it entirely.
isSafeToExpand() for addrecs depends on whether the SCEVExpander
will be used in CanonicalMode. At least one caller currently gets
this wrong, resulting in PR50506.
Fix this by a) making the CanonicalMode argument on the freestanding
functions required and b) adding member functions on SCEVExpander
that automatically take the SCEVExpander mode into account. We can
use the latter variant nearly everywhere, and thus make sure that
there is no chance of CanonicalMode mismatch.
Fixes https://github.com/llvm/llvm-project/issues/50506.
Differential Revision: https://reviews.llvm.org/D129630
The linux perf tools use /proc/kcore for disassembly kernel functions.
Actually it copies the relevant parts to a temp file and then pass it to
objdump. But it doesn't have section headers so llvm-objdump cannot
handle it.
Let's create fake section headers for the program headers. It'd have a
single section for each segment to cover the entire range. And for this
purpose we can consider only executable code segments.
With this change, I can see the following command shows proper outputs.
perf annotate --stdio --objdump=/path/to/llvm-objdump
Differential Revision: https://reviews.llvm.org/D128705
The SI machine scheduler inherits from ScheduleDAGMI.
This patch adds support for a few features that are implemented
in ScheduleDAGMI (or its base classes) that were missing so far
because their support is implemented in overridden functions.
* Support cl::opt -view-misched-dags
This option allows to open a graphical window of the scheduling DAG.
* Support cl::opt -misched-print-dags
This option allows to print the scheduling DAG in text form.
* After constructing the scheduling DAG, call postprocessDAG()
to apply any registered DAG mutations.
Note that currently there are no mutations defined in AMDGPUTargetMachine.cpp
in case SIScheduler is used.
Still add this to avoid surprises in the future in case mutations are added.
Differential Revision: https://reviews.llvm.org/D128808
- add `FindZSTD.cmake`
- add zstd to `llvm::compression` namespace
- add a CMake option `LLVM_ENABLE_ZSTD` with behavior mirroring that of `LLVM_ENABLE_ZLIB`
- add tests for zstd to `llvm/unittests/Support/CompressionTest.cpp`
Reviewed By: leonardchan, MaskRay
Differential Revision: https://reviews.llvm.org/D128465
- add `FindZSTD.cmake`
- add zstd to `llvm::compression` namespace
- add a CMake option `LLVM_ENABLE_ZSTD` with behavior mirroring that of `LLVM_ENABLE_ZLIB`
- add tests for zstd to `llvm/unittests/Support/CompressionTest.cpp`
Reviewed By: leonardchan, MaskRay
Differential Revision: https://reviews.llvm.org/D128465
It's more natural to use uint8_t * (std::byte needs C++17 and llvm has
too much uint8_t *) and most callers use uint8_t * instead of char *.
The functions are recently moved into `llvm::compression::zlib::`, so
downstream projects need to make adaption anyway.
This is an implementation of orc::MemoryMapper that maps shared memory
pages in both executor and controller process and writes directly to
them avoiding transferring content over EPC. All allocations are properly
deinitialized automatically on the executor side at shutdown by the
ExecutorSharedMemoryMapperService.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D128544
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.
vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)
We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, the must be a power of two numbers of blocks. (For everything other than VLEN<=32, but that's already broken.)
It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.
Differential Revision: https://reviews.llvm.org/D129609
The request is mentioned on D129053. I feel that having this functionality is
mildly useful (not strong).
* Rename .ctors to .init_array and change sh_type to SHT_INIT_ARRAY (GNU objcopy
detects the special name but we don't).
* Craft tests for a new SHT_LLVM_* extension
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D129337
For MTE globals, we should have clang emit the attribute for all GV's
that it creates, and then use that in the upcoming AArch64 global
tagging IR pass. We need a positive attribute for this sanitizer (rather
than implicit sanitization of all globals) because it needs to interact
with other parts of LLVM, including:
1. Suppressing certain global optimisations (like merging),
2. Emitting extra directives by the ASM writer, and
3. Putting extra information in the symbol table entries.
While this does technically make the LLVM IR / bitcode format
non-backwards-compatible, nobody should have used this attribute yet,
because it's a no-op.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D128950
Make use of a single FoldUnOpFMF() API, though in practice FNeg
is the only unary operation that exists.
This is likely NFC in practice, because users of InstSimplifyFolder
don't create fneg.
Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.
Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.
The warning is off by default as its likely to be somewhat disruptive
otherwise.
This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D128059
In D94439, BumpPtrAllocator changed its implementation to use an empty base optimization for the underlying allocator.
This patch builds on that by extending its functionality to more classes as well as enabling the underlying allocator to be a reference type, something not currently possible as you can't derive from a reference.
The main place this sees use is in StringMaps which often use the default MallocAllocator, yet have to pay the size of a pointer for no reason.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D129206
This reverts commit cc309721d2 because it
breaks the following tests on GreenDragon:
TestDataFormatterObjCCF.py
TestDataFormatterObjCExpr.py
TestDataFormatterObjCKVO.py
TestDataFormatterObjCNSBundle.py
TestDataFormatterObjCNSData.py
TestDataFormatterObjCNSError.py
TestDataFormatterObjCNSNumber.py
TestDataFormatterObjCNSURL.py
TestDataFormatterObjCPlain.py
TestDataFormatterObjNSException.py
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/45288/
Some rework of getStackGuard() based on comments in
https://reviews.llvm.org/D129505.
- getStackGuard() now creates and returns the destination
register, simplifying calls
- the pointer type is passed to getStackGuard() to avoid
recomputation
- removed PtrMemTy in emitSPDescriptorParent(), because
this type is only used here when loading the value but
not when storing the value
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D129576
Adds initial COFF support in JITLink. This is able to run a hello world c program in x86 windows successfully.
Implemented
- COFF object loader
- Static local symbols
- Absolute symbols
- External symbols
- Weak external symbols
- Common symbols
- COFF jitlink-check support
- All COMDAT selection type execpt largest
- Implicit symobl size calculation
- Rel32 relocation with PLT stub.
- IMAGE_REL_AMD64_ADDR32NB relocation
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D128968
It is illegal to merge two `llvm.coro.save` calls unless their
`llvm.coro.suspend` users are also merged. Marks it "nomerge" for
the moment.
This reverts D129025.
Alternative to D129025, which affects other token type users like WinEH.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D129530
The goal of this change is fixing most of compile time slowdown seen after a630ea3003 commit on lencod and sqlite3 benchmarks.
There are 3 improvements included in this patch:
1. In getNumOperands when possible get value directly from SmallNumOps.
2. Inline getLargePtr by moving its definition to header.
3. In TBAAStructTypeNode::getField get all operands once instead taking operands in loop one after one.
Differential Revision: https://reviews.llvm.org/D129468
Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.
Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.
The warning is off by default as its likely to be somewhat disruptive
otherwise.
This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D128059
InlineAsm constraint string verification can fail for many reasons,
but used to always print a generic "invalid type for inline asm
constraint string" message -- which is especially confusing if
the actual error is unrelated to the type, e.g. a failure to parse
the constraint string.
Change the verify API to return an Error with a more specific
error message, and print that in the IR parser.
This patch adds OMPIRBuilder support for the simdlen clause for the
simd directive. It uses the simdlen support in OpenMPIRBuilder when
it is enabled in Clang. Simdlen is lowered by OpenMPIRBuilder by
generating the loop.vectorize.width metadata.
Reviewed By: jdoerfert, Meinersbur
Differential Revision: https://reviews.llvm.org/D129149
Summary:
Introduce NeverAlign fragment type.
The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. NeverAlign fragment ensures that
the next fragment (first instruction in the pair) does not end at a
given alignment boundary by emitting a minimal size nop if necessary.
In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).
This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.
The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead due to reliance on relaxation as
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of NeverAlign fragment is to prevent
the first instruction in a pair ending at a given alignment boundary, by
inserting at most one minimum size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles to not
cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.
LLVM: https://reviews.llvm.org/D97982
Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613
Test Plan: sandcastle
Reviewers: #llvm-bolt
Subscribers: phabricatorlinter
Differential Revision: https://phabricator.intern.facebook.com/D31361547
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.
This patch adds support for using the active lane mask for control
flow by:
1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extract the first bit of this mask and use this bit for the
conditional branch.
I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.
Differential Revision: https://reviews.llvm.org/D125301
The following commit https://reviews.llvm.org/D125998 added a static_assert which was triggered on z/OS because bitfields are always aligned to 1 regardless of type.
```
error: static_assert failed due to requirement 'alignof(llvm::SmallVector<llvm::MDOperand, 0>) <= alignof(llvm::MDNode::Header)' "LargeStorageVector too strongly aligned"
```
The solution was to force the alignment to be size_t.
Reviewed By: wolfgangp
Differential Revision: https://reviews.llvm.org/D129369
(Reapply after revert in e9ce1a5880 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.
Differential Revision: https://reviews.llvm.org/D129120
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.
Differential Revision: https://reviews.llvm.org/D129120
PointerToGOT lowering was accidentally changed from Delta32 to Delta64 in
db37225803. This patch moves it back to Delta32 and renames the generic
aarch64 edge to Delta32ToGOT to avoid the ambiguity.
No test case yet -- I haven't figured out how to write a succinct test case
(this typically appears in CIEs in eh-frames).
Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.
Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.
The warning is off by default as its likely to be somewhat disruptive
otherwise.
This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D128059
Previously we added the `push_target_tripcount` function to send the
loop tripcount to the device runtime so we knew how to configure the
teams / threads for execute the loop for a teams distribute construct.
This was implemented as a separate function mostly to avoid changing the
interface for backwards compatbility. Now that we've changed it anyway
and the new interface can take an arbitrary number of arguments via the
struct without changing the ABI, we can move this to the new interface.
This will simplify the runtime by removing unnecessary state between
calls.
Depends on D128550
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D128816
This patch changes the code we generate to enter a target region on the
device. This is in-line with the new definition in the runtime that was
added previously. Additionally we implement this in the OpenMPIRBuilder
so that this code can be shared with Flang in the future.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D128550
* Remove crc32 from zlib compression namespace, people should use the `llvm::crc32` instead.
Reviewed By: MaskRay, leonardchan
Differential Revision: https://reviews.llvm.org/D128754
* Refactor compression namespaces across the project, making way for a possible
introduction of alternatives to zlib compression.
Changes are as follows:
* Relocate the `llvm::zlib` namespace to `llvm::compression::zlib`.
Reviewed By: MaskRay, leonardchan, phosek
Differential Revision: https://reviews.llvm.org/D128953
SelectionDAG has a target hook, getExtendForAtomicOps, which it uses
in the computeKnownBits implementation for ATOMIC_LOAD. This is pretty
ugly (as is having a separate load opcode for atomics), so instead
allow making use of atomic zextload. Enable this for AArch64 since the
DAG path defaults in to the zext behavior.
The tablegen changes are pretty ugly, but partially helps migrate
SelectionDAG from using ISD::ATOMIC_LOAD to regular ISD::LOAD with
atomic memory operands. For now the DAG emitter will emit matchers for
patterns which the DAG will not produce.
I'm still a bit confused by the intent of the isLoad/isStore/isAtomic
bits. The DAG implementation rejects trying to use any of these in
combination. For now I've opted to make the isLoad checks also check
isAtomic, although I think having isLoad and isAtomic set on these
makes most sense.
Move the device_type parser to a separate parser AccDeviceTypeExprList. Preparatory work for D106968.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D106967
Set the isOptional flag for the self clause. Move the optional and parenthesis part of the parser. Update the rest of the code to deal with the optional value.
Preparatory work for D106968.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D106965