Summary:
This patch adds support for emission of following DWARFv5 macro forms
in .debug_macro section.
1. DW_MACRO_start_file
2. DW_MACRO_end_file
3. DW_MACRO_define_strp
4. DW_MACRO_undef_strp.
Reviewed By: dblaikie, ikudrin
Differential Revision: https://reviews.llvm.org/D72828
Summary:
If the decoding functions are called with both start and end pointers
being nullptr, the function will crash due to a nullptr dereference.
This happens because the function does not recognise nullptr as a valid
end pointer.
Obviously, nobody is going to pass null pointers here deliberately, but
it can happen indirectly (as it did for me), when calling these
functions on an ArrayRef, as a default-initialized empty ArrayRef will
have both begin() and end() pointers equal to nullptr.
The fix is to simply remove the nullptr check. Passing nullptr for "end"
with a valid "begin" pointer will still work, as one cannot reach
nullptr by incrementing a valid pointer without triggerring UB.
Reviewers: dblaikie
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77304
Summary:
This patch adds the optional Error argument, and the Cursor variants to
more DataExtractor methods. The functions now behave the same way as
other error-aware functions (they set the error when they fail, and
don't do anything if the error is already set).
I have merged the LEB128 implementations via a template (similarly to
how fixed-size functions are handled) to reduce code duplication.
Depends on D77304.
Reviewers: dblaikie, aprantl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77306
Summary:
This patch adds an optional Error argument to DataExtractor functions
for string extraction, and makes them behave like other DataExtractor
functions (set the error if extraction fails, don't do anything if the
error is already set).
I have merged the StringRef and C string versions of the functions to
reduce code duplication.
Reviewers: dblaikie, MaskRay
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77307
Summary:
This is a roll forward of D77394 minus AlignmentFromAssumptions (which needs to be addressed separately)
Differences from D77394:
- DebugStr() now prints the alignment value or `None` and no more `Align(x)` or `MaybeAlign(x)`
- This is to keep Warning message consistent (CodeGen/SystemZ/alloca-04.ll)
- Removed a few unneeded headers from Alignment (since it's included everywhere it's better to keep the dependencies to a minimum)
Reviewers: courbet
Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77537
Summary:
24 March 2020: LLVM 10.0.0 is out.
I gathered all deprecated function introduced between 9 and 10 and cleaned them up so they will be removed from 11.
> git log -p -S LLVM_ATTRIBUTE_DEPRECATED llvmorg-9.0.0..llvmorg-10.0.0
Reviewers: courbet
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77409
DWARFv5 defines index sections in package files in a slightly different
way than the pre-standard GNU proposal, see Section 7.3.5 in the DWARF
standard and https://gcc.gnu.org/wiki/DebugFissionDWP for GNU proposal.
The main concern here is values for section identifiers, which are
partially overlapped with changed meanings. The patch adds support for
v5 index sections and resolves that difficulty by defining a set of
identifiers for internal use which can represent and distinct values
of both standards.
Differential Revision: https://reviews.llvm.org/D75929
This is a preparation for an upcoming patch which adds support for
DWARFv5 unit index sections. The patch adds tag "_EXT_" to identifiers
which reference sections that are deprecated in the DWARFv5 standard.
See D75929 for the discussion.
Differential Revision: https://reviews.llvm.org/D77141
Move the listing of allowed clauses per OpenMP directive to the new
macro file in `llvm/Frontend/OpenMP`. Also, use a single generic macro
that specifies the directive and one allowed clause explicitly instead
of a dedicated macro per directive.
We save 800 loc and boilerplate for all new directives/clauses with no
functional change. We also need to include the macro file only once and
not once per directive.
Depends on D77112.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D77113
This is a cleanup and normalization patch that also enables reuse with
Flang later on. A follow up will clean up and move the directive ->
clauses mapping.
Reviewed By: fghanim
Differential Revision: https://reviews.llvm.org/D77112
Add a new overload of StaticLibraryDefinitionGenerator::Load that takes a triple
argument and supports loading archives from MachO universal binaries in addition
to regular archives.
The LLI tool is updated to use this overload.
`isKnownReachable` had only interface (always returns true).
Changed it to call `isPotentiallyReachable`.
This change enables deductions of other Abstract Attributes depending on
AAReachability to use reachability information obtained from CFG, and it
can make them stronger.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D76210
This commit was made to settle [[ https://github.com/llvm/llvm-project/issues/175 | this issue on GitHub ]].
I added analysis getters for LoopInfo, DominatorTree, and
PostDominatorTree. And I added a test to show an improvement of the
deduction of `dereferenceable` attribute.
Reviewed By: jdoerfert, uenoku
Differential Revision: https://reviews.llvm.org/D76378
Since D73835 we no longer need to define the whole IRBuilder
implementation in the header. This patch moves some of the larger
methods out of line, into the C++ file.
Differential Revision: https://reviews.llvm.org/D77332
Summary:
Note: This revision is very similar to D62296.
In D75756, we need `getDynamicSymbolIterators()` to skip first NULL symbol in `.dynsym`. And I believe it might be worth pointing this out in a separate patch to gather you experts' opinions.
I have checked that current code base will not be affected by this change.
```
dynamic_symbol_begin()
|- dynamic_symbol_end(): Ok
`- getDynamicSymbolIterators()
|- addDynamicElfSymbols(): llvm/tools/llvm-objdump/llvm-objdump.cpp, Line 934
| Ok, NULL symbol will be omitted by Line 945-947
| StringRef Name = unwrapOrError(Symbol.getName(), Obj->getName());
| if (Name.empty()) continue;
|- dumpSymbolNameFromObject(): llvm/tools/llvm-nm/llvm-nm.cpp, Line 1192
| There's no test for dumping dynamic debugging symbol. This patch helps improve llvm-nm behavior. (we should add test for this later)
`- computeSymbolSizes(): llvm/lib/Object/SymbolSize.cpp, Line 52
|- OProfileJITEventListener::notifyObjectLoaded(): llvm/lib/ExecutionEngine/OProfileJIT/OProfileJITEventListener.cpp, Line 92
| Ok, NULL symbol will be omitted by Line 94-95
| if (!Sym.getType() || *Sym.getType() != SF_Function) continue;
|- IntelJITEventListener::notifyObjectLoaded(): llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp, Line 98
| Ok, NULL symbol will be omitted by Line 124-126 (same as previous one)
|- PerfJITEventListener::notifyObjectLoaded(): llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp, Line 244
| Ok, NULL symbol will be omitted by Line 254-256, (same as previous one)
|- SymbolizableObjectFile::create(): llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp, Line 73
| Ok, NULL symbol will be omitted by Line 75
| res->addSymbol()
| In addSymbol(), Line 167-168
| if (!Sec || (Obj && Obj->section_end() == *Sec)) return std::error_code();
|- dumpCXXData(): llvm/tools/llvm-cxxdump/llvm-cxxdump.cpp, Line 189
| Ok, NULL symbol will be omitted by Line 199-202
| object::section_iterator SecI = *SecIOrErr;
| // Skip external symbols.
| if (SecI == Obj->section_end())
| continue;
`- printLineInfoForInput(): llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp, Line 418
Ok, NULL symbol will be omitted by Line 430-477
if (Type == object::SymbolRef::ST_Function) {
...
}
```
Reviewers: grimar, jhenderson, MaskRay
Reviewed By: jhenderson, MaskRay
Subscribers: rupprecht, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76081
Forward declare DemandedBits in IVDescriptors, and move include
into the cpp file. Also drop the include from LoopUtils, which
does not need it at all.
Summary:
This patch includes two extensions:
1. It extends the GraphDiff to also keep the original list of updates
after legalization, not just the deletes/insert vectors.
It also provides an API to pop the first update (the updates are store
in reverse, such that the first update is at the end of the list)
2. It adds a bool to mark whether the given updates should be applied as
given, or applied in reverse. This moves the task of reversing the
updates (when the caller needs this) to a functionality inside
GraphDiff, versus having the caller do this.
The two changes could be split into two patches, but they seemed
reasonably small to be reviewed together.
Reviewers: kuhar, dblaikie
Subscribers: hiraditya, george.burgess.iv, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77167
Added unit tests for 2 scenarios that were failing.
Made replace_path_prefix back to 3 parameters instead of 5, simplifying the implementation. The other 2 were always used with the default value.
This commit is intended to be the first of 3:
1) simplify/fix replace_path_prefix.
2) use it in the context of -fdebug-prefix-map and -fmacro-prefix-map (see D76869).
3) Make Windows version of replace_path_prefix insensitive to both case and separators (slash vs backslash).
Differential Revision: https://reviews.llvm.org/D77223
Summary:
For current architect, we always require setContainingCsect to be
called on every MCSymbol got used in XCOFF context.
This is very hard to achieve because symbols gets created everywhere
and other MCSymbol types(ELF, COFF) do not have similar rules.
It's very easy to miss setting the containing csect, and we would
need to add a lot of XCOFF specialized code around some common code area.
This patch intendeds to do
1. Rely on getFragment().getParent() to get csect from labels.
2. Only use get/setRepresentedCsect (was get/setContainingCsect)
if symbol itself represents a csect.
Reviewers: DiggerLin, hubert.reinterpretcast, daltenty
Differential Revision: https://reviews.llvm.org/D77080
The old name was a bit misleading because the functions actually return
contributions to the corresponding sections.
Differential revision: https://reviews.llvm.org/D77302
Summary:
This patch adds parsing and dumping DWARFv5 .debug_macro section in llvm-dwarfdump,
it does not introduce any new switch. Existing switch "--debug-macro"
should be used to dump macinfo or macro section.
Reviewed By: dblaikie, ikudrin, jhenderson
Differential Revision: https://reviews.llvm.org/D73086
Summary:
Add mapping from exp2 math functions
to corresponding SVML calls.
This is a follow up and extension for llvm diff
https://reviews.llvm.org/D19544
Test Plan:
- update test case and run ninja check.
- run tests locally
Reviewers: wenlei, hoyFB, mmasten, mzolotukhin, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77114
Summary:
[llvm][TextAPI] adding inlining reexported libraries support
* this patch adds reader/writer support for MachO tbd files.
The usecase is to represent reexported libraries in top level library
that won't need to exist for linker indirection because all of the
needed content will be inlined in the same document.
Reviewers: ributzka, steven_wu, jhenderson
Reviewed By: ributzka
Subscribers: JDevlieghere, hiraditya, mgrang, dexonsmith, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67646
Summary:
Splitting Knowledge retention into Queries in Analysis and Builder into Transform/Utils
allows Queries and Transform/Utils to use Analysis.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77171
This patch adds
- New arguments to getMinPrefetchStride() to let the target decide on a
per-loop basis if software prefetching should be done even with a stride
within the limit of the hw prefetcher.
- New TTI hook enableWritePrefetching() to let a target do write prefetching
by default (defaults to false).
- In LoopDataPrefetch:
- A search through the whole loop to gather information before emitting any
prefetches. This way the target can get information via new arguments to
getMinPrefetchStride() and emit prefetches more selectively. Collected
information includes: Does the loop have a call, how many memory
accesses, how many of them are strided, how many prefetches will cover
them. This is NFC to before as long as the target does not change its
definition of getMinPrefetchStride().
- If a previous access to the same exact address was 'read', and the
current one is 'write', make it a 'write' prefetch.
- If two accesses that are covered by the same prefetch do not dominate
each other, put the prefetch in a block that dominates both of them.
- If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.
- A SystemZ implementation of getMinPrefetchStride().
Review: Ulrich Weigand, Michael Kruse
Differential Revision: https://reviews.llvm.org/D70228
This removes some includes/forward-declarations that don't seem to be necessary in the MCA core headers
Based off a cppclean report
Differential Revision: https://reviews.llvm.org/D77073
Different file formats have different naming style for the debug
sections. The method is implemented for ELF, COFF and Mach-O formats.
Differential Revision: https://reviews.llvm.org/D76276
This is a cleanup and normalization patch that also enables reuse with
Flang later on. A follow up will clean up and move the directive ->
clauses mapping.
Differential Revision: https://reviews.llvm.org/D77112
If we have a must-tail call the callee and caller need to have matching
ABIs. Part of that is alignment which we might modify when we deduce
alignment of arguments of either. Since we would need to keep them in
sync, which is not as simple, we simply avoid deducing alignment for
arguments of the must-tail caller or callee.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D76673
We create a lot of AbstractAttributes and they live as long as
the Attributor does. It seems reasonable to allocate them via a
BumpPtrAllocator owned by the Attributor.
Reviewed By: baziotis
Differential Revision: https://reviews.llvm.org/D76589
Summary:
Currently, the comparison argument used for ATOMIC_CMP_XCHG is legalised
with GetPromotedInteger, which leaves the upper bits of the value
undefind. Since this is used for comparing in an LR/SC loop with a
full-width comparison, we must sign extend it. We introduce a new
getExtendForAtomicCmpSwapArg to complement getExtendForAtomicOps, since
many targets have compare-and-swap instructions (or pseudos) that
correctly handle an any-extend input, and the existing function
determines the extension of the result, whereas we are concerned with
the input.
This is related to https://reviews.llvm.org/D58829, which solved the
issue for ATOMIC_CMP_SWAP_WITH_SUCCESS, but not the simpler
ATOMIC_CMP_SWAP.
Reviewers: asb, lenary, efriedma
Reviewed By: asb
Subscribers: arichardson, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, evandro, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74453
SUMMARY:
run clang format on the file llvm/include/llvm/MC/MCDirectives.h
Reviewers: Jason liu
Subscribers: rupprecht, seiyai,hiraditya
Differential Revision: https://reviews.llvm.org/D77170
getCallCost is only used within the different layers of TTI, with no
backend implementing it so fold the base implementation into
getUserCost. I think this is an NFC.
Differential Revision: https://reviews.llvm.org/D77050
Summary: These were templated due to SelectionDAG using int masks for shuffles and IR using unsigned masks for shuffles. But now that D72467 has landed we have an int mask version of IRBuilder::CreateShuffleVector. So just use int instead of a template
Reviewers: spatel, efriedma, RKSimon
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D77183
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
Differential Revision: https://reviews.llvm.org/D72467
Summary:
CGProfilePass is run by default in certain new pass manager optimization pipeline. Assemblers other than llvm as (such as gnu as) cannot recognize the .cgprofile entries generated and emitted from this pass, causing build time error.
This patch adds new options in clang CodeGenOpts and PassBuilder options so that we can turn cgprofile off when not using integrated assembler.
Reviewers: Bigcheese, xur, george.burgess.iv, chandlerc, manojgupta
Reviewed By: manojgupta
Subscribers: manojgupta, void, hiraditya, dexonsmith, llvm-commits, tcwang, llozano
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D62627
Summary: this patch preserve information from various places in EarlyCSE into assume bundles.
Reviewers: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76769
--no-threads is a name copied from gold.
gold has --no-thread, --thread-count and several other --thread-count-*.
There are needs to customize the number of threads (running several lld
processes concurrently or customizing the number of LTO threads).
Having a single --threads=N is a straightforward replacement of gold's
--no-threads + --thread-count.
--no-threads is used rarely. So just delete --no-threads instead of
keeping it for compatibility for a while.
If --threads= is specified (ELF,wasm; COFF /threads: is similar),
--thinlto-jobs= defaults to --threads=,
otherwise all available hardware threads are used.
There is currently no way to override a --threads={1,2,...}. It is still
a debate whether we should use --threads=all.
Reviewed By: rnk, aganea
Differential Revision: https://reviews.llvm.org/D76885
This is the Waymarking algorithm implemented as an independent utility.
The utility is operating on a range of sequential elements.
First we "tag" the elements, by calling `fillWaymarks`.
Then we can "follow" the tags from every element inside the tagged
range, and reach the "head" (the first element), by calling
`followWaymarks`.
Differential Revision: https://reviews.llvm.org/D74415
This patch updates ValueLattice to distinguish between ranges that are
guaranteed to not include undef and ranges that may include undef.
A constant range guaranteed to not contain undef can be used to simplify
instructions to arbitrary values. A constant range that may contain
undef can only be used to simplify to a constant. If the value can be
undef, it might take a value outside the range. For example, consider
the snipped below
define i32 @f(i32 %a, i1 %c) {
br i1 %c, label %true, label %false
true:
%a.255 = and i32 %a, 255
br label %exit
false:
br label %exit
exit:
%p = phi i32 [ %a.255, %true ], [ undef, %false ]
%f.1 = icmp eq i32 %p, 300
call void @use(i1 %f.1)
%res = and i32 %p, 255
ret i32 %res
}
In the exit block, %p would be a constant range [0, 256) including undef as
%p could be undef. We can use the range information to replace %f.1 with
false because we remove the compare, effectively forcing the use of the
constant to be != 300. We cannot replace %res with %p however, because
if %a would be undef %cond may be true but the second use might not be
< 256.
Currently LazyValueInfo uses the new behavior just when simplifying AND
instructions and does not distinguish between constant ranges with and
without undef otherwise. I think we should address the remaining issues
in LVI incrementally.
Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D76931
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.
This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.
Use the new intrinsic in the atomic optimizer pass.
Differential Revision: https://reviews.llvm.org/D65088
Leverage ARM ELF build attribute section to create ELF attribute section
for RISC-V. Extract the common part of parsing logic for this section
into ELFAttributeParser.[cpp|h] and ELFAttributes.[cpp|h].
Differential Revision: https://reviews.llvm.org/D74023
Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it.
Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support.
Differential Revision: https://reviews.llvm.org/D76255
When we have
```
a = G_OR x, x
```
or
```
b = G_AND y, y
```
We can drop the G_OR/G_AND and just use x/y respectively.
Also update arm64-fallback.ll because there was an or in there which hits this
transformation.
Differential Revision: https://reviews.llvm.org/D77105
Implement identity combines for operations like the following:
```
%a = G_SUB %b, 0
```
This can just be replaced with %b.
Over CTMark, this gives some minor size improvements at -O3.
Differential Revision: https://reviews.llvm.org/D76640
Summary:
This code was throwing away the opcode for a boolean, which was then
reconstructing the opcode from that boolean. Just pass the opcode, and
forget the boolean.
Reviewers: srhines
Reviewed By: srhines
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77100
Extend the FileCollector's API with addDirectory which adds a directory
and its contents to the VFS mapping.
Differential revision: https://reviews.llvm.org/D76671
Also add a test case to wasm-ld that asserts without this change.
Internally wasm-ld builds a StringMap of exported functions and it seems
like allowing empty string in the set is preferable to adding checks.
This assert looks like it was most likely just a historical accident.
It started life here purely to support InputLanguagesSet:
eeac27e38c
Then got extracted here:
e57a403338
Then got moved to AST here
5c48bae209
With the `InLang` paramater name still intact which suggested is
InputLanguagesSet origins.
Differential Revision: https://reviews.llvm.org/D74589
Summary:
Code frequently relies upon the results of "is.constant" intrinsics to
DCE invalid code paths. We don't want the intrinsic to be made control-
dependent on any additional values. For instance, we can't split a PHI
into a "constant" and "non-constant" part via jump threading in order
to "optimize" the constant part, because the "is.constant" intrinsic is
meant to return "false".
Reviewers: wmi, kazu, MaskRay
Reviewed By: kazu
Subscribers: jdoerfert, efriedma, joerg, lebedev.ri, nikic, xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75799
Optimize the common case of splat vector constant. For large vector
going through all elements is expensive. For splatr/broadcast cases we
can skip going through all elements.
Differential Revision: https://reviews.llvm.org/D76664
Summary:
This change adds amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting elf.
The intrinsics takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry.
`SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands passed through to ISel.
Author: csyonghe <yonghe@google.com>
Reviewers: tpr, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76440
For each natural loop with multiple exit blocks, this pass creates a
new block N such that all exiting blocks now branch to N, and then
control flow is redistributed to all the original exit blocks.
The bulk of the tranformation is a new function introduced in
BasicBlockUtils that an redirect control flow from a set of incoming
blocks to a set of outgoing blocks via a common "hub".
This is a useful workaround for a limitation in the structurizer which
incorrectly orders blocks when processing a nest of loops. This pass
bypasses that issue by ensuring that each natural loop is recognized
as a separate region. Since the structurizer is a region pass, it no
longer sees a nest of loops in a single region, and instead processes
each "level" in the nesting as a separate region.
The AMDGPU backend provides a new option to enable this pass before
the structurizer, which may eventually be enabled by default.
Reviewers: madhur13490, arsenm, nhaehnle
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D75865
Summary:
Also deprecate getOriginalAlignment, getAlignment will take much more time as it is pervasive through the codebase (including TableGened files).
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76933
Aligned_alloc is a standard lib function and has been in glibc since
2.16 and in the C11 standard. It has semantics similar to malloc/calloc
for several analyses/transforms. This patch introduces aligned_alloc
in target library info and memory builtins. Subsequent ones will
make other passes aware and fix https://bugs.llvm.org/show_bug.cgi?id=44062
This change will also be useful to LLVM generators that need to allocate
buffers of vector elements larger than 16 bytes (for eg. 256-bit ones),
element boundary alignment for which is not typically provided by glibc malloc.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76970
This is safe if the iterator type is a pointer and the comparator is
stateless. The enable_if pattern I'm adding here only uses
array_pod_sort for the default comparator (std::less).
Using array_pod_sort has a potential performance impact, but I didn't
notice anything when testing clang. Sorting doesn't seem to be on the
hot path anywhere in LLVM.
Shrinks Release+Asserts clang by 73k.
MC already knows how to emulate the .weak directive (with its ELF
semantics; i.e., an undefined weak symbol resolves to 0, and a defined
weak symbol has lower link precedence than a strong symbol of the same
name) using COFF weak externals. Plumb this through the ASM printer too,
so that definitions marked with __attribute__((weak)) at the language
level (which gets translated to weak linkage at the IR level) have the
corresponding .weak directive emitted. Note that declarations marked
with __attribute__((weak)) at the language level (which translates to
extern_weak at the IR level) already have .weak directives emitted.
Weak*/linkonce* symbols without an associated comdat (in particular, ones
generated with __attribute__((weak)) in C/C++) were earlier emitted as
normal unique globals, as the comdat is required to provide the linkonce
semantics. This change makes sure they are emitted as .weak instead,
allowing other symbols to override them.
Rename the existing coff-weak.ll test to coff-linkonce.ll. I'm not
quite sure what that test covers, since the behavior being tested in it
(the emission of a one_only section) is just a result of passing
-function-sections to llc; the linkonce_odr makes no difference.
Add a new coff-weak.ll which tests the new directive emission.
Based on an previous patch by Shoaib Meenai.
Differential Revision: https://reviews.llvm.org/D44543
This change implements constant folding to constrained versions of
intrinsics, implementing rounding: floor, ceil, trunc, round, rint and
nearbyint.
Differential Revision: https://reviews.llvm.org/D72930
Extend the FileCollector's API with addDirectory which adds a directory
and its contents to the VFS mapping.
Differential revision: https://reviews.llvm.org/D76671
Without this some instances of copy construction would use the
converting constructor & lead to the destination function_ref referring
to the source function_ref instead of the underlying functor.
Discovered in feedback from 857bf5da35
Thanks to Johannes Doerfert, Arthur O'Dwyer, and Richard Smith for the
discussion and debugging.
The current implementation of the JSONWriter does not support writing
out directory entries. Earlier today I added a unit test to illustrate
the problem. When an entry is added to the YAMLVFSWriter and the path is
a directory, it will incorrectly emit the directory as a file, and any
files inside that directory will not be found by the VFS.
It's possible to partially work around the issue by only adding "leaf
nodes" (files) to the YAMLVFSWriter. However, this doesn't work for
representing empty directories. This is a problem for clients of the VFS
that want to iterate over a directory. The directory not being there is
not the same as the directory being empty.
This is not just a hypothetical problem. The FileCollector for example
does not differentiate between file and directory paths. I temporarily
worked around the issue for LLDB by ignoring directories, but I suspect
this will prove problematic sooner rather than later.
This patch fixes the issue by extending the JSONWriter to support
writing out directory entries. We store whether an entry should be
emitted as a file or directory.
Differential revision: https://reviews.llvm.org/D76670
Summary:
The method is used where TypeSize is implicitly cast to integer for
being checked against 0.
Reviewers: sdesmalen, efriedma
Reviewed By: sdesmalen, efriedma
Subscribers: efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76748
Make these behave the same way unsafe-fp-math and co. The command line
flag should add the attribute to functions that do not already have
it, and leave existing attributes. The attribute is the actual
implementation, but the flag is useful in some testing situations.
AMDGPU has a variety of tests with denormals enabled/disabled that
would require a painful level of test duplication without a flag. This
doesn't expose setting the separate input/output modes, or add a flag
for the f32 version yet.
Tests will be included in future patch.
Generalizes D61992. In GNU as, the .reloc directive supports arbitrary relocation types.
A MCFixupKind value `V` larger than or equal to FirstLiteralRelocationKind
is used to represent the relocation type whose number is V-FirstLiteralRelocationKind.
This is useful for linker tests. Without the feature the assembler
cannot produce certain relocation records (e.g. R_ARM_ALU_PC_G0/R_ARM_LDR_PC_G0)
This helps move forward D75349 and D76575.
Differential Revision: https://reviews.llvm.org/D76746
This flag can be used to mark a symbol as existing only for the purpose of
enabling materialization. Such a symbol can be looked up to trigger
materialization with the lookup returning only once materialization is
complete. Symbols with this flag will never resolve however (to avoid
permanently polluting the symbol table), and should only be looked up using
the SymbolLookupFlags::WeaklyReferencedSymbol flag. The primary use case for
this flag is initialization symbols.
Summary:
Implement several XCOFF hooks to get '-r' option working for llvm-objdump -r.
Reviewer: DiggerLin, hubert.reinterpretcast, jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D75131
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76925
Before this patch, it wasn't possible to extend the ThinLTO threads to all SMT/CMT threads in the system. Only one thread per core was allowed, instructed by usage of llvm::heavyweight_hardware_concurrency() in the ThinLTO code. Any number passed to the LLD flag /opt:lldltojobs=..., or any other ThinLTO-specific flag, was previously interpreted in the context of llvm::heavyweight_hardware_concurrency(), which means SMT disabled.
One can now say in LLD:
/opt:lldltojobs=0 -- Use one std::thread / hardware core in the system (no SMT). Default value if flag not specified.
/opt:lldltojobs=N -- Limit usage to N threads, regardless of usage of heavyweight_hardware_concurrency().
/opt:lldltojobs=all -- Use all hardware threads in the system. Equivalent to /opt:lldltojobs=$(nproc) on Linux and /opt:lldltojobs=%NUMBER_OF_PROCESSORS% on Windows. When an affinity mask is set for the process, threads will be created only for the cores selected by the mask.
When N > number-of-hardware-threads-in-the-system, the threads in the thread pool will be dispatched equally on all CPU sockets (tested only on Windows).
When N <= number-of-hardware-threads-on-a-CPU-socket, the threads will remain on the CPU socket where the process started (only on Windows).
Differential Revision: https://reviews.llvm.org/D75153
This is the first part extracted from D71179 and cleaned up.
This patch provides parsing support for `omp begin/end declare variant`,
as defined in OpenMP technical report 8 (TR8) [0].
A major purpose of this patch is to provide proper math.h/cmath support
for OpenMP target offloading. See PR42061, PR42798, PR42799. The current
code was developed with this feature in mind, see [1].
[0] https://www.openmp.org/wp-content/uploads/openmp-TR8.pdf
[1] https://reviews.llvm.org/D61399#change-496lQkg0mhRN
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D74941
Summary: This patch is the first effort to adding basic optimizations for FREEZE in SelDag.
Reviewers: spatel, lebedev.ri
Reviewed By: spatel
Subscribers: xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76707
This can avoid all sorts of mistakes with implicit conversion
(indirectly) to int, etc. I'm quite surprise there aren't any things to
fixup with this - but I guess most uses of function_ref aren't
optional/nullable.
These transforms rely on a vector reduction flag on the SDNode
set by SelectionDAGBuilder. This flag exists because SelectionDAG
can't see across basic blocks so SelectionDAGBuilder is looking
across and saving the info. X86 is the only target that uses this
flag currently. By removing the X86 code we can remove the flag
and the SelectionDAGBuilder code.
This pass adds a dedicated IR pass for X86 that looks across the
blocks and transforms the IR into a form that the X86 SelectionDAG
can finish.
An advantage of this new approach is that we can enhance it to
shrink the phi nodes and final reduction tree based on the zeroes
that we need to concatenate to bring the partially reduced
reduction back up to the original width.
Differential Revision: https://reviews.llvm.org/D76649
SUMMARY:
SUMMARY
for a source file "test.c"
void foo() {};
llc will generate assembly code as (assembly patch)
.globl foo
.globl .foo
.csect foo[DS]
foo:
.long .foo
.long TOC[TC0]
.long 0
and symbol table as (xcoff object file)
[4] m 0x00000004 .data 1 unamex foo
[5] a4 0x0000000c 0 0 SD DS 0 0
[6] m 0x00000004 .data 1 extern foo
[7] a4 0x00000004 0 0 LD DS 0 0
After first patch, the assembly will be as
.globl foo[DS] # -- Begin function foo
.globl .foo
.align 2
.csect foo[DS]
.long .foo
.long TOC[TC0]
.long 0
and symbol table will as
[6] m 0x00000004 .data 1 extern foo
[7] a4 0x00000004 0 0 DS DS 0 0
Change the code for the assembly path and xcoff objectfile patch for llc.
Reviewers: Jason Liu
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D76162
Summary:
- Even though the bindless surface/texture interfaces are promoted,
there are still code using surface/texture references. For example,
[PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports the
compilation issue for code using `tex2D` with texture references. For
better compatibility, this patch proposes the support of
surface/texture references.
- Due to the absent documentation and magic headers, it's believed that
`nvcc` does use builtins for texture support. From the limited NVVM
documentation[^nvvm] and NVPTX backend texture/surface related
tests[^test], it's believed that surface/texture references are
supported by replacing their reference types, which are annotated with
`device_builtin_surface_type`/`device_builtin_texture_type`, with the
corresponding handle-like object types, `cudaSurfaceObject_t` or
`cudaTextureObject_t`, in the device-side compilation. On the host
side, that global handle variables are registered and will be
established and updated later when corresponding binding/unbinding
APIs are called[^bind]. Surface/texture references are most like
device global variables but represented in different types on the host
and device sides.
- In this patch, the following changes are proposed to support that
behavior:
+ Refine `device_builtin_surface_type` and
`device_builtin_texture_type` attributes to be applied on `Type`
decl only to check whether a variable is of the surface/texture
reference type.
+ Add hooks in code generation to replace that reference types with
the correponding object types as well as all accesses to them. In
particular, `nvvm.texsurf.handle.internal` should be used to load
object handles from global reference variables[^texsurf] as well as
metadata annotations.
+ Generate host-side registration with proper template argument
parsing.
---
[^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf
[^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
[^bind]: See section 3.2.11.1.2 ``Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf).
[^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. But, the current backend doesn't have that supported. We may revise that later.
Reviewers: tra, rjmccall, yaxunl, a.sidorin
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76365
```
// llvm-objdump -d output (before)
400000: e8 0b 00 00 00 callq 11
400005: e8 0b 00 00 00 callq 11
// llvm-objdump -d output (after)
400000: e8 0b 00 00 00 callq 0x400010
400005: e8 0b 00 00 00 callq 0x400015
// GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled
400000: e8 0b 00 00 00 callq 400010
400005: e8 0b 00 00 00 callq 400015
```
In llvm-objdump, we pass the address of the next MCInst. Ideally we
should just thread the address of the current address, unfortunately we
cannot call X86MCCodeEmitter::encodeInstruction (X86MCCodeEmitter
requires MCInstrInfo and MCContext) to get the length of the MCInst.
MCInstPrinter::printInst has other callers (e.g llvm-mc -filetype=asm, llvm-mca) which set Address to 0.
They leave MCInstPrinter::PrintBranchImmAsAddress as false and this change is a no-op for them.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D76580
Follow-up of D72172 and D72180
This patch passes `uint64_t Address` to print methods of PC-relative
operands so that subsequent target specific patches can change
`*InstPrinter::print{Operand,PCRelImm,...}` to customize the output.
Add MCInstPrinter::PrintBranchImmAsAddress which is set to true by
llvm-objdump.
```
// Current llvm-objdump -d output
aarch64: 20000: bl #0
ppc: 20000: bl .+4
x86: 20000: callq 0
// Ideal output
aarch64: 20000: bl 0x20000
ppc: 20000: bl 0x20004
x86: 20000: callq 0x20005
// GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled
aarch64: 20000: bl 20000
ppc: 20000: bl 0x20004
x86: 20000: callq 20005
```
In `lib/Target/X86/X86GenAsmWriter1.inc` (generated by `llvm-tblgen -gen-asm-writer`):
```
case 12:
// CALL64pcrel32, CALLpcrel16, CALLpcrel32, EH_SjLj_Setup, JCXZ, JECXZ, J...
- printPCRelImm(MI, 0, O);
+ printPCRelImm(MI, Address, 0, O);
return;
```
Some targets have 2 `printOperand` overloads, one without `Address` and
one with `Address`. They should annotate derived `Operand` properly with
`let OperandType = "OPERAND_PCREL"`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D76574
Summary:
The existing helper function can only create a libcall to functions available in
RTLIB. Add a helper function that can create a libcall to a given function name
using the provided calling convention.
Reviewers: aditya_nandakumar, t.p.northover, rovka, arsenm, dsanders
Reviewed By: arsenm
Subscribers: wdng, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76845
This is a NFC splitted from D75342.
Previously obj2yaml never dumped a normal SHT_NULL section (i.e. when it is just zeroed)
or non-allocatable SHT_STRTAB/SHT_SYMTAB/SHT_DYNSYM sections.
This patch does not change the output, but it changes the logic so that we now dump these
sections, and them remove them later. It allows us to create and work with our internal representation
of sections, i.e. to work with the vector of Chunks, what looks cleaner.
It is used by D75342 and also should help us to support dumping a content that does not
belong to a section (i.e. to dump some data as `Fill` chunks).
Differential revision: https://reviews.llvm.org/D76684
When Clang crashes a useful message is output:
"PLEASE submit a bug report to https://bugs.llvm.org/ and include the
crash backtrace, preprocessed source, and associated run script."
A similar message is now output for all tools.
Differential Revision: https://reviews.llvm.org/D74324
Summary:
This patch adds initial support for the following intrinsics:
* llvm.aarch64.sve.st2
* llvm.aarch64.sve.st3
* llvm.aarch64.sve.st4
For storing two, three and four vectors worth of data. Basic codegen for
reg+immediate forms are implemented. Reg+reg addressing modes will be
addressed in a later patch.
These intrinsics are intended for use in the Arm C Language Extension
(ACLE).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75947
Summary:
This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
in addition to the GCC patch for the 8..6-a CLI:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html
In detail this patch
- march options for armv8.6-a
- BFloat16 assembly
This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.
Based on work by:
- labrinea
- MarkMurrayARM
- Luke Cheeseman
- Javed Asbar
- Mikhail Maltsev
- Luke Geeson
Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson
Reviewed By: SjoerdMeijer
Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D76062
This particular debug shows all the time when I'm processing any MLIR
module:
discovered a new reachable node ^bb0
discovered a new reachable node ^bb0
discovered a new reachable node ^bb0
discovered a new reachable node ^bb0
...
(repeated x1875)
I think that printing all the basic blocks in the function is likely low
value enough that we can get away with removing this.
Differential Revision: https://reviews.llvm.org/D76813
Summary:
Rename `pred_const_range` to `const_pred_range` to make it consistent with
the other pred/succ iterator definitions.
Reviewers: nicholas, dblaikie, nlewycky
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75962
Summary:
Rename `succ_const_iterator` to `const_succ_iterator` and
`succ_const_range` to `const_succ_range` for consistency with the
predecessor iterators, and the corresponding iterators in
MachineBasicBlock.
Reviewers: nicholas, dblaikie, nlewycky
Subscribers: hiraditya, bmahjour, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75952
Summary:
This patch implements the following CDE intrinsics:
T __arm_vcx1q_m(int coproc, T inactive, uint32_t imm, mve_pred_t p);
T __arm_vcx2q_m(int coproc, T inactive, U n, uint32_t imm, mve_pred_t p);
T __arm_vcx3q_m(int coproc, T inactive, U n, V m, uint32_t imm, mve_pred_t p);
T __arm_vcx1qa_m(int coproc, T acc, uint32_t imm, mve_pred_t p);
T __arm_vcx2qa_m(int coproc, T acc, U n, uint32_t imm, mve_pred_t p);
T __arm_vcx3qa_m(int coproc, T acc, U n, V m, uint32_t imm, mve_pred_t p);
The intrinsics are not part of the released ACLE spec, but internally at
Arm we have reached consensus to add them to the next ACLE release.
Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen
Reviewed By: simon_tatham
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76610
Summary:
One change: because there's no way to signal failure individually for
each cursor, we now "succeed" with an empty range with no parent if a
cursor doesn't point at anything.
Reviewers: usaxena95
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76741
to remap object file paths (but no source paths) before
processing. This is meant to be used for Clang objects where the
module cache location was remapped using ``-fdebug-prefix-map``; to
help dsymutil find the Clang module cache.
<rdar://problem/55685132>
Differential Revision: https://reviews.llvm.org/D76391
For some operations, the type is unimportant and only the number of
bits matters. For example I don't want to treat <4 x s8> as a legal
type, but I also don't want to decompose loads of this into smaller
pieces to get legal register types.
On AMDGPU in SelectionDAG, we legalize a number of operations (most
notably load and store) by coercing all types to vectors of i32. For
GlobalISel, I'm trying very hard to avoid doing this for every type,
but I don't think this strategy can be completely avoided. I'm trying
to avoid bitcasts for any legitimately legal type we can operate on,
since the intervening bitcasts have proven to be a hassle.
For loads, I think I can get away without ever casting the result
type, and handling any arbitrary bitwidth during selection (I will
eventually want new tablegen support to help with this, rather than
having to add every possible type as legal). The unmerge required to
do anything with the value should expand to the expected shifts. This
is trickier for stores, since it would now require handling a wide
array of truncates during selection which I don't want.
Future potentially interesting case are for vector indexing, where
sub-dword type should be indexed in s32 pieces.
This patch integrates operand bundle llvm.assumes [0] with the
Attributor. Most IRAttributes will now look at uses of the associated
value and if there are llvm.assume operand bundle uses with the right
tag we will check if they are in the must-be-executed-context (around
the context instruction). Droppable users, which is currently only
llvm::assume, are handled special in some places now as well.
[0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D74888
Record the address of a tail-calling branch instruction within its call
site entry using DW_AT_call_pc. This allows a debugger to determine the
address to use when creating aritificial frames.
This creates an extra attribute + relocation at tail call sites, which
constitute 3-5% of all call sites in xnu/clang respectively.
rdar://60307600
Differential Revision: https://reviews.llvm.org/D76336
The initial implementation just delegates to APInt's implementation of
XOR for single element ranges and conservatively returns the full set
otherwise.
Reviewers: nikic, spatel, lebedev.ri
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D76453
The algorithm supports both assigning a fixed offset to a field prior to
layout and allowing fields to have sizes that aren't multiples of their
required alignments. This means that the well-known algorithm of sorting
by decreasing alignment isn't always good enough. Still, we start with
that, and only if that leaves padding around do we fall back on a greedy
padding-minimizing algorithm.
There is no known efficient algorithm for producing a guaranteed-minimal
layout in all cases. In fact, allowing arbitrary fixed-offset fields means
there's a straightforward reduction from bin-packing, making this NP-hard.
But as usual with such problems, we can still efficiently produce adequate
solutions to the cases that matter most to us.
I intend to use this in coroutine frame layout, where the retcon lowerings
very badly want to minimize total space usage, and where the switch lowering
can indeed produce a header with interior padding if the promise field is
highly-aligned. But it may be useful in a much wider variety of situations.
When we find something like this:
```
%a:_(s32) = G_SOMETHING ...
...
%select:_(s32) = G_SELECT %cond(s1), %a, %a
```
We can remove the select and just replace it entirely with `%a` because it's
always going to result in `%a`.
Same if we have
```
%select:_(s32) = G_SELECT %cond(s1), %a, %b
```
where we can deduce that `%a == %b`.
This implements the following cases:
- `%select:_(s32) = G_SELECT %cond(s1), %a, %a` -> `%a`
- `%select:_(s32) = G_SELECT %cond(s1), %a, %some_copy_from_a` -> `%a`
- `%select:_(s32) = G_SELECT %cond(s1), %a, %b` -> `%a` when `%a` and `%b`
are defined by identical instructions
This gives a few minor code size improvements on CTMark at -O3 for AArch64.
Differential Revision: https://reviews.llvm.org/D76523
Summary:
When building a large Xcode project with multiple module dependencies, and mixed Objective-C & Swift, I observed a large number of clang processes stalling at zero CPU for 30+ seconds throughout the build. This was especially prevalent on my 18-core iMac Pro.
After some sampling, the major cause appears to be the lock file implementation for precompiled modules in the module cache. When the lock is heavily contended by multiple clang processes, the exponential backoff runs in lockstep, with some of the processes sleeping for 30+ seconds in order to acquire the file lock.
In the attached patch, I implemented a more aggressive polling mechanism that limits the sleep interval to a max of 500ms, and randomizes the wait time. I preserved a limited form of exponential backoff. I also updated the code to use cross-platform timing, thread sleep, and random number capabilities available in C++11.
On iMac Pro (2.3 GHz Intel Xeon W, 18 core):
Xcode 11.1 bundled clang:
502.2 seconds (average of 5 runs)
Custom clang build with LockFileManager patch applied:
276.6 seconds (average of 5 runs)
This is a 1.82x speedup for this use case.
On MacBook Pro (4 core 3.1GHz Intel i7):
Xcode 11.1 bundled clang:
539.4 seconds (average of 2 runs)
Custom clang build with LockFileManager patch applied:
509.5 seconds (average of 2 runs)
As expected, machines with fewer cores benefit less from this change.
```
Call graph:
2992 Thread_393602 DispatchQueue_1: com.apple.main-thread (serial)
2992 start (in libdyld.dylib) + 1 [0x7fff6a1683d5]
2992 main (in clang) + 297 [0x1097a1059]
2992 driver_main(int, char const**) (in clang) + 2803 [0x1097a5513]
2992 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (in clang) + 1608 [0x1097a7cc8]
2992 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (in clang) + 3299 [0x1097dace3]
2992 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (in clang) + 509 [0x1097dcc1d]
2992 clang::FrontendAction::Execute() (in clang) + 42 [0x109818b3a]
2992 clang::ParseAST(clang::Sema&, bool, bool) (in clang) + 185 [0x10981b369]
2992 clang::Parser::ParseFirstTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&) (in clang) + 37 [0x10983e9b5]
2992 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&) (in clang) + 141 [0x10983ecfd]
2992 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) (in clang) + 695 [0x10983f3b7]
2992 clang::Parser::ParseObjCAtDirectives(clang::Parser::ParsedAttributesWithRange&) (in clang) + 637 [0x10a9be9bd]
2992 clang::Parser::ParseModuleImport(clang::SourceLocation) (in clang) + 170 [0x10c4841ba]
2992 clang::Parser::ParseModuleName(clang::SourceLocation, llvm::SmallVectorImpl<std::__1::pair<clang::IdentifierInfo*, clang::SourceLocation> >&, bool) (in clang) + 503 [0x10c485267]
2992 clang::Preprocessor::Lex(clang::Token&) (in clang) + 316 [0x1098285cc]
2992 clang::Preprocessor::LexAfterModuleImport(clang::Token&) (in clang) + 690 [0x10cc7af62]
2992 clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<std::__1::pair<clang::IdentifierInfo*, clang::SourceLocation> >, clang::Module::NameVisibilityKind, bool) (in clang) + 7989 [0x10bba6535]
2992 compileAndLoadModule(clang::CompilerInstance&, clang::SourceLocation, clang::SourceLocation, clang::Module*, llvm::StringRef) (in clang) + 296 [0x10bba8318]
2992 llvm::LockFileManager::waitForUnlock() (in clang) + 91 [0x10b6953ab]
2992 nanosleep (in libsystem_c.dylib) + 199 [0x7fff6a22c914]
2992 __semwait_signal (in libsystem_kernel.dylib) + 10 [0x7fff6a2a0f32]
```
Differential Revision: https://reviews.llvm.org/D69575
Since intrinsics can now specify when an argument is required to be
constant, it is now OK to replace arguments with variables if they
aren't. This means intrinsics must now be accurately marked with
immarg.
I think we can save the MRI argument from these since it's in
GISelKnownBits already, but currently not accessible.
Implementation deferred to avoid dependency on other patches.
This is NFC-ish. The results should be identical, but perf is hopefully
better with the fast-path for no scaling. Added a unit test for that.
The code is adapted from what used to be the DAGCombiner equivalent
function before D76508 (rG0eeee83d7513).
Summary:
Add new generic MIR opcodes G_SADDSAT etc. Add support in IRTranslator
for translating the saturating add/subtract intrinsics to the new
opcodes.
Reviewers: aemerson, dsanders, paquette, arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76600
We have some long-standing missing shuffle optimizations that could
use this transform via VectorCombine now:
https://bugs.llvm.org/show_bug.cgi?id=35454
(and we still don't get that case in the backend either)
This function is apparently templated because there's existing code
in IR that treats mask values as unsigned and backend code that
treats masks values as signed.
The mask values are not endian-dependent (as shown by the existing
bitcast transform from DAGCombiner).
Differential Revision: https://reviews.llvm.org/D76508
D75801 removed the last and only user of this option, so we can
drop it now. The original idea behind this was to only run expensive
transforms under -O3, but apart from the one known bits transform,
this has never really taken off. I believe nowadays the recommendation
is to put expensive transforms in AggressiveInstCombine instead,
though that isn't terribly popular either :)
Differential Revision: https://reviews.llvm.org/D76540
This is the Waymarking algorithm implemented as an independent utility.
The utility is operating on a range of sequential elements.
First we "tag" the elements, by calling `fillWaymarks`.
Then we can "follow" the tags from every element inside the tagged
range, and reach the "head" (the first element), by calling
`followWaymarks`.
Differential Revision: https://reviews.llvm.org/D74415
Summary:
DataLayout::getTypeStoreSize() returns TypeSize.
For cases where it can not be scalable vector (e.g., GlobalVariable),
explicitly call TypeSize::getFixedSize().
For cases where scalable property doesn't matter, (e.g., check for
zero-sized type), use TypeSize::isNonZero().
Reviewers: sdesmalen, efriedma, apazos, reames
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76454
advanceToLowerBound moves an iterator to the first bit set at, or after,
the given index. This can be faster than doing IntervalMap::find.
rdar://60046261
Differential Revision: https://reviews.llvm.org/D76466
Avoid making a heap allocation when constructing a CoalescingBitVector.
This reduces time spent in LiveDebugValues when compiling sqlite3 by
700ms (0.5% of the total User Time).
rdar://60046261
Differential Revision: https://reviews.llvm.org/D76465
The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range.
This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).
Summary:
I've implemented them as target-specific IR intrinsics rather than
using `@llvm.experimental.vector.reduce.add`, on the grounds that the
'experimental' intrinsic doesn't currently have much code generation
benefit, and my replacements encapsulate the sign- or zero-extension
so that you don't expose the illegal MVE vector type (`<4 x i64>`) in
IR.
The machine instructions come in two versions: with and without an
input accumulator. My new IR intrinsics, like the 'experimental' one,
don't take an accumulator parameter: we represent that by just adding
on the input value using an ordinary i32 or i64 add. So if you write
the `vaddvaq` C-language intrinsic with an input accumulator of zero,
it can be optimised to VADDV, and conversely, if you write something
like `x += vaddvq(y)` then that can be combined into VADDVA.
Most of this is achieved in isel lowering, by converting these IR
intrinsics into the existing `ARMISD::VADDV` family of custom SDNode
types. For the difficult case (64-bit accumulators), isel lowering
already implements the optimization of folding an addition into a
VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is
already done, except that I had to introduce a parallel set of ARMISD
nodes for the //predicated// forms of VADDLV.
For the simpler VADDV, we handle the predicated form by just leaving
the IR intrinsic alone and matching it in an ordinary dag pattern.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76491
Summary:
I've implemented these as target-specific IR intrinsics, because
they're not //quite// enough like @llvm.experimental.vector.reduce.min
(which doesn't take the extra scalar parameter). Also this keeps the
predicated and unpredicated versions looking similar, and the
floating-point minnm/maxnm versions fold into the same schema.
We had a couple of min/max reductions already implemented, from the
initial pathfinding exercise in D67158. Those were done by having
separate IR intrinsic names for the signed and unsigned integer
versions; as part of this commit, I've changed them to use a flag
parameter indicating signedness, which is how we ended up deciding
that the rest of the MVE intrinsics family ought to work. So now
hopefully the ewhole lot is consistent.
In the new llc test, the output code from the `v8f16` test functions
looks quite unpleasant, but most of it is PCS lowering (you can't pass
a `half` directly in or out of a function). In other circumstances,
where you do something else with your `half` in the same function, it
doesn't look nearly as nasty.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76490
Summary:
This patch implements the following intrinsics:
uint8x16_t __arm_vcx1q_u8 (int coproc, uint32_t imm);
T __arm_vcx1qa(int coproc, T acc, uint32_t imm);
T __arm_vcx2q(int coproc, T n, uint32_t imm);
uint8x16_t __arm_vcx2q_u8(int coproc, T n, uint32_t imm);
T __arm_vcx2qa(int coproc, T acc, U n, uint32_t imm);
T __arm_vcx3q(int coproc, T n, U m, uint32_t imm);
uint8x16_t __arm_vcx3q_u8(int coproc, T n, U m, uint32_t imm);
T __arm_vcx3qa(int coproc, T acc, U n, V m, uint32_t imm);
Most of them are polymorphic. Furthermore, some intrinsics are
polymorphic by 2 or 3 parameter types, such polymorphism is not
supported by the existing MVE/CDE tablegen backends, also we don't
really want to have a combinatorial explosion caused by 1000 different
combinations of 3 vector types. Because of this some intrinsics are
implemented as macros involving a cast of the polymorphic arguments to
uint8x16_t.
The IR intrinsics are even more restricted in terms of types: all MVE
vectors are cast to v16i8.
Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76299
Summary:
This change implements ACLE CDE intrinsics that translate to
instructions working with general-purpose registers.
The specification is available at
https://static.docs.arm.com/101028/0010/ACLE_2019Q4_release-0010.pdf
Each ACLE intrinsic gets a corresponding LLVM IR intrinsic (because
they have distinct function prototypes). Dual-register operands are
represented as pairs of i32 values. Because of this the instruction
selection for these intrinsics cannot be represented as TableGen
patterns and requires custom C++ code.
Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76296
This reverts commit e9f22fd429.
When building with -DLLVM_USE_SANITIZER="Thread", check-llvm has 70
failing tests with this revision, and 29 without this revision.
(would be nice to revisit the CFG traits and change them to use ranges
rather than begin/end - if anyone wants to do that refactor)
Also use more auto because writing the names of range utilty iterators
isn't helping readability here - they're sort of implementation details
for the most part, especially once you nest a few different filtering
and adapting iterators.
The fix (shooting from the hip since I couldn't reproduce this locally)
was to capture by value in a lambda used in a filtering iterator -
because the iterator would persist beyond the lifetime of the function
(as the iterators are returned to callers).
Originally committed in 79a7ed92a9.
This was reverted in 4a7f2032a3.
Summary:
1. FileLineInfoSpecifier::Default isn't the default for anything.
Rename to RawValue, which accurately reflects its role.
2. Most functions that take a part of a FileLineInfoSpecifier end up
constructing a full one later or plumb two values through. Make them
all just take a complete FileLineInfoSpecifier.
3. Printing basenames only was handled differently from all other
variants, make it parallel to all the other variants.
Reviewers: jhenderson
Subscribers: hiraditya, MaskRay, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76394
Port over the following:
- shuffle undef, undef, any_mask -> undef
- shuffle anything, anything, undef_mask -> undef
This sort of thing shows up a lot when you try to bugpoint code containing
shufflevector.
Differential Revision: https://reviews.llvm.org/D76382
Updates the object buffer ownership scheme in jitLinkForOrc and related
functions: Ownership of both the object::ObjectFile and underlying
MemoryBuffer is passed into jitLinkForOrc and passed back to the onEmit
callback once linking is complete. This avoids the use-after-free errors
that were seen in 98f2bb4461.
In MachOPlatform, obtaining the link-order for a JITDylib requires locking the
session, but also needs to be part of a larger atomic operation that collates
initializer symbols tracked by the platform. Trying to do this under a separate
platform mutex leads to potential locking order issues, e.g.
T1 locks session then tries to lock platform to register a new init symbol
meanwhile
T2 locks platform then tries to lock session to obtain link order.
Removing the platform lock and performing all these operations under the session
lock eliminates this possibility.
At the same time we also need to collate init pointers from the
MachOPlatform::InitScraperPlugin, and we don't need or want to lock the session
for that. The new InitSeqMutex has been added to guard these init pointers, and
the session mutex is never obtained while the InitSeqMutex is held.
Check the path length limit against the length of the UTF-16 version of
the input rather than the UTF-8 equivalent, as the UTF-16 length may be
shorter. Move widenPath from the llvm::sys::path namespace in Path.h to
the llvm::sys::windows namespace in WindowsSupport.h. Only use the
reduced path length limit for create directory. Canonicalize using
sys::path::remove_dots().
Differential Revision: https://reviews.llvm.org/D75372
Summary:
In order to keep the names consistent with other SVE gather loads, the
intrinsics for gather prefetch are renamed as follows:
* @llvm.aarch64.sve.gather.prfb -> @llvm.aarch64.sve.prfb.gather
Reviewed by: fpetrogalli
Differential Revision: https://reviews.llvm.org/D76421
Summary:
* Remove a bunch of asserts checking for unsupported scalable types and
add some more now that they are supported.
* Propagate the scalable flag where necessary.
* Add another `EVT::getExtendedVectorVT` method that takes an
ElementCount parameter.
* Add `EVT::isExtendedScalableVector` and
`EVT::getExtendedVectorElementCount` - latter is currently unused.
Reviewers: sdesmalen, efriedma, rengolin, craig.topper, huntergr
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75672
Summary:
Related to D75672, this patch adds EVT::isFixedLengthVector to determine
if the underlying vector type is of fixed length.
An assert is introduced in EVT::getVectorNumElements that triggers for
types that aren't fixed length. This is currently guarded by a flag
added D75297 that is off by default and has been renamed to the more
generic ENABLE_STRICT_FIXED_SIZE_VECTORS.
Ideally we want to get rid of getVectorNumElements but a quick grep
shows there are >350 uses in lib/CodeGen and 75 in lib/Target/AArch64
alone. All of these probably aren't EVT::getVectorNumElements (some may
be the MVT equivalent), but there are many places to fixup and having
the assert on by default would make the SVE upstreaming effort
difficult.
Reviewers: sdesmalen, efriedma, ctetreau, huntergr, rengolin
Reviewed By: efriedma
Subscribers: mgorny, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76376
(would be nice to revisit the CFG traits and change them to use ranges
rather than begin/end - if anyone wants to do that refactor)
Also use more auto because writing the names of range utilty iterators
isn't helping readability here - they're sort of implementation details
for the most part, especially once you nest a few different filtering
and adapting iterators.
The existence of the class is more confusing than helpful, I think; the
commonality is mostly just "GEP is legal", which can be queried using
APIs on GetElementPtrInst.
Differential Revision: https://reviews.llvm.org/D75660
SelectionDAG CSEs nodes based on their result type and operands, but not their flags. The flags are expected to be intersected when they are CSEd. In SelectionDAGBuilder, for FP nodes we manage both the fast math flags and the nofpexcept flag after the nodes have already been CSEd when they were created with getNode. The management of the fastmath flags before the constrained nodes prevents the nofpexcept management from working correctly.
This commit moves the FMF handling for constrained intrinsics into their visitor and disables the common FMF handling for these nodes.
Differential Revision: https://reviews.llvm.org/D75224
This patch generates TableGen descriptions for the specified register
banks which contain a list of register sizes corresponding to the
available HwModes. The appropriate size is used during codegen according
to the current HwMode. As this HwMode was not available on generation,
it is set upon construction of the RegisterBankInfo class. Targets
simply need to provide the HwMode argument to the
<target>GenRegisterBankInfo constructor.
The RISC-V RegisterBankInfo constructor has been updated accordingly
(plus an unused argument removed).
Differential Revision: https://reviews.llvm.org/D76007
This is fixing up various places that use the implicit
TypeSize->uint64_t conversion.
The new overloads in MemoryLocation.h are already used in various places
that construct a MemoryLocation from a TypeSize, including MemorySSA.
(They were using the implicit conversion before.)
Differential Revision: https://reviews.llvm.org/D76249
This ports some combines from DAGCombiner.cpp which perform some trivial
transformations on instructions with undef operands.
Not having these can make it extremely annoying to find out where we differ
from SelectionDAG by looking at existing lit tests. Without them, we tend to
produce pretty bad code generation when we run into instructions which use
undef operands.
Also remove the nonpow2_store_narrowing testcase from arm64-fallback.ll, since
we no longer fall back on the add.
Differential Revision: https://reviews.llvm.org/D76339
Summary:
This is another set of instructions too complicated to be sensibly
expressed in IR by anything short of a target-specific intrinsic.
Given input vectors a,b, the instruction generates intermediate values
2*(a[0]*b[0]+a[1]+b[1]), 2*(a[2]*b[2]+a[3]+b[3]), etc; takes the high
half of each double-width values, and overwrites half the lanes in the
output vector c, which you therefore have to provide the input value
of. Optionally you can swap the elements of b so that the are things
like a[0]*b[1]+a[1]*b[0]; optionally you can round to nearest when
taking the high half; and optionally you can take the difference
rather than sum of the two products. Finally, saturation is applied
when converting back to a single-width vector lane.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76359
Summary:
As noted in [[ https://bugs.llvm.org/show_bug.cgi?id=45201 | PR45201 ]],
[[ https://bugs.llvm.org/show_bug.cgi?id=10090 | PR10090 ]] SCEV doesn't
always avoid recursive algorithms, and that causes issues with
large expression depths and/or smaller stack sizes.
In `SCEVExpander::isHighCostExpansion*()` case, the refactoring to avoid
recursion is rather idiomatic. We simply need to place the root expr
into a vector, and iterate over vector elements accounting for the cost
of each one, adding new exprs at the end of the vector,
thus achieving recursion-less traversal.
The order in which we will visit exprs doesn't matter here,
so we will be fine with the most basic approach of using SmallVector
and inserting/extracting from the back, which accidentally is the same
depth-first traversal that we were doing previously recursively.
Reviewers: mkazantsev, reames, wmi, ekatz
Reviewed By: mkazantsev
Subscribers: hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76273
When optimising for code size at the expense of performance, it is often
worth saving and restoring some of r0-r3, if IPRA will be able to take
advantage of them. This doesn't cost any extra code size if we already
have a PUSH/POP pair, and increases the number of available registers
across any calls to the function.
We already have an optimisation which tries fold the subtract/add of the
SP into the PUSH/POP by using extra registers, which somewhat conflicts
with this. I've made the new optimisation less aggressive in cases where
the existing one is likely to trigger, which gives better results than
either of these optimisations by themselves.
Differential revision: https://reviews.llvm.org/D69936
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76348
Summary:
This fixes a discrepancy between the non-temporal loads/store
intrinsics and other SVE load intrinsics (such as nf/ff), so
that Clang can use the same code to generate these intrinsics.
Reviewers: andwar, kmclaughlin, rengolin, efriedma
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76237
Summary:
In D67768/D67492 I added support for entry values having blocks larger
than one byte, but I now noticed that the DIE implementation I added there
was broken. The takeNodes() function, that moves the entry value block
from a temporary buffer to the output buffer, would destroy the input
iterator when transferring the first node, meaning that only that node
was moved.
In practice, this meant that when emitting a call site value using a
DW_OP_entry_value operation with a DWARF register number larger than 31,
that multi-byte DW_OP_regx expression would be truncated.
Reviewers: djtodoro, aprantl, vsk
Reviewed By: djtodoro
Subscribers: llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D76279
Summary:
These are complicated integer multiply+add instructions with extra
saturation, taking the high half of a double-width product, and
optional rounding. There's no sensible way to represent that in
standard IR, so I've converted the clang builtins directly to
target-specific intrinsics.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76123
Summary:
These instructions compute multiply+add in integers, with one of the
operands being a splat of a scalar. (VMLA and VMLAS differ in whether
the splat operand is a multiplier or the addend.)
I've represented these in IR using existing standard IR operations for
the unpredicated forms. The predicated forms are done with target-
specific intrinsics, as usual.
When operating on n-bit vector lanes, only the bottom n bits of the
i32 scalar operand are used. So we have to tell that to isel lowering,
to allow it to remove a pointless sign- or zero-extension instruction
on that input register. That's done in `PerformIntrinsicCombine`, but
first I had to enable `PerformIntrinsicCombine` for MVE targets
(previously all the intrinsics it handled were for NEON), and make it
a method of `ARMTargetLowering` so that it can get at
`SimplifyDemandedBits`.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76122
RDF is designed to be target agnostic. Therefore it would be useful to have it available for other targets, such as X86.
Based on a previous patch by Krzysztof Parzyszek
Differential Revision: https://reviews.llvm.org/D75932
Summary: Prevent InstCombine from removing llvm.assume for which the arguement is true when they have operand bundles with usefull information.
Reviewers: jdoerfert, nikic, lebedev.ri
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76147
MCTargetOptionsCommandFlags.inc and CommandFlags.inc are headers which contain
cl::opt with static storage.
These headers are meant to be incuded by tools to make it easier to parametrize
codegen/mc.
However, these headers are also included in at least two libraries: lldCommon
and handle-llvm. As a result, when creating DYLIB, clang-cpp holds a reference
to the options, and lldCommon holds another reference. Linking the two in a
single executable, as zig does[0], results in a double registration.
This patch explores an other approach: the .inc files are moved to regular
files, and the registration happens on-demand through static declaration of
options in the constructor of a static object.
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1756977#c5
Differential Revision: https://reviews.llvm.org/D75579
With -fstack-protector-strong we check if a non-array variable has its address
taken in a way that could cause a potential out-of-bounds access. However what
we don't catch is when the address is directly used to create an out-of-bounds
memory access.
Fix this by examining the offsets of GEPs that are ultimately derived from
allocas and checking if the resulting address is out-of-bounds, and by checking
that any memory operations using such addresses are not over-large.
Fixes PR43478.
Differential revision: https://reviews.llvm.org/D75695
This patch makes `Relocation::Addend` to be `ELFYAML::YAMLIntUInt` and not `int64_t`.
`ELFYAML::YAMLIntUInt` it is a new type and it has the following benefits/features:
1) For an 64-bit object any hex/decimal addends
in the range [INT64_MIN, UINT64_MAX] is accepted.
2) For an 32-bit object any hex/decimal addends
in range [INT32_MIN, UINT32_MAX] is accepted.
3) Negative hex numbers like -0xffffffff are not accepted.
4) It is printed as decimal. I.e. obj2yaml will print
something like "Addend: 125", this matches the current behavior.
This fixes all FIXMEs in `relocation-addend.yaml`.
Differential revision: https://reviews.llvm.org/D75527
Summary:
Run StackSafetyAnalysis at the end of the IR pipeline and annotate
proven safe allocas with !stack-safe metadata. Do not instrument such
allocas in the AArch64StackTagging pass.
Reviewers: pcc, vitalybuka, ostannard
Reviewed By: vitalybuka
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, gilang, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73513
This is the second patch in a series of patches to enable basic block
sections support.
This patch adds support for:
* Creating direct jumps at the end of basic blocks that have fall
through instructions.
* New pass, bbsections-prepare, that analyzes placement of basic blocks
in sections.
* Actual placing of a basic block in a unique section with special
handling of exception handling blocks.
* Supports placing a subset of basic blocks in a unique section.
* Support for MIR serialization and deserialization with basic block
sections.
Parent patch : D68063
Differential Revision: https://reviews.llvm.org/D73674
I used the implementation for floor instead of round. It also turns
out the OpenCL builtin library wasn't using the round builtin, but
implemented the expanded form.
Follow-up for D74433
What the function returns are almost standard BFD names, except that "ELF" is
in uppercase instead of lowercase.
This patch changes "ELF" to "elf" and changes ARM/AArch64 to use their BFD names.
MIPS and PPC64 have endianness differences as well, but this patch does not intend to address them.
Advantages:
* llvm-objdump: the "file format " line matches GNU objdump on ARM/AArch64 objects
* "file format " line can be extracted and fed into llvm-objcopy -O literally.
(https://github.com/ClangBuiltLinux/linux/issues/779 has such a use case)
Affected tools: llvm-readobj, llvm-objdump, llvm-dwarfdump, MCJIT (internal implementation detail, not exposed)
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D76046
Summary:
Truncating the result of a merge means that most likely we could have done without merge in the first place and just used the input merge inputs directly. This can be done in three cases:
1. If the truncation result is smaller than the merge source, we can use the source in the trunc directly
2. If the sizes are the same, we can replace the register or use a copy
3. If the truncation size is a multiple of the merge source size, we can build a smaller merge
This gets rid of most of the larger, hard-to-legalize merges.
Reviewers: qcolombet, aditya_nandakumar, aemerson, paquette, arsenm, Petar.Avramovic
Reviewed By: arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, jrtc27, atanasyan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75915
This adds the --debug-vars option to llvm-objdump, which prints
locations (registers/memory) of source-level variables alongside the
disassembly based on DWARF info. A vertical line is printed for each
live-range, with a label at the top giving the variable name and
location, and the position and length of the line indicating the program
counter range in which it is valid.
Currently, this only works for object files, not executables or shared
libraries.
Differential revision: https://reviews.llvm.org/D70720
alignBranches is X86 specific, change the name in a
more general one since other target can do some state
chang before and after emitting the instruction.
Enable use of ExecutionEngine JITEventListeners in RTDyldObjectLinkingLayer.
This allows existing MCJIT clients to more easily migrate to LLJIT / ORCv2.
Example usage in llvm/examples/OrcV2Examples/LLJITWithGDBRegistrationListener.
Differential Revision: https://reviews.llvm.org/D75838
This patch removes compiler runtime assertions that ensure the implicit
conversion are only guaranteed to work for fixed-width vectors.
With the assert it would be impossible to get _anything_ to build until
the
entire codebase has been upgraded, even when the indiscriminate uses of
the size as uint64_t would work fine for both scalable and fixed-width
types.
This issue will need to be addressed differently, with build-time errors
rather than assertion failures, but that effort falls beyond the scope
of this patch.
Returning the scalable size and avoiding the assert in getFixedSize()
is a temporary stop-gap in order to use LLVM for compiling and using
the SVE ACLE intrinsics.
Reviewers: efriedma, huntergr, rovka, ctetreau, rengolin
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75297
This patch adds a new singlecrfromundef lattice value, indicating a
single element constant range which was merge with undef at some point.
Merging it with another constant range results in overdefined, as we
won't be able to replace all users with a single value.
This patch uses a ConstantRange instead of a Constant*, because regular
integer constants are represented as single element constant ranges as
well and this allows the existing code working without additional
changes.
Reviewers: efriedma, nikic, reames, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75845
This is the first in a series of patches to enable Basic Block Sections
in LLVM.
We introduce a new compiler option, -fbasicblock-sections=, which places every
basic block in a unique ELF text section in the object file along with a
symbol labeling the basic block. The linker can then order the basic block
sections in any arbitrary sequence which when done correctly can encapsulate
block layout, function layout and function splitting optimizations. However,
there are a couple of challenges to be addressed for this to be feasible:
1) The compiler must not allow any implicit fall-through between any two
adjacent basic blocks as they could be reordered at link time to be
non-adjacent. In other words, the compiler must make a fall-through
between adjacent basic blocks explicit by retaining the direct jump
instruction that jumps to the next basic block. These branches can only
be removed later by the linker after the blocks have been reordered.
2) All inter-basic block branch targets would now need to be resolved by
the linker as they cannot be calculated during compile time. This is
done using static relocations which bloats the size of the object files.
Further, the compiler tries to use short branch instructions on some ISAs
for branch offsets that can be accommodated in one byte. This is not
possible with basic block sections as the offset is not determined at
compile time, and long branch instructions have to be used everywhere.
3) Each additional section bloats object file sizes by tens of bytes. The
number of basic blocks can be potentially very large compared to the
size of functions and can bloat object sizes significantly. Option
fbasicblock-sections= also takes a file path which can be used to
specify a subset of basic blocks that needs unique sections to keep
the bloats small.
4) Debug Info and CFI need special handling and will be presented as
separate patches.
Basic Block Labels
With -fbasicblock-sections=labels, or when a basic block is placed in a
unique section, it is labelled with a symbol. This allows easy mapping of
virtual addresses from PMU profiles back to the corresponding basic blocks.
Since the number of basic blocks is large, the labeling bloats the symbol
table sizes and the string table sizes significantly. While the binary size
does increase, it does not affect performance as the symbol table is not
loaded in memory during run-time. The string table size bloat is kept very
minimal using a unary naming scheme that uses string suffix compression.
The basic blocks for function foo are named "a.BB.foo", "aa.BB.foo", ...
This turns out to be very good for string table sizes and the bloat in the
string table size for a very large binary is ~8 %. The naming also allows
using the --symbol-ordering-file option in LLD to arbitrarily reorder the
sections.
Differential Revision: https://reviews.llvm.org/D68063
Renames the llvm/examples/LLJITExamples directory to llvm/examples/OrcV2Examples
since it is becoming a home for all OrcV2 examples, not just LLJIT.
See http://llvm.org/PR31103.
This patch adds a new undef lattice state, which is used to represent
UndefValue constants or instructions producing undef.
The main difference to the unknown state is that merging undef values
with constants (or single element constant ranges) produces the
constant/constant range, assuming all uses of the merge result will be
replaced by the found constant.
Contrary, merging non-single element ranges with undef needs to go to
overdefined. Using unknown for UndefValues currently causes mis-compiles
in CVP/LVI (PR44949) and will become problematic once we use
ValueLatticeElement for SCCP.
Reviewers: efriedma, reames, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75120
Patch based on https://reviews.llvm.org/D75912 by Alexander Shishkin. Thanks
Alexander!
To minimize disruption to existing clients, who may be relying on the fact that
unused references to unresolved symbols do not generate an error, this patch
makes error checking opt-in: Clients can call ExecutionEngine::hasError or
LLVMExecutionEngineGetError to check whether and error has occurred.
Differential revision: https://reviews.llvm.org/D75912
For context, the proposed RISC-V bit manipulation extension has a subset
of instructions which require one of two SubtargetFeatures to be
enabled, 'zbb' or 'zbp', and there is no defined feature which both of
these can imply to use as a constraint either (see comments in D65649).
AssemblerPredicates allow multiple SubtargetFeatures to be declared in
the "AssemblerCondString" field, separated by commas, and this means
that the two features must both be enabled. There is no equivalent to
say that _either_ feature X or feature Y must be enabled, short of
creating a dummy SubtargetFeature for this purpose and having features X
and Y imply the new feature.
To solve the case where X or Y is needed without adding a new feature,
and to better match a typical TableGen style, this replaces the existing
"AssemblerCondString" with a dag "AssemblerCondDag" which represents the
same information. Two operators are defined for use with
AssemblerCondDag, "all_of", which matches the current behaviour, and
"any_of", which adds the new proposed ORing features functionality.
This was originally proposed in the RFC at
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139138.html
Changes to all current backends are mechanical to support the replaced
functionality, and are NFCI.
At this stage, it is illegal to combine features with ands and ors in a
single AssemblerCondDag. I suspect this case is sufficiently rare that
adding more complex changes to support it are unnecessary.
Differential Revision: https://reviews.llvm.org/D74338
When emitting PDBs, the TypeStreamMerger class is used to merge .debug$T records from the input .OBJ files into the output .PDB stream.
Records in .OBJs are not required to be aligned on 4-bytes, and "The Netwide Assembler 2.14" generates non-aligned records.
When compiling with -DLLVM_ENABLE_ASSERTIONS=ON, an assert was triggered in MergingTypeTableBuilder when non-ghash merging was used.
With ghash merging there was no assert.
As a result, LLD could potentially generate a non-aligned TPI stream.
We now align records on 4-bytes when record indices are remapped, in TypeStreamMerger::remapIndices().
Differential Revision: https://reviews.llvm.org/D75081
This patch add mayContainUnboundedCycle helper function which checks whether a function has any cycle which we don't know if it is bounded or not.
Loops with maximum trip count are considered bounded, any other cycle not.
It also contains some fixed tests and some added tests contain bounded and
unbounded loops and non-loop cycles.
Reviewed By: jdoerfert, uenoku, baziotis
Differential Revision: https://reviews.llvm.org/D74691
Calculating SCEVs can be cumbersome, and may take very long time (even
hours, for very long expressions). To prevent recalculating expressions
over and over again, we cache them.
This change add cache queries to key positions, to prevent recalculation
of the expressions.
Fix PR43571.
Differential Revision: https://reviews.llvm.org/D70097
Summary:
This intrinsic implements the unpredicated duplication of scalar values
and is mapped to (through ISD::SPLAT_VECTOR):
* DUP <Zd>.<T>, #<imm>
* DUP <Zd>.<T>, <R><n|SP>
Reviewed by: sdesmalen
Differential Revision: https://reviews.llvm.org/D75900
After a crash catched by the CrashRecoveryContext, this patch prevents from accessing dangling pointers in TimerGroup structures before the clang tool exits. Previously, the default TimerGroup had internal linked lists which were still pointing to old Timer or TimerGroup instances, which lived in stack frames released by the CrashRecoveryContext.
Fixes PR45164.
Differential Revision: https://reviews.llvm.org/D76099
LLVM currently supports CSK_MD5 and CSK_SHA1 source file checksums in
debug info. This change adds support for CSK_SHA256 checksums.
The SHA256 checksums are supported by the CodeView debug format.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D75785
These may be accessed from multiple threads if concurrent materialization is
enabled in ORC.
Testcase coming in a follow-up patch that enables eh-frame registration for
LLJIT.
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
Summary:
Using the default DAG.UnrollVectorOp on v16i8 and v8i16 vectors
results in i8 or i16 nodes being inserted into the SelectionDAG. Since
those are illegal types, this causes a legalization assertion failure
for some code patterns, as uncovered by PR45178. This change unrolls
shifts manually to avoid this issue by adding and using a new optional
EVT argument to DAG.ExtractVectorElements to control the type of the
extract_element nodes.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76043
Use GraphTraits in the implementation of the GraphDiff's own GraphTraits
so GraphDiff can be used across all graph types that provide
GraphTraits.
Also use partial template specializations to make the traits a bit more
compact.
Reviewers: asbirlea
Differential Revision: https://reviews.llvm.org/D76034
Summary:
This patch adds the following intrinsics for non-temporal gather loads
and scatter stores:
* aarch64_sve_ldnt1_gather_index
* aarch64_sve_stnt1_scatter_index
These intrinsics implement the "scalar + vector of indices" addressing
mode.
As opposed to regular and first-faulting gathers/scatters, there's no
instruction that would take indices and then scale them. Instead, the
indices for non-temporal gathers/scatters are scaled before the
intrinsics are lowered to `ldnt1` instructions.
The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as
placeholders so that we can easily identify the cases implemented in
this patch in performGatherLoadCombine and performScatterStoreCombined.
Once encountered, they are replaced with:
* GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1
* SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1
The patterns for lowering ISD::SHL for scalable vectors (required by
this patch) were missing, so these are added too.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75601
Summary:
This adds the ACLE intrinsic family for the VFMA and VFMS
instructions, which perform fused multiply-add on vectors of floats.
I've represented the unpredicated versions in IR using the cross-
platform `@llvm.fma` IR intrinsic. We already had isel rules to
convert one of those into a vector VFMA in the simplest possible way;
but we didn't have rules to detect a negated argument and turn it into
VFMS, or rules to detect a splat argument and turn it into one of the
two vector/scalar forms of the instruction. Now we have all of those.
The predicated form uses a target-specific intrinsic as usual, but
I've stuck to just one, for a predicated FMA. The subtraction and
splat versions are code-generated by passing an fneg or a splat as one
of its operands, the same way as the unpredicated version.
In arm_mve_defs.h, I've had to introduce a tiny extra piece of
infrastructure: a record `id` for use in codegen dags which implements
the identity function. (Just because you can't declare a Tablegen
value of type dag which is //only// a `$varname`: you have to wrap it
in something. Now I can write `(id $varname)` to get the same effect.)
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75998
Summary: This patch adds the basic utilities to deal with dropable uses. dropable uses are uses that we rather drop than prevent transformations, for now they are limited to uses in llvm.assume.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: uenoku, lebedev.ri, mgorny, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73404
Summary:
Currently, a BoundaryAlign fragment may be inserted after the branch
that needs to be aligned to truncate the current fragment, this fragment is
unused at most of time. To avoid that, we can insert a new empty Data
fragment instead. Non-relaxable instruction is usually emitted into Data
fragment, so the inserted empty Data fragment will be reused at a high
possibility.
Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight
Reviewed By: reames, LuoYuanke
Subscribers: llvm-commits, dexonsmith, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75438
In order for dsymutil to collect .apinotes files (which capture
attributes such as nullability, Swift import names, and availability),
I want to propose adding an apinotes: field to DIModule that gets
translated into a DW_AT_LLVM_apinotes (path) nested inside
DW_TAG_module. This will be primarily used by LLDB to indirectly
extract the Swift names of Clang declarations that were deserialized
from DWARF.
<rdar://problem/59514626>
Differential Revision: https://reviews.llvm.org/D75585
Summary: Add verification that operand bundles on an llvm.assume are well formed to the verify pass.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75269
This is part of PR44213 https://bugs.llvm.org/show_bug.cgi?id=44213
When importing (system) Clang modules, LLDB needs to know which SDK
(e.g., MacOSX, iPhoneSimulator, ...) they came from. While the sysroot
attribute contains the absolute path to the SDK, this doesn't work
well when the debugger is run on a different machine than the
compiler, and the SDKs are installed in different directories. It thus
makes sense to just store the name of the SDK instead of the absolute
path, so it can be found relative to LLDB.
rdar://problem/51645582
Differential Revision: https://reviews.llvm.org/D75646
Summary:
The IR intrinsics are mapped to the following SVE2 instructions:
* WHILERW <Pd>.<T>, <Xn>, <Xm>
* WHILEWR <Pd>.<T>, <Xn>, <Xm>
The intrinsics introduced in this patch are the IR counterpart of the
SVE ACLE functions `svwhilerw` and `svwhilewr` (all data type
variants).
Patch by Maciej Gąbka <maciej.gabka@arm.com>.
Reviewers: kmclaughlin, rengolin
Reviewed By: kmclaughlin
Subscribers: tschuett, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75862
A downstream test case (see included reduced test) revealed that we have a bug in how we handle duplicate relocations. If we have the same SDValue relocated twice, and that value happens to be a constant (such as null), we only export one of the two llvm::Values. Exporting on a per llvm::Value basis is required to allow lowering of gc.relocates in following basic blocks (e.g. invokes). Without it, we end up with a use of an undefined vreg and bad things happen.
Rather than fixing the optimization - which appears to be hard - I propose we simply remove it. There are no tests in tree that change with this code removed. If we find out later that this did matter for something, we can reimplement a variation of this in CodeGenPrepare to catch the easy cases without complicating the lowering code.
Thanks to Denis and Serguei who did all the hard work of figuring out what went wrong here. The patch is by far the easy part. :)
Differential Revision: https://reviews.llvm.org/D75964
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.
Differential Revision: https://reviews.llvm.org/D75525
This essentially reverts some of the SimplifyLibcalls part changes of D45736 [SimplifyLibcalls] Replace locked IO with unlocked IO.
C11 7.21.5.2 The fflush function
> If stream is a null pointer, the fflush function performs this flushing action on all streams for which the behavior is defined above.
i.e. fopen'ed FILE* is inherently captured.
POSIX.1-2017 getc_unlocked, getchar_unlocked, putc_unlocked, putchar_unlocked - stdio with explicit client locking
> These functions can safely be used in a multi-threaded program if and only if they are called while the invoking thread owns the ( FILE *) object, as is the case after a successful call to the flockfile() or ftrylockfile() functions.
After a thread fopen'ed a FILE*, when it is calling foobar() which is now replaced by foobar_unlocked(),
if another thread is concurrently calling fflush(0), the behavior is undefined.
C11 7.22.4.4 The exit function
> Next, all open streams with unwritten buffered data are flushed, all open streams are closed, and all files created by the tmpfile function are removed.
The replacement is only feasible if the program is single threaded, or exit or fflush(0) is never called.
See also http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180528/556615.html
for how the replacement makes libc interceptors difficult to implement.
dalias: in a worst case, it's unbounded data corruption because of concurrent access to pointers
without synchronization. f->wpos or rpos could get outside of the buffer, thread A could do
f->wpos += j after knowing j is in bounds, while thread B also changes it concurrently.
This can produce exploitable conditions depending on libc internals.
Revert the SimplifyLibcalls part change because the cons obviously
overweigh the pros. Even when the replacement is feasible, the benefit
is indemonstrable, more so in an application instead of an artificial
glibc benchmark. Theoretically the replacement could be beneficial when
calling getc_unlocked/putc_unlocked in a loop, but then it is better
using a blocked IO operation and the user is likely aware of that.
The function attribute inference is still useful and thus kept.
Reviewed By: xbolva00
Differential Revision: https://reviews.llvm.org/D75933
Summary:
This patch extends the TargetMachine to let targets specify the integer size
used by the sjljehprepare pass. This is 64bit for the VE target and otherwise
defaults to 32bit for all targets, which was hard-wired before.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D71337
Summary: Rewrite the fsub-0.0 idiom to fneg and always emit fneg for fp
negation. This also extends the scalarization cost in instcombine for unary
operators to result in the same IR rewrites for fneg as for the idiom.
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D75467
Summary:
This patch generalizes the existing code to support CDE intrinsics
which will share some properties with existing MVE intrinsics
(some of the intrinsics will be polymorphic and accept/return values
of MVE vector types).
Specifically the patch:
* Adds new tablegen backends -gen-arm-cde-builtin-def,
-gen-arm-cde-builtin-codegen, -gen-arm-cde-builtin-sema,
-gen-arm-cde-builtin-aliases, -gen-arm-cde-builtin-header based on
existing MVE backends.
* Renames the '__clang_arm_mve_alias' attribute into
'__clang_arm_builtin_alias' (it will be used with CDE intrinsics as
well as MVE intrinsics)
* Implements semantic checks for the coprocessor argument of the CDE
intrinsics as well as the existing coprocessor intrinsics.
* Adds one CDE intrinsic __arm_cx1 to test the above changes
Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen
Reviewed By: simon_tatham
Subscribers: sdesmalen, mgorny, kristof.beyls, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75850
This allows Spiller.h to be used and included outside of
the lib/CodeGen directory. For example to be used in the
lib/Target directory or other places.
The line_range value of a debug line program header is used in divisions
related to special opcodes and DW_LNS_const_add_pc opcodes. As such, a
value of 0 cannot be used. This change introduces a new warning, if such
a situation is identified, and does not perform the relevant
calculations.
Reviewed by: probinson, aprantl
Differential Revision: https://reviews.llvm.org/D43470
This patch adds a check which reports an unsupported value of the
maximum_operations_per_instruction field in a debug line table header.
This is reported once per line table, at most, and only if the tablet
would otherwise need to use it (i.e. never for tables with version 3 or
less, or for tables which don't use DW_LNS_const_add_pc or special
opcodes). Unsupported values are currently any apart from 1.
Reviewed by: probinson, MaskRay
Differential Revision: https://reviews.llvm.org/D74819
This patch introduces the propagation of known information based on path exploration.
For example,
```
int u(int c, int *p){
if(c) {
return *p;
} else {
return *p + 1;
}
}
```
An argument `p` is dereferenced whatever c's value is.
For an instruction `CtxI`, we accumulate branch instructions in the must-be-executed-context of `CtxI` and then, we take the conjunction of the successors' known state.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D65593
Summary:
Assume bundles need to be usable by Analysis and Transforms/Utils isn't.
so this commit moves utilities to deal with asusme bundles to IR.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75618
Summary: Finding what information is know about a value from a use is generally useful and can be done quickly.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75616
With the addition of the LLD time tracing it made sense to include coverage
for LLVM's various passes. Doing so ensures that ThinLTO is also covered
with a time trace.
Before:
{F11333974}
After:
{F11333928}
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D74516
Added a write method for TimeTrace that takes two strings representing
file names. The first is any file name that may have been provided by the
user via `time-trace-file` flag, and the second is a fallback that should
be configured by the caller. This method makes it cleaner to write the
trace output because there is no longer a need to check file names at the
caller and simplifies future TimeTrace usages.
Reviewed By: modocache
Differential Revision: https://reviews.llvm.org/D74514
Summary:
These implement the usual IEEE-style floating point comparison
semantics, e.g. +0.0 == -0.0 and all operators except != return false
if either argument is NaN.
Subscribers: arsenm, jvesely, nhaehnle, hiraditya, dexonsmith, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75237
With AssertingVHs instead of bare pointers in
BlockFrequencyInfoImpl::Nodes (but without CallbackVHs) ~1/36 of all
tests ran by make check fail. It means that there are users of BFI that
delete basic blocks while keeping BFI. Some of those transformations add
new basic blocks, so if a new basic block happens to be allocated at
address where an already deleted block was and we don't explicitly set
block frequency for that new block, BFI will report some non-default
frequency for the block even though frequency for the block was never
set. Inliner is an example of a transformation that adds and removes BBs
while querying and updating BFI.
With this patch, thanks to updates via CallbackVH, BFI won't keep stale
pointers in its Nodes map.
This is a resubmission of 408349a25d with
fixed compiler warning and MSVC compilation error.
Reviewers: davidxl, yamauchi, asbirlea, fhahn, fedor.sergeev
Reviewed-By: asbirlea, davidxl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75341
Summary:
We already have overloaded binary arithemetic operators so you can write
A+B etc. This patch lets you write -A instead of neg(A).
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75236
* Delete boilerplate
* Change functions to return `Error`
* Test parsing errors
* Update callers of ARMAttributeParser::parse() to check the `Error` return value.
Since this patch touches nearly everything in the file, I apply
http://llvm.org/docs/Proposals/VariableNames.html and change variable
names to lower case.
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D75015
Summary:
This performs better for sample PGO.
NFC as PGSOColdCodeOnlyForSamplePGO is still true.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75550
Summary:
```
br i1 c, BB1, BB2:
BB1:
use1(c)
BB2:
use2(c)
```
In BB1 and BB2, c is never undef or poison because otherwise the branch would have triggered UB.
This is a resubmission of 952ad47 with crash fix of llvm/test/Transforms/LoopRotate/freeze-crash.ll.
Checked with Alive2
Reviewers: xbolva00, spatel, lebedev.ri, reames, jdoerfert, nlopes, sanjoy
Reviewed By: reames
Subscribers: jdoerfert, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75401
This reverts commit 8975aa6ea8.
Causes a compilation warning:
llvm-project/llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h:1037:43: warning: 'llvm::BlockFrequencyInfoImpl<llvm::BasicBlock>::BFICallbackVH' has virtual functions but non-virtual destructor [-Wnon-virtual-dtor]
class BlockFrequencyInfoImpl<BasicBlock>::BFICallbackVH : public CallbackVH {
^
1 warning generated.
With AssertingVHs instead of bare pointers in
BlockFrequencyInfoImpl::Nodes (but without CallbackVHs) ~1/36 of all
tests ran by make check fail. It means that there are users of BFI that
delete basic blocks while keeping BFI. Some of those transformations add
new basic blocks, so if a new basic block happens to be allocated at
address where an already deleted block was and we don't explicitly set
block frequency for that new block, BFI will report some non-default
frequency for the block even though frequency for the block was never
set. Inliner is an example of a transformation that adds and removes BBs
while querying and updating BFI.
With this patch, thanks to updates via CallbackVH, BFI won't keep stale
pointers in its Nodes map.
This is a resubmission of 408349a25d with
fixed MSVC compilation errors.
Reviewers: davidxl, yamauchi, asbirlea, fhahn, fedor.sergeev
Reviewed-By: asbirlea, davidxl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75341
With AssertingVHs instead of bare pointers in
BlockFrequencyInfoImpl::Nodes (but without CallbackVHs) ~1/36 of all
tests ran by make check fail. It means that there are users of BFI that
delete basic blocks while keeping BFI. Some of those transformations add
new basic blocks, so if a new basic block happens to be allocated at
address where an already deleted block was and we don't explicitly set
block frequency for that new block, BFI will report some non-default
frequency for the block even though frequency for the block was never
set. Inliner is an example of a transformation that adds and removes BBs
while querying and updating BFI.
With this patch, thanks to updates via CallbackVH, BFI won't keep stale
pointers in its Nodes map.
Reviewers: davidxl, yamauchi, asbirlea, fhahn, fedor.sergeev
Reviewed-By: asbirlea, davidxl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75341
This fixes printing long values that might reside in CIE and FDE,
including offsets, lengths, and addresses.
Differential Revision: https://reviews.llvm.org/D73887
It fixes now what 1c991f907a tried to fix.
(A test case failture on 32-bit Arch Linux)
On 32-bit hosts it still fails (because it truncates the `Pos` value to 32 bits).
It seems happens because of `sizeof` that returns `size_t`, which has a
different size on 32/64 bits hosts.
I've tested on a 32-bit host and verified that relocation-errors.test test and
other LLVM tools tests pass now.
The LLJIT::MachOPlatformSupport class used to unconditionally attempt to
register __objc_selrefs and __objc_classlist sections. If libobjc had not
been loaded this resulted in an assertion, even if no objc sections were
actually present. This patch replaces this unconditional registration with
a check that no objce sections are present if libobjc has not been loaded.
This will allow clients to use MachOPlatform with LLJIT without requiring
libobjc for non-objc code.
Summary: Decompose callThroughToSymbol() into findReexport(), resolveSymbol(), notifyResolved() and reportCallThroughError(). This allows derived classes to reuse the functionality while adding their own code in between.
Reviewers: lhames
Reviewed By: lhames
Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75084
Spin-off from D75407. As described there, ConstantFoldConstant()
currently returns null for non-ConstantExpr/ConstantVector inputs,
but otherwise always returns non-null, independently of whether
any folding has happened or not.
This is confusing and makes consumer code more complicated.
I would expect either that ConstantFoldConstant() returns only if
it actually folded something, or that it always returns non-null.
I'm going to the latter possibility here, which appears to be more
useful considering existing usage.
Differential Revision: https://reviews.llvm.org/D75543
The addition of MatrixBuilder.h broke the modules build:
```
While building module 'LLVM_intrinsic_gen' imported from llvm-project/llvm/lib/IR/AbstractCallSite.cpp:19:
While building module 'LLVM_IR' imported from llvm-project/llvm/include/llvm/IR/Argument.h:19:
In file included from <module-includes>:6:
llvm-project/llvm/include/llvm/IR/MatrixBuilder.h:19:10: fatal error: cyclic dependency in module 'LLVM_intrinsic_gen': LLVM_intrinsic_gen -> LLVM_IR -> LLVM_intrinsic_gen
^
While building module 'LLVM_intrinsic_gen' imported from llvm-project/llvm/lib/IR/AbstractCallSite.cpp:19:
In file included from <module-includes>:1:
llvm-project/llvm/include/llvm/IR/Argument.h:19:10: fatal error: could not build module 'LLVM_IR'
~~~~~~~~^~~~~~~~~~~~~~~~~
llvm-project/llvm/lib/IR/AbstractCallSite.cpp:19:10: fatal error: could not build module 'LLVM_intrinsic_gen'
~~~~~~~~^~~~~~~~~~~~~~~~~~~~
```
As discussed in the commit thread for rGa253a2a and D73978, we can do more undef folding for FP ops.
The nnan and ninf fast-math-flags specify that if an operand is the disallowed value, the result is
poison, so we can produce an undef result.
But this doesn't work as expected (the undef operand cases remain) because of a Flags propagation
problem in SelectionDAGBuilder.
I've added DAGCombiner calls to enable these for the other cases because we've shown in other
patches that (because of the limited way that SDAG iterates), it is possible to miss simplifications
like this if they are done only at node creation time.
Several potential follow-ups to expand on this patch are possible.
Differential Revision: https://reviews.llvm.org/D75576
Summary:
getInitialLength is a *DWARF*DataExtractor method so I had to "upgrade"
some DataExtractors to be able to make use of it.
Reviewers: ikudrin, jhenderson, probinson
Subscribers: aprantl, hiraditya, llvm-commits, dblaikie
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75535
This builder provides a convenient way for targets to lower various matrix
operations to LLVM IR, making use of matrix intrinsics where available.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D72280
We use size_t for a file offset what is wrong, because size_t is 32-bit
value on 32-bit platforms.
I was reported that after my 0b511c2302
"[llvm-readobj] - Report warnings instead of errors for broken relocations."
The following error is observed on 32-bit Arch Linux:
[100%] Running all regression tests
FAIL: LLVM :: tools/llvm-readobj/ELF/relocation-errors.test (52954 of 54768)
******************** TEST 'LLVM :: tools/llvm-readobj/ELF/relocation-errors.test' FAILED ***
...
llvm-project/llvm/test/tools/llvm-readobj/ELF/relocation-errors.test:9:14:error: LLVM-NEXT: expected string not found in input
# LLVM-NEXT: warning: '[[FILE]]': unable to print relocation 1 in section 3: unable to access section [index 6] data at 0x17e7e7e8b0: offset goes past the end of file
^
<stdin>:9:1: note: scanning from here
/llvm-project/build/bin/llvm-readobj: warning: 'llvm-project/build/test/tools/llvm-readobj/ELF/Output/relocation-errors.test.tmp64': unable to print relocation 1 in section 3: unable to access section [index 6] data at 0xe7e7e8b0: offset goes past the end of file
This patch should fix the issue.
Summary:
The VSHLC instruction performs a left shift of a whole vector register
by an immediate shift count up to 32, shifting in new bits at the low
end from a GPR and delivering the shifted-out bits from the high end
back into the same GPR.
Since the instruction produces two outputs (the shifted vector
register and the output GPR of shifted-out bits), it has to be
instruction-selected in C++ rather than Tablegen.
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75445
Summary:
These are exactly parallel to the existing `vadciq` intrinsics, which
we implemented last year as part of the original MVE intrinsics
framework setup.
Just like VADC/VADCI, the MVE VSBC/VSBCI instructions deliver two
outputs, both of which the intrinsic exposes: a modified vector
register and a carry flag. So they have to be instruction-selected in
C++ rather than Tablegen. However, in this case, that's trivial: the
same C++ isel routine we already have for VADC works unchanged, and
all we have to do is to pass it a different instruction id.
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75444
Summary:
```
br i1 c, BB1, BB2:
BB1:
use1(c)
BB2:
use2(c)
```
In BB1 and BB2, c is never undef or poison because otherwise the branch would have triggered UB.
Checked with Alive2
Reviewers: xbolva00, spatel, lebedev.ri, reames, jdoerfert, nlopes, sanjoy
Reviewed By: reames
Subscribers: jdoerfert, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75401
Summary:
https://gist.github.com/modocache/ed7c62f6e570766c0f39b35dad675c2f
is an example of a small C++ program that uses C++20 coroutines that
is difficult to debug, due to the loss of debug info for variables that
"spill" across coroutine suspension boundaries. This patch addresses
that issue by inserting 'llvm.dbg.declare' intrinsics that point the
debugger to the variables' location at an offset to the coroutine frame.
With this patch, I confirmed that running the 'frame variable' commands in
https://gist.github.com/modocache/ed7c62f6e570766c0f39b35dad675c2f at
the specified breakpoints results in the correct values being printed
for coroutine frame variables 'i' and 'j' when using an lldb built from
trunk, as well as with gdb 8.3 (lldb 9.0.1, however, could not print the
values). The added test case also verifies this improved behavior.
The existing coro-debug.ll test case is also modified to reflect the
locations at which Clang actually places calls to 'dbg.declare', and
additional checks are added to ensure this patch works as intended in that
example as well.
Reviewers: vsk, jmorse, GorNishanov, lewissbaker, wenlei
Subscribers: EricWF, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75338
Summary: This patch adds an analysis pass to collect loop nests and
summarize properties of the nest (e.g the nest depth, whether the nest
is perfect, what's the innermost loop, etc...).
The motivation for this patch was discussed at the latest meeting of the
LLVM loop group (https://ibm.box.com/v/llvm-loop-nest-analysis) where we
discussed
the unimodular loop transformation framework ( “A Loop Transformation
Theory and an Algorithm to Maximize Parallelism”, Michael E. Wolf and
Monica S. Lam, IEEE TPDS, October 1991). The unimodular framework
provides a convenient way to unify legality checking and code generation
for several loop nest transformations (e.g. loop reversal, loop
interchange, loop skewing) and their compositions. Given that the
unimodular framework is applicable to perfect loop nests this is one
property of interest we expose in this analysis. Several other utility
functions are also provided. In the future other properties of interest
can be added in a centralized place.
Authored By: etiotto
Reviewer: Meinersbur, bmahjour, kbarton, Whitney, dmgreen, fhahn,
reames, hfinkel, jdoerfert, ppc-slack
Reviewed By: Meinersbur
Subscribers: bryanpkc, ppc-slack, mgorny, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D68789
```
// clang -c -gdwarf-5 a.s -o a.o
.section .init; ret
.text; ret
```
.debug_info contains DW_AT_ranges and llvm-dwarfdump will report
a verification error because .debug_rnglists does not exist (not
implemented).
This patch generates .debug_rnglists for assembly files.
emitListsTableHeaderStart() in DwarfDebug.cpp can be shared with
MCDwarf.cpp. Because CodeGen depends on MC, I move the function to
MCDwarf.cpp
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D75375
Summary:
The argument that sets the prefetch type of a prefetch intrinsic must
be an immediate value.
Reviewers: andwar, sdesmalen, efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75482
Use MIOperand in collectLocalKilledOperands to make the search
global, as we already have to search for global uses too. This
allows us to delete more dead code when tail predicating.
Differential Revision: https://reviews.llvm.org/D75167
Summary: This patch adds an analysis pass to collect loop nests and
summarize properties of the nest (e.g the nest depth, whether the nest
is perfect, what's the innermost loop, etc...).
The motivation for this patch was discussed at the latest meeting of the
LLVM loop group (https://ibm.box.com/v/llvm-loop-nest-analysis) where we
discussed
the unimodular loop transformation framework ( “A Loop Transformation
Theory and an Algorithm to Maximize Parallelism”, Michael E. Wolf and
Monica S. Lam, IEEE TPDS, October 1991). The unimodular framework
provides a convenient way to unify legality checking and code generation
for several loop nest transformations (e.g. loop reversal, loop
interchange, loop skewing) and their compositions. Given that the
unimodular framework is applicable to perfect loop nests this is one
property of interest we expose in this analysis. Several other utility
functions are also provided. In the future other properties of interest
can be added in a centralized place.
Authored By: etiotto
Reviewer: Meinersbur, bmahjour, kbarton, Whitney, dmgreen, fhahn,
reames, hfinkel, jdoerfert, ppc-slack
Reviewed By: Meinersbur
Subscribers: bryanpkc, ppc-slack, mgorny, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D68789
Summary: This patch adds a new way to query operand bundles of an llvm.assume that is much better suited to some users like the Attributor that need to do many queries on the operand bundles of llvm.assume. Some modifications of the IR like replaceAllUsesWith can cause information in the map to be outdated, so this API is more suited to analysis passes and passes that don't make modification that could invalidate the map.
Reviewers: jdoerfert, sstefan1, uenoku
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75020
llvm/Support/Base64, fix its implementation and provide a decent test suite.
Previous implementation code was using + operator instead of | to combine
results, which is a problem when shifting signed values. (0xFF << 16) is
implicitly converted to a (signed) int, and thus results in 0xffff0000,
h is
negative. Combining negative numbers with a + in that context is not what we
want to do.
This is a recommit of 5a1958f267 with UB removved.
This fixes https://github.com/llvm/llvm-project/issues/149.
Differential Revision: https://reviews.llvm.org/D75057