Using the legacy PM for the optimization pipeline is deprecated and in
the process of being removed. This is a small step in that direction.
For an example of migrating to the new PM:
853b57fe80
This class will be used to properly solve the `__imp_` symbol and jump-thunk generation issues. It is assumed to be the last definition generator to be called, so the only symbols remaining in the lookup set are the symbols that are supposed to be resolved outside this JITDylib. Instead of just letting them through, we issue another lookup invocation to fetch the allocated addresses, and then create a JITLink graph containing `__imp_` GOT symbols and jump-thunks targeting the fetched addresses.
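A minimal sketch of where such a generator hooks in, assuming the ORC DefinitionGenerator interface; the class name and the commented steps are illustrative, not the actual implementation:
```
#include "llvm/ExecutionEngine/Orc/Core.h"

using namespace llvm;
using namespace llvm::orc;

// Illustrative skeleton only; the real class and graph-building logic live in
// the patch itself.
class ImpSymbolThunkGenerator : public DefinitionGenerator {
public:
  Error tryToGenerate(LookupState &LS, LookupKind K, JITDylib &JD,
                      JITDylibLookupFlags JDLookupFlags,
                      const SymbolLookupSet &LookupSet) override {
    // Runs last, so everything left in LookupSet is expected to resolve
    // outside this JITDylib:
    //  1. issue a second lookup to fetch the allocated addresses,
    //  2. build a JITLink graph defining __imp_<name> GOT entries and
    //     jump-thunks targeting those addresses,
    //  3. add that graph to JD so the new definitions become visible.
    return Error::success();
  }
};
```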
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D131833
The basic patterns look like this:
https://alive2.llvm.org/ce/z/MDj9EC
The tests have a use of the overflow value too.
Otherwise, existing folds should reduce already.
This was noted as a missing IR fold in:
926e7312b2
Hopefully, this makes it easier to implement a backend
fix because we should get the same IR regardless of
whether the source used builtins or inline code.
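As a hedged illustration of the source-level divergence mentioned above (the specific operation and types are assumptions for this sketch, not taken from the patch), these are the sort of builtin and inline-code forms one wants to canonicalize to the same IR:
```
#include <cstdint>

// Overflow-checking subtraction written with a builtin...
bool sub_builtin(uint32_t A, uint32_t B, uint32_t &R) {
  return __builtin_sub_overflow(A, B, &R);
}

// ...and the equivalent hand-written "inline code" form.
bool sub_inline(uint32_t A, uint32_t B, uint32_t &R) {
  R = A - B;    // wrapping subtract
  return A < B; // manual borrow/overflow check
}
```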
We have a good selection of W instructions, so promoting a truncated
value back to i64 is often free.
This appears to be a net code size reduction on SPECINT2006.
This has been split from D130397 as one of the patches needed to
complete that.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D131819
We already support SGE, so the same logic should hold for SLE with
the LHS and RHS swapped.
I didn't see this in the wild. Just happened to walk past this code
and thought it was odd that it was asymmetric in what condition
codes it handled.
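A hedged sketch of the symmetry being relied on (the function and operand names are illustrative, not the patched code):
```
#include <cstdint>
#include <utility>

// (LHS SLE RHS) is the same predicate as (RHS SGE LHS), so the SLE case can
// reuse the existing SGE handling once the operands are swapped.
bool evalSignedCompare(int64_t LHS, int64_t RHS, bool IsSLE) {
  if (IsSLE)
    std::swap(LHS, RHS); // reduce SLE to the already-handled SGE form
  return LHS >= RHS;     // existing SGE logic
}
```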
Reviewed By: spatel, reames
Differential Revision: https://reviews.llvm.org/D131805
I had hoped to make this a generic fold in DAGCombine, but there's quite a few regressions in Thumb2 MVE that need addressing first.
Fixes regressions from D106675.
This is a follow-up to D131350, which caused another problem for i64
types being split into i32 on i32 targets. This patch tries to make sure
that either illegal types are OK, or that the element types of a
buildvector are legal and at least as large as the original element
types.
Differential Revision: https://reviews.llvm.org/D131883
R1 is a reserved register, but LLVM provides APIs to know whether it is
used or not. So this patch uses these APIs to only save/clear/restore R1
in interrupts when necessary.
The main issue here was getting inline assembly to work. One could argue
that this is the job of Clang, but for consistency I've made sure that
R1 is always usable in inline assembly even if that means clearing it
when it might not be needed.
Information on inline assembly in AVR can be found here:
https://www.nongnu.org/avr-libc/user-manual/inline_asm.html#asm_code
Essentially, this seems to suggest that r1 can be freely used in avr-gcc
inline assembly, even without specifying it as an input operand.
Differential Revision: https://reviews.llvm.org/D117426
The code to support the case when the register allocator has assigned
the same register to the src and the dst register operand isn't actually
needed:
* LDWRdPtr and LDDWRdPtrQ have an @earlyclobber on the output
register, so the register allocator will make sure to allocate a
different register for the output register.
* LDDWRdYQ does not have an @earlyclobber, but the pointer register is
the fixed Y register which is reserved. The register allocator won't
use reserved registers for the output value.
This removes a special case in the code that makes the pseudo
instruction expansion pass more complicated than it needs to be.
Differential Revision: https://reviews.llvm.org/D131844
This patch really just extends D39946 towards stores as well as loads.
While the patch is in SelectionDAGBuilder, it only applies to AVR (the
only target that supports unaligned atomic operations).
Differential Revision: https://reviews.llvm.org/D128483
This reverts commit 34ae308c73.
Our internal testing found a miscompile. Not sure if it's caused by
this patch or it revealed something else. Reverting while investigating.
The outer signext_inreg is redundant in the following:
Fold (signext_inreg (extract_subvector (zext|anyext|sext iN_value to _) _) from iN)
-> (extract_subvector (signext iN_value to iM))
Tests are precommitted and cloned by analogy from the AND case in
the same file. A negative test is added to check that the extension width
is handled correctly.
This patch supersedes D130700.
Differential Revision: https://reviews.llvm.org/D131503
These are guaranteed not to create undef/poison (although they may pass it through); the associated ISD::VALUETYPE node is also guaranteed never to generate poison.
`getContext().setMCLineTableRootFile` (from D62074) sets `RootFile.Name` to
`FirstCppHashFilename`. `RootFile.Name` is not processed by -fdebug-prefix-map
and will go to DW_TAG_compile_unit's DW_AT_name and DW_TAG_label's
DW_AT_decl_file. Remap `RootFile.Name`.
Fix another issue reported by https://github.com/llvm/llvm-project/issues/56609
Reviewed By: #debug-info, dblaikie, raj.khem
Differential Revision: https://reviews.llvm.org/D131848
Add a fix to check that FDE pc-begin targets are defined before calling
getBlock (which will crash if the target is not defined). FDE pc-begins
pointing at undefined symbols are expected to arise only in obscure
circumstances (malformed objects, or removal of targets by JITLink
passes), but we want to handle them gracefully. With this patch the
FDE will be retained, but without any keepalive edge to it. Unless
some pass takes action to mark it as live it will be dead-stripped.
To make it easier for passes to connect FDEs to their targets a new
EHFrameCFIBlockInspector utility is added. This allows clients to
quickly determine whether a CFI record is a CIE or an FDE (assuming
that it's valid), and retrieve any personality, pc-begin, cie, or
LSDA edges associated with it.
MapperJITLinkMemoryManager uses a free list to keep track of available
memory regions. Using an IntervalMap instead of a vector allows automatic
coalescing of memory regions as they are freed.
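A hedged sketch of the coalescing this buys, assuming llvm::IntervalMap's merging of adjacent ranges that carry equal values; the ranges and value type are illustrative, not the manager's actual bookkeeping:
```
#include "llvm/ADT/IntervalMap.h"
#include <cstdint>

using FreeList = llvm::IntervalMap<uint64_t, bool>;

int main() {
  FreeList::Allocator Alloc;
  FreeList Free(Alloc);

  // Free two adjacent regions; IntervalMap coalesces them because the
  // intervals touch and hold the same value.
  Free.insert(0x1000, 0x1FFF, true); // [0x1000, 0x1FFF]
  Free.insert(0x2000, 0x2FFF, true); // merges into [0x1000, 0x2FFF]

  // A vector-based free list would keep two separate entries here; the
  // coalesced interval lets a later allocation reuse one contiguous region.
  return 0;
}
```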
Differential Revision: https://reviews.llvm.org/D131831
This is to avoid f16->i64 being lowered to `__fixhfdi/__fixunshfdi` on 32-bit targets, since neither libgcc nor compiler-rt provides them. https://godbolt.org/z/cjWEsea5v
It also helps improve performance by promoting the vector type.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D131828
This reverts commit 38c2366b3f.
This patch seems to break bootstrapping LLVM with `-fglobal-isel -O3`
on AArch64 hardware. Without the revert, there are 500+ test
failures for the `check-llvm-codegen-x86` target.
(A | ?) | (A ^ B) --> (A | ?) | B
https://alive2.llvm.org/ce/z/dbNQw4
This extends the existing transform to peek through
another 'or' instruction for the common operand.
This is the underlying missing fold that should allow
issue #56711 and issue #57120 to reduce even more.
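A hedged C-level illustration of the fold (variable names are arbitrary): since A already appears in the left 'or', the A inside (A ^ B) contributes nothing new, so the expression reduces to the simpler form.
```
unsigned before(unsigned A, unsigned C, unsigned B) {
  return (A | C) | (A ^ B); // matches (A | ?) | (A ^ B)
}

unsigned after(unsigned A, unsigned C, unsigned B) {
  return (A | C) | B;       // the reduced form
}
```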
(setcc x, y, eq/ne) lowers to seqz/snez, which set rd to 0/1.
An addi is used to process the immediate, which can save the instructions needed to load the immediate.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D131471
Make the sched_getaffinity-based implementation available to all architectures
(except s390x/x86, which have a custom implementation). The `CPU_ALLOC(2048)`
code supports all `CONFIG_NR_CPUS` values in Linux kernel `arch/*/configs/`.
The function is mainly used by in-process ThinLTO to decide the default number
of threads. Returning -1 will use just one thread.
Android is excluded because of the higher API level requirement:
`sched_getaffinity; # introduced-arm=12 introduced-arm64=21 introduced-x86=12 introduced-x86_64=21`
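A minimal sketch of the counting approach described above, assuming glibc on Linux; the helper name is illustrative:
```
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // CPU_ALLOC and friends are GNU extensions
#endif
#include <sched.h>

// Count the CPUs the current process may run on; returns -1 on failure, in
// which case the caller falls back to a single thread as described above.
static int countAvailableCPUs() {
  cpu_set_t *Set = CPU_ALLOC(2048); // mirrors the CPU_ALLOC(2048) above
  if (!Set)
    return -1;
  size_t Size = CPU_ALLOC_SIZE(2048);
  CPU_ZERO_S(Size, Set);
  int N = -1;
  if (sched_getaffinity(0, Size, Set) == 0) // 0 = the calling process
    N = CPU_COUNT_S(Size, Set);
  CPU_FREE(Set);
  return N;
}
```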
When memory is deallocated from MapperJITLinkMemoryManager, deinitialize
actions are run through the mapper. In the case of InProcessMapper, the memory
protections of the region are reset to read/write (they were previously
changed), so the region can be reused in the future.
Differential Revision: https://reviews.llvm.org/D131768
Reland commit 719658d078
The base RA support infrastructure only allows a specific register
class to be allocated in the RA pass. Since greedy RA and basic RA derive
from the base RA, they both allow allocating a specific register class.
Fast RA doesn't support allocating registers for a specific register class.
This patch enables ShouldAllocateClass in fast RA, so that it can
support allocating registers for a specific register class.
Differential Revision: https://reviews.llvm.org/D131825
Initial platform support for COFF/x86_64.
Completed features:
* Statically linked orc runtime.
* Full linking/initialization of static/dynamic vc runtimes and microsoft stl libraries.
* SEH exception handling.
* Full static initializers support
* dlfns
* JIT side symbol lookup/dispatch
Things to note:
* It uses vc runtime libraries found in vc toolchain installations.
* Bootstrapping state is separated because, when statically linking the orc runtime, Microsoft STL functions are needed to initialize the orc runtime, but static initializers need to be run in order to fully initialize the STL libraries.
* Process symbols can't be used blindly on the MSVC platform; otherwise a duplicate definition error gets generated. If process symbols are used, an out-of-reach error is bound to occur at some point.
* Atexit is currently not handled -- it will be handled in follow-up patches.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D130479
DWARF v5 6.2.4 The Line Number Program Header says:
> The first entry is the current directory of the compilation. Each additional
> path entry is either a full path name or is relative to the current directory of
> the compilation.
When forming a path, relative DW_AT_comp_dir and directories[0] are not supposed
to be joined together. Fix getFileNameByIndex to special case DWARF v5 DirIdx == 0.
Reviewed By: #debug-info, dblaikie
Differential Revision: https://reviews.llvm.org/D131804
The linker may convert such an ADD into a LEA, so we must not
use the EFLAGS output.
This causes miscompiles with -fsanitize=null after
bacdf80f42 added
llvm.threadlocal.address -- previously, global variables were known to
be non-null, but the intrinsic is not currently known to return
nonnull. (That should be corrected, but it shouldn't've caused
miscompiles!)
Differential Revision: https://reviews.llvm.org/D131716
For generated assembly debug info, MCDwarfLineTableHeader::CompilationDir is an
unmapped path set in MCContext::setGenDwarfRootFile. Remap it.
A relative destination path of -fdebug-prefix-map= exposes an llvm-dwarfdump bug
which joins relative DW_AT_comp_dir and directories[0].
Fix https://github.com/llvm/llvm-project/issues/56609
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D131749
This patch supports SPIR-V capabilities and extensions. In addition,
it inserts decorations related to MIFlags and improves support of switches.
Five tests are included to demonstrate the improvement.
Differential Revision: https://reviews.llvm.org/D131221
Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
Allows for even more savings in the binary image while simultaneously removing the name of the offending stack variable.
Depends on D131631
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D131728
We were dereferencing an empty Optional if IgnoreErrors was true and the
stat failed.
rdar://60887887
Differential Revision: https://reviews.llvm.org/D131791
The value of the attribute is a size in bytes. It has the effect of
suppressing inlining of functions whose stack sizes exceed the given value.
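A hedged sketch of attaching such an attribute from C++; the attribute name "inline-max-stacksize" and the 4096-byte threshold are assumptions for illustration, not taken from the message above:
```
#include "llvm/IR/Function.h"

// Mark F so the inliner skips it once its stack size exceeds the threshold.
// The string attribute name here is assumed; the value is a size in bytes.
void capInlineStackSize(llvm::Function &F) {
  F.addFnAttr("inline-max-stacksize", "4096");
}
```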
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D129904
We're seeing non-determinism with loading sample profiles. It seems to
be related to the order in which we merge FunctionSamples in
promoteMergeNotInlinedContextSamples(). Use a MapVector to iterate over
NonInlinedCallSites in the order entries were inserted.
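A hedged sketch of why MapVector restores determinism (the key and value types are placeholders, not the real profile data structures): iteration follows insertion order rather than hash order.
```
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  // DenseMap iteration order is unspecified and can differ between runs;
  // MapVector always visits entries in the order they were inserted.
  llvm::MapVector<llvm::StringRef, int> NonInlinedCallSites;
  NonInlinedCallSites["foo"] += 1;
  NonInlinedCallSites["bar"] += 2;
  NonInlinedCallSites["baz"] += 3;

  for (auto &KV : NonInlinedCallSites) // always foo, bar, baz
    llvm::outs() << KV.first << ": " << KV.second << "\n";
  return 0;
}
```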
Reviewed By: wenlei, davidxl
Differential Revision: https://reviews.llvm.org/D131592
The code was relicensed by its owner (Unicode.org) a long time back,
but we still had the old (problematic) license in our fork.
Note that the source files have not been distributed from unicode.org
since 2009 (due to being buggy and unmaintained upstream), but they
were given this license before that.
Fixes https://github.com/llvm/llvm-project/issues/32309
Differential Revision: https://reviews.llvm.org/D66390
The goal is to reduce the size of the MSan track-origins binary by making
the variable name locations constant, which will allow the linker to compress
them.
Follows: https://reviews.llvm.org/D131415
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D131631
SimplifyMultipleUseDemandedBits shouldn't be creating general nodes like this - although we allow bitcasts, even general constant folding is avoided.
Removing it causes a number of regressions that need addressing first, but I've added a TODO for now.
Contextual knowledge may be used to prove invariance of some conditions.
For example, in this case:
```
; %len >= 0
guard(%iv = {start,+,1}<nuw> <s %len)
guard(%iv = {start,+,1}<nuw> <u %len)
```
the 2nd check always fails if `start` is negative (the unsigned comparison
treats the negative value as huge) and always passes otherwise (for a
non-negative `%iv`, the signed and unsigned comparisons against the
non-negative `%len` agree).
It looks like there are more opportunities of this kind still to be
implemented in the future.
Differential Revision: https://reviews.llvm.org/D129753
Reviewed By: apilipenko
Closing https://github.com/llvm/llvm-project/issues/56329
The problem happens when we try to simplify the suspend points. We might
break the assumption that the final suspend lives in the last slot of
Shape.CoroSuspends. This patch tries to maintain that assumption and fixes
the problem.
TLS debug info on AIX is not ready for now.
The location generated in no-integrated-as mode is wrong, and
in integrated-as mode it causes an AIX linker error.
Reviewed By: Esme
Differential Revision: https://reviews.llvm.org/D130245
Since c5b3de6745 (git main,
August 11th), Clang does generate working hidden visibility
on MinGW targets. Using that reduces the number of exports from
a dylib build of LLVM significantly, which is vital for fitting
within the limit of 64k exported symbols from a DLL.
It's essential that if we set CMAKE_CXX_VISIBILITY_PRESET=hidden
(which passes -fvisibility=hidden on the command line), we also
must define LLVM_EXTERNAL_VISIBILITY consistently to override
it. (If there are mismatches, e.g. setting hidden visibility generally
but never overriding it back to default for the symbols that do need
to be exported, we'd get broken builds in such configurations.)
We don't want to be using __attribute__((visibility("hidden"))) on
MinGW with GCC, because GCC produces a warning about it. (GCC hasn't
warned about the command line options that set hidden visibility
though.) Clang has historically not warned about either of them, so
it is harmless to use the hidden visibility when building with older
Clang (so we don't need to detect the exact version of Clang/LLVM where
it has an effect).
This reduces the number of exported symbols for a dylib build of LLVM;
previously libLLVM exported around 64650 symbols (when the maximum is
65536) when the ARM, AArch64 and X86 targets were enabled. If enabling
more targets (or if building with e.g. assertions enabled), it would
exceed the limit. Now with visibility flags in use, the same build
with ARM, AArch64 and X86 ends up at around 35k exported symbols.
Differential Revision: https://reviews.llvm.org/D131661
Including patterns to select addiw if only the lower 32 bits are used.
I'm not excited about adding this many patterns. I'm looking at whether
we can create the xori during lowering and move the ineg patterns to
DAGCombiner.
lowerShuffleWithVPMOV currently only matches shuffle(truncate(x)) patterns, but on VLX targets the truncate isn't usually necessary to make the VPMOV node worthwhile (as we're only targeting v16i8/v8i16 shuffles, we almost always end up with a PSHUFB node instead). PACKSS/PACKUS are still preferred vs VPMOV due to their lower uop count.
Fixes the remaining regression from the fixes in rG293899c64b75
My most recent change for D131607 had a formatting error that I didn't
notice until after I committed it. Let me fix it now so changes to this
file will be back-to-back from me.
We manage to iteratively achieve this result with no extra
uses, and the reassociate pass can also do this, but this
pattern falls through the cracks in the example from
issue #57053.
Another ticket split out of D107285, this extends the optimization
of 0.0 - -X to just X when using constrained intrinsics and the
optimization is allowed.
If the negation of X is done with fsub then the match fails because of
the lack of IR Matcher support for constrained intrinsics.
While I'm here, remove some TODO notices since the work is no longer
planned.
Differential Revision: https://reviews.llvm.org/D131607
As mentioned on D128934, we weren't including the CPUID bit handling for the RDPRU instruction.
AMD's APMv3 (24594) lists it as CPUID Fn8000_0008_EBX Bit#4
A bfloat select operation will currently crash, but is allowed from C.
This adds handling for the operation, turning it into a FCSELHrrr if
fullfp16 is present, or converting it to a FCSELSrrr if not. The
FCSELSrrr is created via using INSERT_SUBREG/EXTRACT_SUBREG to convert
the bf16 to a f32 and using the f32 pattern for FCSELSrrr. (I originally
attempted to do this via a tablegen pattern, but it appears that the
nzcv glue is placed onto the wrong node, causing it to be forgotten and
incorrect scheduling to be emitted).
The FCSELSrrr can also be used for fp16 selects when +fullfp16 is not
present, which helps avoid an unnecessary promotion to f32.
Differential Revision: https://reviews.llvm.org/D131253
Src1 for mbcnt can be a non-zero literal or register. Take this into account
when calculating known bits.
Differential Revision: https://reviews.llvm.org/D131478
The patch uses a peephole method to fold merge.vvm and unmasked intrinsics into
masked intrinsics. Using a peephole instead of tablegen patterns avoids large
amounts of auto-generated code.
Note: The patch ignores segment loads since I don't know how to test them.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D130442
Follow-up after D131595, see comments in the review thread.
The intention of having two constructors was to minimize copies of the
`vector`, but a lack of `std::move` at the call site caused the wrong
constructor to be called.
Switched to a single constructor that accepts a value.
Accepting by value makes it possible to have a single constructor and still
decide whether to copy or move at the call site.
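A hedged sketch of the pattern in isolation (the class and member names are illustrative):
```
#include <utility>
#include <vector>

struct Config {
  // Single constructor: take the vector by value and move it into place.
  explicit Config(std::vector<int> Values) : Values(std::move(Values)) {}
  std::vector<int> Values;
};

// The call site decides the cost: pass an lvalue to copy, std::move to move.
std::vector<int> V = {1, 2, 3};
Config Copied{V};           // one copy into the parameter, then a move
Config Moved{std::move(V)}; // moves only, no copy
```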