Summary:
This patch checks if the section order is correct when reading a wasm
object file in `WasmObjectFile` and converting YAML to wasm object in
yaml2wasm. (It is not possible to check when reading YAML because it is
handled exclusively by the YAML reader.)
This checks the ordering of all known sections (core sections + known
custom sections). This also adds a section ID for the DataCount section,
which is scheduled to be added in the near future.
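Roughly, the check amounts to something like the following sketch (the helper
name and error plumbing are illustrative, not the exact patch):

  // Reject a non-custom section whose ID is not strictly greater than
  // the previous one; custom sections (ID 0) need separate handling and
  // are glossed over here.
  static Error checkSectionOrder(unsigned PrevID, unsigned ID) {
    if (ID != 0 /* custom */ && ID <= PrevID)
      return createStringError(inconvertibleErrorCode(),
                               "out of order section type: %u", ID);
    return Error::success();
  }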
Reviewers: sbc100
Subscribers: dschuff, mgorny, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54924
llvm-svn: 349221
Summary:
This allows us to register it with the MachineFunction delegate and be
notified automatically about erasure and creation of instructions. However,
we still need explicit notification for modifications such as those caused
by setReg() or replaceRegWith().
There is a catch with this though. The notification for creation is
delivered before any operands can be added. While this is appropriate for
scheduling combiner work, it is unfortunate for debug output, since an
opcode by itself doesn't provide sufficient information on what happened.
As a result, the work list remembers the instructions (when debug output is
requested) and emits a more complete dump later.
Another nit is that the MachineFunction::Delegate provides const pointers,
which is inconvenient since we want to use it to schedule future
modifications. To resolve this, GISelWorkList now has an optional pointer to
the MachineFunction which describes the scope of the work it is permitted
to schedule. If a given MachineInstr* is in this function then it is
permitted to schedule work to be performed on the MachineInstrs. An
alternative to this would be to remove the const from the
MachineFunction::Delegate interface; however, delegates are not permitted
to modify the MachineInstrs they receive.
In addition to this, the observer has three interface changes.
* erasedInstr() is now erasingInstr() to indicate it is about to be erased
but still exists at the moment.
* changingInstr() and changedInstr() have been added to report changes
before and after they are made. This allows us to trace the changes
in the debug output.
* As a convenience changingAllUsesOfReg() and
finishedChangingAllUsesOfReg() will report changingInstr() and
changedInstr() for each use of a given register. This is primarily useful
for changes caused by MachineRegisterInfo::replaceRegWith().
With this in place, both combine rules have been updated to report their
changes to the observer.
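For illustration, the protocol for an in-place change now looks like this
sketch (variable names hypothetical):

  Observer.changingInstr(MI);      // MI is about to be modified
  MI.getOperand(1).setReg(NewReg); // the actual modification
  Observer.changedInstr(MI);       // MI has been modified
  Observer.erasingInstr(DeadMI);   // DeadMI still exists at this point
  DeadMI.eraseFromParent();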
Finally, make some cosmetic changes to the debug output and make Combiner
and CombinerHelp
Reviewers: aditya_nandakumar, bogner, volkan, rtereshin, javed.absar
Reviewed By: aditya_nandakumar
Subscribers: mgorny, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D52947
llvm-svn: 349167
Implement options in clang to enable recording the driver command-line
in an ELF section.
Implement a new special named metadata, llvm.commandline, to support
frontends embedding their command-line options in IR/ASM/ELF.
This differs from the GCC implementation in some key ways:
* In GCC there is only one command-line possible per compilation-unit; in
LLVM it mirrors llvm.ident, and multiple command-lines are allowed.
* In GCC individual options are separated by NULL bytes, in LLVM entire
command-lines are separated by NULL bytes. The advantage of the GCC
approach is to clearly delineate options in the face of embedded
spaces. The advantage of the LLVM approach is to support merging
multiple command-lines unambiguously, while handling embedded spaces
with escaping.
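For illustration, a frontend could record a command line roughly like this
hedged sketch using the generic named-metadata API (M and Ctx are an assumed
Module and LLVMContext; the exact clang code differs):

  // Each operand of llvm.commandline holds one entire command line,
  // with embedded spaces escaped by the producer.
  NamedMDNode *NMD = M.getOrInsertNamedMetadata("llvm.commandline");
  NMD->addOperand(MDNode::get(Ctx, MDString::get(Ctx, "clang -O2 foo.c")));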
Differential Revision: https://reviews.llvm.org/D54487
Clang Differential Revision: https://reviews.llvm.org/D54489
llvm-svn: 349155
Summary:
The two utility functions were added in D47919 to support SHT_RELR.
However, these are just relative relocation types and aren't
necessarily named Relr.
Reviewers: phosek, dberris
Reviewed By: dberris
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55691
llvm-svn: 349133
build version load commands in the object file
This commit introduces a new metadata node called "SDK Version". It will be set
by the frontend to mark the platform SDK (macOS/iOS/etc) version which was used
during that particular compilation.
This node is used when machine code is emitted, by either saving the SDK version
into the appropriate macho load command (version min/build version), or by
emitting the assembly for these load commands with the SDK version specified as
well.
The assembly for both load commands is extended by allowing a trailing
sdk_version X, Y [, Z] directive that specifies the SDK version.
rdar://45774000
Differential Revision: https://reviews.llvm.org/D55612
llvm-svn: 349119
Summary:
This patch computes the synthetic function entry count on the whole
program callgraph (based on module summary) and writes the entry counts
to the summary. After function importing, this count gets attached to
the IR as metadata. Since it adds a new field to the summary, this bumps
up the version.
Reviewers: tejohnson
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D43521
llvm-svn: 349076
Summary:
llvm-size uses "isText()" etc. which seem to indicate whether the section contains code-like things, not whether or not it will actually go in the text segment when in a fully linked executable.
The unit test added (elf-sizes.test) shows some types of sections that cause discrepancies versus the GNU size tool. llvm-size is not correctly reporting sizes of things mapping to text/data segments, at least for ELF files.
This fixes pr38723.
Reviewers: echristo, Bigcheese, MaskRay
Reviewed By: MaskRay
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54369
llvm-svn: 349074
This is split from D55452 with the correct patch this time.
Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle.
Differential Revision: https://reviews.llvm.org/D55615
llvm-svn: 349072
When calling BinaryStreamArray::drop_front(), if the stream
is skewed it means we must never drop the first bytes of the
stream since offsets which occur in records assume the existence
of those bytes. So if we want to skip the first record in a
stream, then what we really want to do is just set the begin
pointer to the next record. But we shouldn't actually remove
those bytes from the underlying view of the data.
llvm-svn: 349066
Move existing rotation expansion code into TargetLowering and set it up for vectors as well.
Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment.
Begun removing x86 vector rotate custom lowering to use the expansion.
llvm-svn: 349025
Summary:
In addition to knowing that an instruction has changed, it's also useful to
know when it's about to change. For example, it might print the instruction so
you can track the changes in a debug log, it might remove it from some queue
while it's being worked on, or it might want to change several instructions as
a single transaction and act on all the changes at once.
Added changingInstr() to all existing uses of changedInstr()
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55623
llvm-svn: 348992
Summary:
There's little of interest that can be done to an already-erased instruction.
You can't inspect it, write it to a debug log, etc. It ought to be notification
that we're about to erase it. Rename the function to clarify the timing of the
event and reflect current usage.
Also fixed one case where we were trying to print an erased instruction.
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55611
llvm-svn: 348976
Use the replacement execute-once threading support in LLVM for PPC64le. It
seems that GCC does not define `__ppc__`, so we would actually call out to
the C++ runtime there, which is not what the current code intended. Check both
`__ppc__` and `__PPC__`. This avoids the need for checking the endianness.
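A minimal sketch of the resulting guard:

  // GCC does not define __ppc__, so test both spellings; no endianness
  // check is required.
  #if defined(__ppc__) || defined(__PPC__)
    // use the native execute-once support
  #endif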
Thanks to nemanjai for the hint about GCC's behaviour and the fact that the
reviewed condition could be simplified.
Original patch by Sarvesh Tamba!
llvm-svn: 348970
Continue to present HSA metadata as YAML in ASM and when output by tools
(e.g. llvm-readobj), but encode it in Messagepack in the code object.
Differential Revision: https://reviews.llvm.org/D48179
llvm-svn: 348963
This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns.
It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller.
A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS).
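Typical usage looks roughly like this sketch (DAG, V and DemandedElts are
assumed to be in scope):

  APInt UndefElts;
  if (DAG.isSplatValue(V, DemandedElts, UndefElts)) {
    // Note: this also returns true when all demanded elements are
    // UNDEF; the caller must handle that case.
  }
  bool IsSplat = DAG.isSplatValue(V); // convenience wrapper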
I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection.
Differential Revision: https://reviews.llvm.org/D55426
llvm-svn: 348953
When multiple loop transformations are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance,
#pragma clang loop unroll_and_jam(enable)
#pragma clang loop distribute(enable)
is the same as
#pragma clang loop distribute(enable)
#pragma clang loop unroll_and_jam(enable)
and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after the UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also, the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.
This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,
!0 = !{!0, !1, !2}
!1 = !{!"llvm.loop.unroll_and_jam.enable"}
!2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
!3 = !{!"llvm.loop.distribute.enable"}
defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.
Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.
For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible for the responsible pass to emit such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patch introduces a WarnMissedTransformations pass, to warn about orphaned transformations.
Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.
To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.
With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).
Reviewed By: hfinkel, dmgreen
Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288
llvm-svn: 348944
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
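Semantically this is a widening multiply followed by an arithmetic right
shift by the scale; a hedged C++ model for i32 (ignoring the intrinsic's
exact rounding rules):

  // a and b are signed fixed-point values with 'scale' fractional bits;
  // the 64-bit product has 2*scale fractional bits, so shift right by
  // 'scale' to renormalize.
  int32_t smul_fix(int32_t a, int32_t b, unsigned scale) {
    return (int32_t)(((int64_t)a * (int64_t)b) >> scale);
  }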
Differential Revision: https://reviews.llvm.org/D54719
llvm-svn: 348912
Without this check, we hit an assertion in getZExtValue, if the constant
value does not fit into an uint64_t.
As getZExtValue returns a uint64_t, should we update
getAggregateElement to take a uint64_t as well?
This fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6109.
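The guard amounts to something like this sketch (names hypothetical):

  // Only call getZExtValue() when the index fits in 64 bits; otherwise
  // the assertion inside getZExtValue() fires.
  if (CIdx->getValue().getActiveBits() > 64)
    return nullptr; // no element can have such an index
  Constant *Elt = Agg->getAggregateElement(CIdx->getZExtValue());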
Reviewers: efriedma, craig.topper, spatel
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D55547
llvm-svn: 348906
https://reviews.llvm.org/D55516
Add the ability to pass in flags to buildInstr calls. Currently no
validation is performed but that can be easily performed based on the
opcode (if necessary).
Reviewed by: paquette.
llvm-svn: 348893
IR-printing AfterPass instrumentation might be called on a loop
that has just been invalidated. We should skip printing it to
avoid spurious asserts.
Reviewed By: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D54740
llvm-svn: 348887
This change makes DT_SONAME treated as an optional trait for ELF TextAPI
stubs. This change accounts for the fact that shared objects aren't
guaranteed to have a DT_SONAME entry. Tests have been updated to check
for correct behavior of an optional soname.
Differential Revision: https://reviews.llvm.org/D55533
llvm-svn: 348817
https://reviews.llvm.org/D55294
Previously MachineIRBuilder::buildInstr used to accept variadic
arguments for sources (which were either unsigned or
MachineInstrBuilder). While this worked well in common cases, it doesn't
allow us to build instructions that have multiple destinations.
Additionally passing in other optional parameters in the end (such as
flags) is not possible trivially. Also a trivial call such as
B.buildInstr(Opc, Reg1, Reg2, Reg3)
can be interpreted differently based on the opcode (2defs + 1 src for
unmerge vs 1 def + 2srcs).
This patch refactors the buildInstr to
buildInstr(Opc, ArrayRef<DstOps>, ArrayRef<SrcOps>)
where DstOps and SrcOps are typed unions that know how to add themselves to
MachineInstrBuilder.
After this patch, most invocations would look like
B.buildInstr(Opc, {s32, DstReg}, {SrcRegs..., SrcMIBs...});
Now all the other calls (such as buildAdd, buildSub etc) forward to
buildInstr. It also makes it possible to build instructions with
multiple defs.
Additionally in a subsequent patch, we should make it possible to add
flags directly while building instructions.
Additionally, the main buildInstr method is now virtual, so other
builders (for, say, constant folding/CSEing) only have to override
buildInstr.
Also attached here (https://reviews.llvm.org/F7675680) is a clang-tidy
patch that should upgrade the API calls if necessary.
llvm-svn: 348815
Not sure how I missed that in my testing, but obvious enough - this
causes segfaults when attempting to dereference the Error in end
iterators.
llvm-svn: 348814
Using an Error as an out parameter from an indirect operation like
iteration as described in the documentation (
http://llvm.org/docs/ProgrammersManual.html#building-fallible-iterators-and-iterator-ranges
) seems to be a little fussy - so here's /one/ possible solution, though
I'm not sure it's the right one.
Alternatively such APIs may be better off being switched to a standard
algorithm style, where they take a lambda to do the iteration work that
is then called back into (eg: "Error e = obj.for_each_note([](const
Note& N) { ... });"). This would be safer than having an unwritten
assumption that the user of such an iteration cannot return early from
the inside of the function - and must always exit through the gift
shop... I mean error checking. (even though it's guaranteed that if
you're mid-way through processing an iteration, it's not in an error
state).
Alternatively we'd need some other (the super untrustworthy/thing we've
generally tried to avoid) error handling primitive that actually clears
the error state entirely so it's safe to ignore.
Fleshed this solution out a bit further during review - it now relies on
op==/op!= comparison as the equivalent to "if (Err)" testing the Error.
So just like an Error must be checked (even if it's in a success state),
the Error hiding in the iterator must be checked after each increment
(including by comparison with another iterator - perhaps this could be
constrained to only checking if the iterator is compared to the end
iterator? Not sure it's too important).
So now even just creating the iterator and not incrementing it at all
should still assert because the Error has not been checked.
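Client code then follows the pattern from the fallible-iterator docs; a
hedged sketch (the notes() accessor name is illustrative):

  Error Err = Error::success();
  for (const auto &Note : Obj.notes(Err)) {
    // process Note; don't return early without checking Err
  }
  if (Err) // must be checked even on the success path
    return std::move(Err);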
Reviewers: lhames, jakehehrlich
Differential Revision: https://reviews.llvm.org/D55235
llvm-svn: 348811
Summary: The APFloat and Constant APIs taking an APInt allow arbitrary payloads,
and that's great. There's a convenience API which takes an unsigned, and that's
silly because it then directly creates a 64-bit APInt. Just change it to 64-bits
directly.
At the same time, add ConstantFP NaN getters which match the APFloat ones (with
getQNaN / getSNaN and APInt parameters).
Improve the APFloat testing to set more payload bits.
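With the new getters, building a NaN constant with payload bits looks
roughly like this sketch (Ty is an assumed floating-point type):

  APInt Payload(64, 0xBEEF); // arbitrary example payload
  Constant *QNaN = ConstantFP::getQNaN(Ty, /*Negative=*/false, &Payload);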
Reviewers: scanon, rjmccall
Subscribers: jkorous, dexonsmith, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D55460
llvm-svn: 348791
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.
This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.
Differential Revisions: https://reviews.llvm.org/D53629
llvm-svn: 348788
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, for the Exynos processors.
Differential revision: https://reviews.llvm.org/D55345
llvm-svn: 348774
Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above? Aren't we just extracting the single element after taking the min/max in the vector register?
Reviewers: RKSimon, spatel, ABataev
Reviewed By: RKSimon
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55480
llvm-svn: 348739
Both intrinsics do the exact same thing so we really only need one.
Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the previous signature that was new in 8.0. I've also renamed the subborrow intrinsic for consistency.
llvm-svn: 348737
Since TBEHandler doesn't maintain state or otherwise have any need to be
a class right now, the read and write functions have been moved out and
turned into standalone functions. Additionally, the TBE read function
has been updated to return an Expected value for better error handling.
Tests have been updated to reflect these changes.
Differential Revision: https://reviews.llvm.org/D55450
llvm-svn: 348735
This trait is used by several AST visitor classes to control whether the visitor visits const or non-const AST nodes. These uses cannot be easily replaced with the STL traits directly due to the use of an unspecialized template where a type is expected (because of the template template parameter involved).
llvm-svn: 348729
PE/COFF sections can have section names truncated to 8 chars, in order to
have the name available at runtime. (The string table, where long untruncated
names are stored, isn't loaded at runtime.)
This allows various llvm tools to dump the .eh_frame section from such
executables.
Patch by Peiyuan Song!
Differential Revision: https://reviews.llvm.org/D55407
llvm-svn: 348708
Summary:
WasmSignature used to use its `WasmSignature` member variable only for
function types, but now it can be used for events as well.
Reviewers: sbc100
Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55247
llvm-svn: 348702
`Saver` is a StringSaver, which has a few overloads of `save` that all
ultimately just call `StringRef save(StringRef)`. Just take a StringRef
here instead of building up a std::string to convert it to a StringRef.
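A sketch of the simplification (call-site name hypothetical):

  // Before: callers built a std::string from a StringRef just so the
  // save(const std::string&) overload could turn it back into a StringRef.
  StringRef Saved = Saver.save(Name); // Name is already a StringRef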
llvm-svn: 348650
Previously we would create an lldb::Function object for each function
parsed, but we would not add these to the clang AST. This is a first
step towards getting local variable support working, as we first need an
AST decl so that when we create local variable entries, they have the
proper DeclContext.
Differential Revision: https://reviews.llvm.org/D55384
llvm-svn: 348631
We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width.
So for example, take v8i32 on an SSE target. We were calculating the cost of a v8i32 op, which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops, for a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops.
There are still more bugs in this code that I'm going to work on next. The non-pairwise code shouldn't count extract subvectors in the loop; there are no extracts, the types are split in registers. For pairwise we need to use two two-source permute shuffles.
Differential Revision: https://reviews.llvm.org/D55397
llvm-svn: 348621
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.
Unlike the previous iteration of this patch, getDemandedBits() can now
again be called on arbitrary (sized) instructions, even if they don't
have integer or vector of integer type. (For vector types the size of the
returned mask will now be the scalar size in bits though.)
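Usage is unchanged; as a sketch (F, AC, DT and I assumed in scope):

  DemandedBits DB(F, AC, DT);
  // For a vector-typed I the mask has scalar-size bits, shared by all
  // lanes; a bit is demanded if any lane demands it.
  APInt Demanded = DB.getDemandedBits(&I);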
The added LoopVectorize test case shows a case which triggered an
assertion failure with the previous attempt, because getDemandedBits()
was called on a pointer-typed instruction.
Differential Revision: https://reviews.llvm.org/D55297
llvm-svn: 348602
This patch introduces a new intrinsic `@llvm.experimental.widenable.condition`
that allows explicit representation for guards. It is an alternative to using
`@llvm.experimental.guard` intrinsic that does not contain implicit control flow.
We keep finding places where `@llvm.experimental.guard` is not supported or
treated too conservatively, and there are 2 reasons for that:
- `@llvm.experimental.guard` has memory write side effect to model implicit control flow,
and this sometimes confuses passes and analyzes that work with memory;
- Not all passes and analyses are aware of the semantics of guards. These passes treat them
as regular throwing calls and have no idea that the condition of the guard may be used to prove
something. One well-known place which had caused us troubles in the past is explicit loop
iteration count calculation in SCEV. Another example is new loop unswitching which is not
aware of guards. Whenever a new pass appears, we potentially have this problem there.
Rather than go and fix all these places (and commit to keep track of them and add support
in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible.
The only significant difference between guards and regular explicit branches is that guard's condition
can be widened. It means that a guard contains (explicitly or implicitly) a `deopt` block successor,
and it is always legal to go there no matter what the guard condition is. The other successor is
a guarded block, and it is only legal to go there if the condition is true.
This patch introduces a new explicit form of guards alternative to `@llvm.experimental.guard`
intrinsic. Now a widenable guard can be represented in the CFG explicitly like this:
%widenable_condition = call i1 @llvm.experimental.widenable.condition()
%new_condition = and i1 %cond, %widenable_condition
br i1 %new_condition, label %guarded, label %deopt
guarded:
; Guarded instructions
deopt:
call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]
The new intrinsic `@llvm.experimental.widenable.condition` has semantics of an
`undef`, but the intrinsic prevents the optimizer from folding it early. This form
should exploit all optimization boons provided to the `br` instruction, and it still can be
widened by replacing the result of `@llvm.experimental.widenable.condition()`
with `and` with any arbitrary boolean value (as long as the branch that is taken when
it is `false` has a deopt and has no side-effects).
For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using
implicit control flow in guards".
This patch introduces this new intrinsic with respective LangRef changes and a pass
that converts old-style guards (expressed as intrinsics) into the new form.
The naming discussion is still ongoing. Merging this to unblock further items. We can
later change the name of this intrinsic.
Reviewed By: reames, fedor.sergeev, sanjoy
Differential Revision: https://reviews.llvm.org/D51207
llvm-svn: 348593
Adds fatal errors for any target that does not support the Tiny or Kernel
code models, by rejigging the getEffectiveCodeModel calls.
Differential Revision: https://reviews.llvm.org/D50141
llvm-svn: 348585
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.
The getDemandedBits() method can now only be called on instructions that
have integer or vector of integer type. Previously it could be called on
any sized instruction (even if it was not particularly useful). The size
of the return value is now always the scalar size in bits (while
previously it was the type size in bits).
Differential Revision: https://reviews.llvm.org/D55297
llvm-svn: 348549
This reverts commit r348203 and reapplies D55085 with an additional
GCOV bugfix to make the change NFC for relative file paths in .gcno files.
Thanks to Ilya Biryukov for additional testing!
Original commit message:
Update Diagnostic handling for changes in CFE.
The clang frontend no longer emits the current working directory for
DIFiles containing an absolute path in the filename: and will move the
common prefix between current working directory and the file into the
directory: component.
https://reviews.llvm.org/D55085
llvm-svn: 348512
VarStreamArray was built on the assumption that it is backed by a
StreamRef, and offset 0 of that StreamRef is the first byte of the first
record in the array.
This is a logical and intuitive assumption, but unfortunately we have
use cases where it doesn't hold. Specifically, a PDB module's symbol
stream is prefixed by 4 bytes containing a magic value, and the first
byte of record data in the array is actually at offset 4 of this byte
sequence.
Previously, we would just truncate the first 4 bytes and then construct
the VarStreamArray with the resulting StreamRef, so that offset 0 of the
underlying stream did correspond to the first byte of the first record,
but this is problematic, because symbol records reference other symbol
records by the absolute offset including that initial magic 4 bytes. So
if another record wants to refer to the first record in the array, it
would say "the record at offset 4".
This led to extremely confusing hacks and semantics in loading code, and
after spending 30 minutes trying to get some math right and failing, I
decided to fix this in the underlying implementation of VarStreamArray.
Now, we can say that a stream is skewed by a particular amount. This
way, when we access a record by absolute offset, we can use the same
values that the records themselves contain, instead of having to do
fixups.
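Conceptually (the exact spelling of the skew parameter is hypothetical):

  // The 4-byte magic prefix stays part of the underlying view, so
  // absolute offsets stored in records (e.g. "offset 4" for the first
  // record) resolve without fixups; iteration simply starts at the skew.
  VarStreamArray<SymbolRecord> Syms(SymbolStream, /*Skew=*/4);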
Differential Revision: https://reviews.llvm.org/D55344
llvm-svn: 348499
These opcodes are intended to subsume some of the capability of G_MERGE_VALUES,
as it was too powerful and thus too complex to deal with throughout the GISel
pipeline.
G_BUILD_VECTOR creates a vector value from a sequence of uniformly typed
scalar values. G_BUILD_VECTOR_TRUNC is a special opcode for handling scalar
operands which are larger than the destination vector element type, and
therefore does an implicit truncate.
G_CONCAT_VECTORS creates a vector by concatenating smaller, uniformly typed,
vectors together.
These will be used in a subsequent commit. This commit just adds the initial
infrastructure.
Differential Revision: https://reviews.llvm.org/D53594
llvm-svn: 348430
More refactoring.
After the changes to the pruning logic, and removing CandidateList, there's
no reason for Candidates to be shared_ptrs (or pointers at all).
std::shared_ptr<Candidate> -> Candidate.
llvm-svn: 348427
Summary:
We decided to change the event section code from 12 to 13, as the new
`DataCount` section in the bulk memory operations proposal will take the
code 12 instead.
Reviewers: sbc100
Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55343
llvm-svn: 348424
Since we're now performing outlining per OutlinedFunction rather than per
Candidate, we can simply outline each candidate as it shows up.
Instead of having a pruning phase, instead, we'll outline entire functions.
Then we'll update the UnsignedVec we mapped to reflect the deletion. If any
candidate is in a space that's marked dirty, then we'll drop it.
This lets us remove the pruning logic entirely, and greatly simplifies the
code.
llvm-svn: 348420
Mostly NFC, only change is the order of outlined function names.
Loop over the outlined functions instead of walking the candidate list.
This is a bit easier to understand. It's far more natural to create a function,
then replace all of its occurrences with calls than the other way around.
The functions outlined after this do not change, but their names will be
decided by their benefit. E.g, OUTLINED_FUNCTION_0 will now always be the
most beneficial function, rather than the first one seen.
This makes it easier to enforce an ordering on the outlined functions. So,
this also adds a test to make sure that the ordering works as expected.
llvm-svn: 348414
https://reviews.llvm.org/D54980
This provides a standard API across GISel passes to observe and notify
passes about changes (insertions/deletions/mutations) to MachineInstrs.
This patch also removes the recordInsertion method in MachineIRBuilder
and instead provides method to setObserver.
Reviewed by: vkeles.
llvm-svn: 348406
Some gardening/refactoring.
It's cleaner to copy the instructions into the MachineFunction using the first
candidate instead of going to the mapper.
Also, by doing this we can remove the Seq member from OutlinedFunction entirely.
llvm-svn: 348390
Revert https://reviews.llvm.org/D55217 due to warnings-turned-into-errors in
AMDGPU targets. I'll fix the warnings first, then re-commit this patch.
llvm-svn: 348375
Summary:
Many functions on `llvm::AttributeList` and `llvm::AttributeSet` are
documented with "returns a new {list,set} because attribute
{lists,sets} are immutable." This documentation can be aided by the
addition of an attribute, `LLVM_NODISCARD`. Adding this prevents
unsuspecting users of the API from expecting
`AttributeList::setAttributes` to modify the underlying list.
At the very least, it would have saved me a few hours of debugging, since I
had been doing just that! I had a bug in my program where I was calling
`setAttributes` but then passing in the unmutated `AttributeList`.
I tried adding LLVM_NODISCARD and confirmed that it would have made my bug
immediately obvious.
Reviewers: rnk, javed.absar
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55217
llvm-svn: 348372
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.
Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.
Differential Revision: https://reviews.llvm.org/D54698
llvm-svn: 348353
Like the already existing zip_shortest/zip_first iterators, zip_longest
iterates over multiple iterators at once, but has as many iterations as
the longest sequence.
This means some iterators may reach the end before others do.
zip_longest uses llvm::Optional's None value to mark a
past-the-end value.
zip_longest is not reverse-iterable because the tuples iterated over
would be different for different-length sequences (IMHO for the same
reason neither zip_shortest nor zip_first should be reverse-iterable;
one can still reverse the ranges individually if that's the expected
behavior).
In contrast to zip_shortest/zip_first, zip_longest tuples contain
rvalues instead of references. This is because llvm::Optional cannot
contain reference types and the value-initialized default does not have
a memory location a reference could point to.
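Typical usage, as a short sketch:

  std::vector<int> A = {1, 2, 3};
  std::vector<int> B = {10, 20};
  for (auto T : zip_longest(A, B)) {
    Optional<int> L = std::get<0>(T); // 1, 2, 3
    Optional<int> R = std::get<1>(T); // 10, 20, None
  }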
The motivation for these iterators is to use C++ foreach to compare two
lists of ordered attributes in D48100 (SemaOverload.cpp and
ASTReaderDecl.cpp).
Idea by @hfinkel.
This re-commits r348301 which was reverted by r348303.
The compilation error with gcc 5.4 was resolved by using make_tuple in the
initializer_list.
The compilation error with msvc14 was resolved by splitting
ZipLongestValueType (which already was a workaround for msvc15) into
ZipLongestItemType and ZipLongestTupleType.
Differential Revision: https://reviews.llvm.org/D48348
llvm-svn: 348323
This is no longer used and is just taking up space in the structure.
Heap allocation of this structure is on the critical path, so space
actually matters.
llvm-svn: 348318
Previously these were dropped. We now understand them sufficiently
well to start emitting them. From the debugger's perspective, this
now enables us to have debug info about typedefs (both global and
function-locally scoped).
Differential Revision: https://reviews.llvm.org/D55228
llvm-svn: 348306
Like the already existing zip_shortest/zip_first iterators, zip_longest
iterates over multiple iterators at once, but has as many iterations as
the longest sequence.
This means some iterators may reach the end before others do.
zip_longest uses llvm::Optional's None value to mark a
past-the-end value.
zip_longest is not reverse-iterable because the tuples iterated over
would be different for different-length sequences (IMHO for the same
reason neither zip_shortest nor zip_first should be reverse-iterable;
one can still reverse the ranges individually if that's the expected
behavior).
In contrast to zip_shortest/zip_first, zip_longest tuples contain
rvalues instead of references. This is because llvm::Optional cannot
contain reference types and the value-initialized default does not have
a memory location a reference could point to.
The motivation for these iterators is to use C++ foreach to compare two
lists of ordered attributes in D48100 (SemaOverload.cpp and
ASTReaderDecl.cpp).
Idea by @hfinkel.
Differential Revision: https://reviews.llvm.org/D48348
llvm-svn: 348301
Currently if you use -{start,stop}-{before,after}, it picks
the first instance with the matching pass name. If you run
the same pass multiple times, there's no way to distinguish them.
Allow specifying a run index with ,N (e.g. -stop-after=machine-scheduler,1) to
specify which instance you mean.
llvm-svn: 348285
This reverts commit r348203.
Reason: this produces absolute paths in .gcno files, breaking us
internally as we rely on them being consistent with the filenames passed
in the command line.
Also reverts r348157 and r348155 to account for revert of r348154 in
clang repository.
llvm-svn: 348279
PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction.
The existing code converts both the in-range and out-of-range cases using FP_TO_SINT and then selects the result; this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment.
The X87 cases don't need the strict flag but generate much nicer code with it.
Differential Revision: https://reviews.llvm.org/D53794
llvm-svn: 348251
This patch renames both methods (NotifyObjectEmitted -> notifyObjectLoaded, and
NotifyObjectFreed -> notifyObjectFreed), adds an abstract "ObjectKey" (uint64_t)
parameter to notifyObjectLoaded, and replaces the ObjectFile parameter for
notifyObjectFreed with an ObjectKey. Using an ObjectKey to identify these
events, rather than a reference to the ObjectFile, allows us to free the
ObjectFile after notifyObjectLoaded is called, saving memory.
https://reviews.llvm.org/D53773
llvm-svn: 348223
If a PHI node outside the extracted region has multiple incoming values from
it, split this PHI into two parts. The first PHI has incoming values only from
the region and is extracted with it (placed in a separate basic block that is
added to the list of outlined blocks), and the incoming values in the original
PHI are replaced by the first PHI. A similar solution is already used in
CodeExtractor for PHIs in the entry block (the severSplitPHINodes method).
This covers bug PR39433.
Patch by Sergei Kachkov!
Differential Revision: https://reviews.llvm.org/D55018
llvm-svn: 348205
This allows obtaining smaller, more readable identifiers
in a more comfortable way.
Differential Revision: https://reviews.llvm.org/D54486
llvm-svn: 348197
This flag does not exist in GNU objcopy but has a major use case.
Debugging tools support the .build-id directory structure to find
debug binaries. There is no easy way to build this structure up
however. One way to do it is by using llvm-readelf and some crazy
shell magic. This implements the feature directly. It is most often
the case that you'll want to strip a file and send the original to
the .build-id directory but if you just want to send a file to the
.build-id directory you can copy to /dev/null instead.
Differential Revision: https://reviews.llvm.org/D54384
llvm-svn: 348174
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126472.html
TextAPI is a library and accompanying tool that allows conversion between binary shared object stubs and textual counterparts. The motivations and use cases for this are explained thoroughly in the llvm-dev proposal [1]. This initial commit proposes a potential structure for the TAPI library, also including support for reading/writing text-based ELF stubs (.tbe) in addition to preliminary support for reading binary ELF files. The goal for this patch is to ensure the project architecture appropriately welcomes integration of Mach-O stubbing from Apple's TAPI [2].
Added:
- TextAPI library
- .tbe read support
- .tbe write (to raw_ostream) support
[1] http://lists.llvm.org/pipermail/llvm-dev/2018-September/126472.html
[2] https://github.com/ributzka/tapi
Differential Revision: https://reviews.llvm.org/D53051
llvm-svn: 348170
The clang frontend no longer emits the current working directory for
DIFiles containing an absolute path in the filename: and will move the
common prefix between current working directory and the file into the
directory: component.
https://reviews.llvm.org/D55085
llvm-svn: 348155
There are potential improvements to the structure of this API
raised by D54994, but remove some cosmetic blemishes before
making any functional changes.
llvm-svn: 348149
Summary:
SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SSBS, as it was previously only possible to
enable by selecting -march=armv8.5-a.
Similar patch upstream in GNU binutils:
https://sourceware.org/ml/binutils/2018-09/msg00274.html
Reviewers: olista01, samparker, aemerson
Reviewed By: samparker
Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D54629
llvm-svn: 348137
Currently, variadic operands on an MCInst are assumed to be uses,
because they come after the defs. However, this is not always the case,
for example the Arm/Thumb LDM instructions write to a variable number of
registers.
This adds a property of instruction definitions which can be used to
mark variadic operands as defs. This only affects MCInst, because
MachineInstr already tracks use/def per operand in each instance of the
instruction, so it can already represent this.
This property can then be checked in MCInstrDesc, allowing us to remove
some special cases in ARMAsmParser::isITBlockTerminator.
Differential revision: https://reviews.llvm.org/D54853
llvm-svn: 348114
In the Arm assembly parser, we first match an instruction, then call
processInstruction to possibly change it to a different encoding, to
match rules in the architecture manual which can't be expressed by the
table-generated matcher.
This adds debug printing so that this process is visible when using the
-debug option.
To support this, I've added a new overload of MCInst::dump_pretty which
takes the opcode name as a StringRef, since we don't have an InstPrinter
instance in the assembly parser. Instead, we can get the same
information directly from the MCInstrInfo.
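In the parser this looks roughly like the following sketch (MII is the
parser's MCInstrInfo):

  LLVM_DEBUG(Inst.dump_pretty(dbgs(), MII.getName(Inst.getOpcode())));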
Differential revision: https://reviews.llvm.org/D54852
llvm-svn: 348113
We were duplicating code around the existing isImpliedCondition() that
checks for a predecessor block/dominating condition, so make that a
wrapper call.
llvm-svn: 348088
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.
For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.
Fixes PR37731
Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997
Differential Revision: https://reviews.llvm.org/D54585
llvm-svn: 348076
Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:
1. VirtReg2Value is generated lazily; there were some cases where
a lookup was performed before all relevant virtual registers were
created, leading to an out-of-sync mapping. Those cases were:
- Complex code to lower formal arguments that generated CopyFromReg
nodes from live-in registers (fixed by never querying the mapping
for live-in registers).
- Code that generates CopyToReg for formal arguments that are used
outside the entry basic block (fixed by never querying the
mapping for Register nodes, which don't need the divergence info
anyway).
2. For complex values that are lowered to a sequence of registers,
all registers must be reflected in the VirtReg2Value mapping.
I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.
There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.
Reviewers: alex-t, arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D54340
llvm-svn: 348049
Summary:
This is patch #3 of the new DivergenceAnalysis
<https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html>
The GPUDivergenceAnalysis is intended to eventually supersede the existing
LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces
incorrect results on unstructured Control-Flow Graphs:
<https://bugs.llvm.org/show_bug.cgi?id=37185>
This patch adds the option -use-gpu-divergence-analysis to the
LegacyDivergenceAnalysis to turn it into a transparent wrapper for the
GPUDivergenceAnalysis.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle
Differential Revision: https://reviews.llvm.org/D53493
llvm-svn: 348048
MSVC 2015 and newer have std::is_trivially_copyable available for use.
We should prefer that over the std::is_class check so that this check is
correct.
llvm-svn: 348042
This patch adds BPF Debug Format (BTF) as a standalone
LLVM debuginfo. The BTF related sections are directly
generated from IR. The BTF debuginfo is generated
only when the compilation target is BPF.
What is BTF?
============
First, BPF is a Linux kernel virtual machine
and is widely used for tracing, networking and security.
https://www.kernel.org/doc/Documentation/networking/filter.txt
https://cilium.readthedocs.io/en/v1.2/bpf/
BTF is the debug info format for BPF, introduced in the below
linux patch
69b693f0ae (diff-06fb1c8825f653d7e539058b72c83332)
in the patch set mentioned in the below lwn article.
https://lwn.net/Articles/752047/
The BTF format is specified in the above github commit.
In summary, its layout looks like
struct btf_header
type subsection (a list of types)
string subsection (a list of strings)
With such information, the kernel and the user space are able to
pretty print a particular bpf map key/value. One possible example below:
Without BTF:
key: [ 0x01, 0x01, 0x00, 0x00 ]
With BTF:
key: struct t { a : 1; b : 1; c : 0}
where struct is defined as
struct t { char a; char b; short c; };
How BTF is generated?
=====================
Currently, the BTF is generated through pahole.
https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=68645f7facc2eb69d0aeb2dd7d2f0cac0feb4d69
and available in pahole v1.12
https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=4a21c5c8db0fcd2a279d067ecfb731596de822d4
Basically, the bpf program needs to be compiled with -g with
dwarf sections generated. The pahole is enhanced such that
a .BTF section can be generated based on dwarf. This format
of the .BTF section matches the format expected by
the kernel, so a bpf loader can just take the .BTF section
and load it into the kernel.
8a138aed4a
The .BTF section layout is also specified in this patch:
with file include/llvm/BinaryFormat/BTF.h.
What use cases this patch tries to address?
===========================================
Currently, only the bpf instruction stream is required to
pass to the kernel. The kernel verifies it, jits it if configured
to do so, attaches it to a particular kernel attachment point,
and later executes when a particular event happens.
This patch tries to expand BTF to support two more use cases below:
(1). BPF supports subroutine calls.
During performance analysis, it would be good to
differentiate which call is hot instead of just
providing a virtual address. This would require to
pass a unique identifier for each subroutine to
the kernel, the subroutine name is a natural choice.
(2). If a particular jitted instruction is hot, we want
the user to know which source line this jitted instruction
belongs to. This would require the source information
to be available to various profiling tools.
Note that in a single ELF file,
. there may be multiple loadable bpf programs,
. for a particular to-be-loaded bpf instruction stream,
its instructions may come from multiple PROGBITS sections,
the bpf loader needs to merge them together to a single
consecutive insn stream before loading to the kernel.
For example:
section .text: subroutines funcFoo
section _progA: calling funcFoo
section _progB: calling funcFoo
The bpf loader could construct two loadable bpf instruction
streams and load them into the kernel:
. _progA funcFoo
. _progB funcFoo
So per ELF section function offset and instruction offset
will need to be adjusted before passing to the kernel, and
the kernel essentially expects only one code section regardless
of how many are in the ELF file.
What do we propose and Why?
===========================
To support the above two use cases, we propose to
add an additional section, .BTF.ext, to the ELF file
which is the input of the bpf loader. A different section
is preferred since the loader may need to manipulate it before
loading part of its data to the kernel.
The .BTF.ext section has a similar header to the .BTF section
and it contains two subsections for func_info and line_info.
. the func_info maps the func insn byte offset to a func
type in the .BTF type subsection.
. the line_info maps the insn byte offset to a line info.
. both func_info and line_info subsections are organized
by ELF PROGBITS AX sections.
pahole is not a good place to implement .BTF.ext as
pahole is mostly for structure hole information and more
importantly, we want to pass the actual code to the kernel.
. bpf program typically is small so storage overhead
should be small.
. in bpf land, it is totally possible that
an application loads the bpf program into the
kernel and then that application quits, so
holding debug info by the user space application
is not practical as you may not even know who
loads this bpf program.
. having source code directly kept by the kernel
would ease deployment since the original source
code does not need to ship on every host and the
kernel-devel package does not need to be
deployed even if kernel headers are used.
LLVM is a good place to implement.
. The only reliable time to get the source code is
during compilation time. This will result in both more
accurate information and easier deployment as
stated in the above.
. Another consideration is for JIT. The project like bcc
(https://github.com/iovisor/bcc)
use MCJIT to compile a C program into bpf insns and
load them to the kernel. The llvm generated BTF sections
will be readily available for such cases as well.
Design and implementation of emitting .BTF/.BTF.ext sections
===========================================================
The BTF debuginfo format is defined. Both .BTF and .BTF.ext
sections are generated directly from IR when both
"-target bpf" and "-g" are specified. Note that
dwarf sections are still generated as dwarf is used
by user space tools like llvm-objdump etc. for BPF target.
This patch also contains tests to verify generated
.BTF and .BTF.ext sections for all supported types, func_info
and line_info subsections. The patch is also tested
against linux kernel bpf sample tests and selftests.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D53736
llvm-svn: 347999
Summary:
This simplifies writing predicates for pattern fragments that are
automatically re-associated or commuted.
For example, a followup patch adds patterns for fragments of the form
(add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are
automatically commuted to (add $z, (shl $x, $y)), which makes it basically
impossible to refer to $x, $y, and $z generically in the PredicateCode.
With this change, the PredicateCode can refer to $x, $y, and $z simply
as `Operands[i]`.
Test confirmed that there are no changes to any of the generated files
when building all (non-experimental) targets.
Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c
Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand
Subscribers: wdng, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D51994
llvm-svn: 347992
Adding a new reduction pattern match for vectorizing code similar
to TSVC s3111:
for (int i = 0; i < N; i++)
if (a[i] > b)
sum += a[i];
This patch adds support for fadd, fsub and fmul, as well as multiple
branches and different (but compatible) instructions (ex. add+sub) in
different branches.
The difference from the previous patch (https://reviews.llvm.org/D49168)
is as follows:
- Added check of fast-math property of fp-instruction to the
previous patch
- Fix/add some pattern for if-reduction.ll
Differential Revision: https://reviews.llvm.org/D54464
Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org>
and Masakazu Ueno <masakazu.ueno@linaro.org>
llvm-svn: 347989
DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend
operands when it is able to do so. For some targets this is more expensive
than a sign-extension, which is also a valid choice. Introduce the
isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger
helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy ==
MVT::i64, as it can be performed using a single instruction.
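The RISC-V override is then essentially:

  // i32 -> i64 sign-extension is a single instruction on RV64, so
  // prefer it over zero-extension.
  bool RISCVTargetLowering::isSExtCheaperThanZExt(EVT SrcVT, EVT DstVT) const {
    return SrcVT == MVT::i32 && DstVT == MVT::i64;
  }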
Differential Revision: https://reviews.llvm.org/D52978
llvm-svn: 347977
Utilise a similar ('late') lowering strategy to D47882. The changes to
AtomicExpandPass allow this strategy to be utilised by other targets which
implement shouldExpandAtomicCmpXchgInIR.
All cmpxchg are lowered as 'strong' currently and failure ordering is ignored.
This is conservative but correct.
Differential Revision: https://reviews.llvm.org/D48131
llvm-svn: 347914
Also revert fix r347876
One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.
llvm-svn: 347911
Currently CaptureTracker gives up if it encounters a value with more than 20
uses. The motivation for this cap is to keep it relatively cheap for the
BasicAliasAnalysis use case, where the results can't be cached. However, other
clients of CaptureTracker might be OK with a higher cost. This patch introduces
an argument for the PointerMayBeCaptured functions to specify the max number of
uses to explore. The motivation for this change is a downstream user of
CaptureTracker, but I believe upstream clients of CaptureTracker might also
benefit from a more fine-grained cap.
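Callers that can tolerate more work can now raise the cap; a sketch:

  // Explore up to 100 uses before giving up, instead of the default cap
  // tuned for BasicAliasAnalysis.
  bool Captured = PointerMayBeCaptured(V, /*ReturnCaptures=*/true,
                                       /*StoreCaptures=*/true,
                                       /*MaxUsesToExplore=*/100);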
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D55042
llvm-svn: 347910
Summary:
Replace `aext([asz]ext x)` with `aext/sext/zext x` in order to
reduce the number of instructions generated to clean up some
legalization artifacts.
Reviewers: aditya_nandakumar, dsanders, aemerson, bogner
Reviewed By: aemerson
Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D54174
llvm-svn: 347893
Summary:
We can sometimes end up with multiple copies of a local variable that
have the same GUID in the index. This happens when there are local
variables with the same name that are in different source files having the
same name/path at compile time (but compiled into different bitcode objects).
In this case make sure we import the copy in the caller's module.
This enables importing both of the variables having the same GUID
(but which will have different promoted names since the module paths,
and therefore the module hashes, will be distinct).
Importing the wrong copy is particularly problematic for read only
variables, since we must import them as a local copy whenever
referenced. Otherwise we get undefs at link time.
Note that the llvm-lto.cpp and ThinLTOCodeGenerator changes are needed
for testing the distributed index case via clang, which will be sent as
a separate clang-side patch shortly. We were previously not doing the
dead code/read only computation before computing imports when testing
distributed index generation (like it was for testing importing and
other ThinLTO mechanisms alone).
Reviewers: evgeny777
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D55047
llvm-svn: 347886
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.
This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.
This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accommodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return
values are zero-initialized; this is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (a later
TODO is to re-enable this and handle it correctly).
There's an additional fix to avoid a dmask=0 case.
For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.
Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.
The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:
  %v = call {<4 x float>, i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(
           i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
  %v.vec = extractvalue {<4 x float>, i32} %v, 0
  %v.err = extractvalue {<4 x float>, i32} %v, 1
Differential revision: https://reviews.llvm.org/D48826
Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
llvm-svn: 347871
Change meaning of TargetOptions::EnableGlobalISel. The flag was
previously set only when a target switched on GlobalISel but it is now
always set when the GlobalISel pipeline is enabled. This makes the flag
consistent with TargetOptions::EnableFastISel and allows its use in
other parts of the compiler to determine when GlobalISel is enabled.
The EnableGlobalISel flag previously had only one use, in
TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value
to determine if GlobalISel was enabled by a target and returned false in
such a case. To preserve the current behaviour, a new flag
TargetOptions::GlobalISelAbort is introduced to separately record the
abort behaviour.
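For illustration, a minimal sketch of the kind of query this enables
elsewhere in the compiler (assuming a TargetMachine &TM in scope):

  // With the new semantics this mirrors EnableFastISel: it answers "is the
  // GlobalISel pipeline enabled?", no matter whether the target or a flag
  // enabled it. Abort behaviour is now queried separately.
  if (TM.Options.EnableGlobalISel) {
    // ... take the GlobalISel-specific path ...
  }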
Differential Revision: https://reviews.llvm.org/D54518
llvm-svn: 347861
This patch adds the ability to specify via tablegen which processor resources
are load/store queue resources.
A new tablegen class named MemoryQueue can be optionally used to mark resources
that model load/store queues. Information about the load/store queue is
collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to
initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and
`StoreQueueID`. Those two fields are identifiers for buffered resources used to
describe the load queue and the store queue.
Field `BufferSize` is interpreted as the number of entries in the queue, while
the number of units is a throughput indicator (i.e. number of available pickers
for loads/stores).
At construction time, LSUnit in llvm-mca checks for the presence of extra
processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If
that information is available, and fields LoadQueueID and StoreQueueID are set
to a value different than zero (i.e. the invalid processor resource index), then
LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value
declared by the two processor resources.
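A sketch of that initialization, assuming an MCSubtargetInfo &STI and using
the field names from the text (details unverified):

  unsigned LQSize = 0;
  const MCSchedModel &SM = STI.getSchedModel();
  if (SM.hasExtraProcessorInfo()) {
    const MCExtraProcessorInfo &EPI = SM.getExtraProcessorInfo();
    if (EPI.LoadQueueID) { // zero is the invalid resource index
      const MCProcResourceDesc &LQDesc = *SM.getProcResource(EPI.LoadQueueID);
      LQSize = LQDesc.BufferSize; // number of entries in the load queue
    }
    // ... and likewise for EPI.StoreQueueID ...
  }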
With this patch, we more accurately track dynamic dispatch stalls caused by the
lack of LS tokens (i.e. load/store queue full). This is also shown by the
differences in two BdVer2 tests. Stalls that were previously classified as
generic SCHEDULER FULL stalls are now correctly classified as either "load
queue full" or "store queue full".
About the differences in the -scheduler-stats view: those differences are
expected, because entries in the load/store queue are not released at the
instruction issue stage; instead, they are released at the instruction
executed stage. This is the main reason why, for the modified tests, the
load/store queues get full before PdEx is full.
Differential Revision: https://reviews.llvm.org/D54957
llvm-svn: 347857
Moving to PlatformType from BinaryFormat had some UB fallout when handling
unknown platforms or malformed input files.
This should fix the sanitizer bots.
llvm-svn: 347836
Add the required target triples to LLVMSupport to support Hurd
in LLVM (formally `pc-hurd-gnu`).
Patch by sthibaul (Samuel Thibault)
Differential Revision: https://reviews.llvm.org/D54378
llvm-svn: 347832
Add basic infrastructure for reading and writing TBD files (versions 1-3).
The TextAPI library is not used by anything yet (besides the unit tests). Tool
support will be added in a separate commit.
The TBD format is currently documented in the implementation file (TextStub.cpp).
https://reviews.llvm.org/D53945
Update: This contains changes to fix issues discovered by the bots:
- add parentheses to silence warnings
- rename variables
- use PlatformType from BinaryFormat
llvm-svn: 347823
Add basic infrastructure for reading and writing TBD files (versions 1-3).
The TextAPI library is not used by anything yet (besides the unit tests). Tool
support will be added in a separate commit.
The TBD format is currently documented in the implementation file (TextStub.cpp).
https://reviews.llvm.org/D53945
llvm-svn: 347808
Packing the flags into one bitcode word will save effort in
adding new flags in the future.
Differential Revision: https://reviews.llvm.org/D54755
llvm-svn: 347806
Currently, instructions doing memory accesses through a base operand that is
not a register cannot be analyzed using `TII::getMemOpBaseRegImmOfs`.
This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.
The goal of this patch is to refactor all this to return a base
operand instead of a base register.
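Sketch of the refactored interface from a caller's perspective (the function
name and constness are assumed from this description; TII, TRI and MI are
taken to be in scope, and analyzeFrameIndexAccess is hypothetical):

  const MachineOperand *BaseOp;
  int64_t Offset;
  if (TII->getMemOperandWithOffset(MI, BaseOp, Offset, TRI)) {
    // The base is now a MachineOperand, so frame-index bases can be
    // analyzed alongside register bases.
    if (BaseOp->isFI())
      analyzeFrameIndexAccess(BaseOp->getIndex(), Offset); // hypothetical
  }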
Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.
Differential Revision: https://reviews.llvm.org/D54846
llvm-svn: 347746
This reverts r294500. DwarfCompileUnit::addAddressExpr uses DIEExpr
for PCOffset. In that case the expression is unrelated to thread locals
and so emitting a value of the DIEExpr does not have to always mean
emit-debug-thread-local.
llvm-svn: 347744
This moves ARM and AArch64 target parsing into their own
separate files, to enable future changes. They are still
accessible through TargetParser.h as before.
Several functions in AArch64 which were just forwarders to ARM
have been removed. All except AArch64::getFPUName were unused,
and that was only used in a test, which itself overlapped with
one in ARM, so it has also been removed.
Differential revision: https://reviews.llvm.org/D53980
llvm-svn: 347741
Summary:
This speeds up linking clang.exe/pdb with /DEBUG:GHASH by 31%, from
12.9s to 9.8s.
Symbol records are typically small (16.7 bytes on average), but we
processed them one at a time. CVSymbol is a relatively "large" type. It
wraps an ArrayRef<uint8_t> with a kind and an optional 32-bit hash, which we
don't need. Before this change, each DbiModuleDescriptorBuilder would
maintain an array of CVSymbols, and would write them individually with a
BinaryItemStream.
With this change, we now add symbols that happen to appear contiguously
in bulk. For each .debug$S section (roughly one per function), we
allocate two copies, one for relocation, and one for realignment
purposes. For runs of symbols that go in the module stream, which is
most symbols, we now add them as a single ArrayRef<uint8_t>, so the
vector in DbiModuleDescriptorBuilder is roughly linear in the number of
.debug$S sections (O(# funcs)) instead of the number of symbol records
(very large).
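A sketch of the bulk path (method and member names assumed from the
description above, not copied from the patch):

  // Instead of one entry per ~16-byte record, each contiguous run of
  // symbols is stored as a single ArrayRef, so the vector grows with the
  // number of .debug$S sections rather than the number of records.
  void DbiModuleDescriptorBuilder::addSymbolsInBulk(ArrayRef<uint8_t> Syms) {
    if (Syms.empty())
      return;
    SymbolByteSize += Syms.size();
    Symbols.push_back(Syms);
  }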
Some stats on symbol sizes for the curious:
PDB size: 507M
sym bytes: 316,508,016
sym count: 18,954,971
sym byte avg: 16.7
As future work, we may be able to skip copying symbol records in the
linker for realignment purposes if we make LLVM write them aligned into
the object file. We need to double check that such symbol records are
still compatible with link.exe, but if so, it's definitely worth doing,
since my profile shows we spend 500ms in memcpy in the symbol merging
code. We could potentially cut that in half by saving a copy.
Alternatively, we could apply the relocations *after* we iterate the
symbols. This would require some careful re-engineering of the
relocation processing code, though.
Reviewers: zturner, aganea, ruiu
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D54554
llvm-svn: 347687
This is skylake-avx512 with the addition of avx512vnni ISA.
Patch by Jianping Chen
Differential Revision: https://reviews.llvm.org/D54785
llvm-svn: 347681
This makes Doxygen correctly associate the doc comment with the current
file rather than adding to the documentation for namespace llvm.
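The pattern in question looks roughly like this (a minimal illustration, not
the exact header touched):

  //===- Example.h -----------------------------------------------*- C++ -*-===//
  //
  /// \file
  /// With the \file command, this sentence documents Example.h itself;
  /// without it, Doxygen would attach the text to namespace llvm instead.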
llvm-svn: 347679
Summary:
This (very specialized) function was added to enable an LLDB use case.
Now that a more generic interface (overriding of parser functions -
D52992) is available, and LLDB has been converted to use that (D54074),
the function is unused and can be removed.
Reviewers: erik.pilkington, sgraenitz, rsmith
Subscribers: mgorny, hiraditya, christof, libcxx-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D54893
llvm-svn: 347670
Summary:
The analysis produces StackSafetyInfo, which contains information about how
allocas and parameters are used in functions.
From prototype by Evgenii Stepanov and Vlad Tsyrklevich.
Reviewers: eugenis, vlad.tsyrklevich, pcc, glider
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D54504
llvm-svn: 347603
Summary:
The old legacy LTO API had a separate cache key computation, which was
a subset of the cache key computation in the new LTO API (from what I
can tell this is largely just because certain features such as CFI,
dsoLocal, etc are only utilized via the new LTO API). However, having
separate computations is unnecessary (much of the code is duplicated),
and can lead to bugs when adding new optimizations if both cache
computation algorithms aren't updated properly - it's much easier to
maintain if we have a single facility.
This patch refactors the old LTO API code to use the cache key
computation from the new LTO API. To do this, we set up an lto::Config
object and fill in the fields that the old LTO was hashing (the others
will just use the defaults).
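In rough outline (a sketch; the surrounding variables and the exact set of
fields are assumptions, not the verbatim change):

  lto::Config Conf;
  Conf.CPU = MCpu;                  // fields the old API used to hash ...
  Conf.Options = TargetOpts;
  Conf.Freestanding = Freestanding; // ... plus the new legacy-only flag
  SmallString<40> Key;
  // Defer to the single, shared cache key facility from the new LTO API.
  computeLTOCacheKey(Key, Conf, Index, ModuleID, ImportList, ExportList,
                     ResolvedODR, DefinedGlobals);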
There are two notable changes:
- I added a Freestanding flag to the LTO Config. Currently this is only
used by the legacy LTO API. In the patch that added it (D30791) I had
asked about adding it to the new LTO API, but it looks like that was not
addressed. This should probably be discussed as a follow up to this
change, as it is orthogonal.
- The legacy LTO API had some code that was hashing the GUID of all
preserved symbols defined in the module. I looked back at the history of
this (which was added with the original hashing in the legacy LTO API in
D18494), and there is a comment in the review thread that it was added
in preparation for future internalization. We now do the internalization
of course, and that is handled in the new LTO API cache key computation
by hashing the recorded linkage type of all defined globals. Therefore I
didn't try to move over and keep the preserved symbols handling.
Reviewers: steven_wu, pcc
Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D54635
llvm-svn: 347592
Summary:
Add a hook to the GCMetadataPrinter for emitting stack maps in
custom format. The hook will be called at stack map generation
time. The default stack map format is used if there is no hook.
For this to be useful a few data structures and accessors are
exposed from the StackMaps class, so the custom printer can
access the stack map data.
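A sketch of a custom printer using the new hook (the hook and accessor names
follow the description above; the precise signatures are assumptions):

  class MyGCPrinter : public GCMetadataPrinter {
  public:
    // Called at stack map generation time; emit a custom encoding here.
    bool emitStackMaps(StackMaps &SM, AsmPrinter &AP) override {
      for (const auto &FnInfo : SM.getFnInfos()) {
        // Walk the now-exposed per-function stack map data.
        (void)FnInfo;
      }
      return true; // handled; suppress the default stack map format
    }
  };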
This patch authored by Cherry Zhang <cherryyz@google.com>.
Reviewers: thanm, apilipenko, reames
Reviewed By: reames
Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D53892
llvm-svn: 347584
This reverts commit r346970.
It was causing PR39774, a crash in slp-vectorizer on a rather simple loop
with just a bunch of 'and's in the body.
llvm-svn: 347541
Summary:
getLastAccessedTime() and getLastModificationTime() provided times in
nanoseconds but with only 1 second resolution, even when the underlying file
system could provide more precise times than that.
These changes add sub-second precision for unix platforms that support it.
Also add some comments to make sure people are aware that the resolution of
times can vary across different file systems.
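For example, reading a modification time through the existing status API:

  #include "llvm/Support/Chrono.h"
  #include "llvm/Support/FileSystem.h"
  using namespace llvm;

  sys::fs::file_status Status;
  if (!sys::fs::status("some/file", Status)) {
    // After this change the TimePoint can carry sub-second precision on
    // unix file systems that record it; elsewhere it still truncates.
    sys::TimePoint<> MTime = Status.getLastModificationTime();
  }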
Reviewers: labath, zturner, aaron.ballman, kristina
Reviewed By: aaron.ballman, kristina
Subscribers: lebedev.ri, mgorny, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D54826
llvm-svn: 347530
This should likely be adjusted to limit this transform
further, but these diffs should be clear wins.
If we have blendv/conditional move, then we should assume
those are cheap ops. The loads become independent of the
compare, so those can be speculated before we need to use
the values in the blend/mov.
llvm-svn: 347526
rL347502 moved the null sibling, so we should group all of these
together. I'm not sure why these aren't methods of the SDValue
class itself, but that's another patch if that's possible.
llvm-svn: 347523
This changeset is modeled after Intel's submission for SVML. It enables
trigonometry functions vectorization via SLEEF: http://sleef.org/.
* A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF.
* A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF.
* A comprehensive test case is included in this changeset.
* In a separate changeset (for clang), a new vectorization library argument is
added to -fveclib: -fveclib=SLEEF.
Trigonometry functions that are vectorized by SLEEF (a usage sketch follows
this list):
acos
asin
atan
atanh
cos
cosh
exp
exp2
exp10
lgamma
log10
log2
log
sin
sinh
sqrt
tan
tanh
tgamma
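A hedged example of the kind of loop that benefits: compiled with
-fveclib=SLEEF and vectorization enabled, the scalar sin calls below may be
turned into calls to SLEEF's vector sin (function and file names here are
illustrative only).

  #include <cmath>

  void apply_sin(double *out, const double *in, int n) {
    for (int i = 0; i < n; ++i)
      out[i] = std::sin(in[i]); // vectorizable: may become a SLEEF call
  }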
Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D53927
llvm-svn: 347510
`llvm-mca` relies on the predicates to be based on `MCSchedPredicate` in order
to resolve the scheduling for variant instructions. Otherwise, it aborts
the building of the instruction model early.
However, the scheduling model emitter in `TableGen` gives up too soon, unless
all processors use only such predicates.
In order to allow more processors to be used with `llvm-mca`, this patch
emits scheduling transitions if any processor uses these predicates. The
transition emitted for the processors using legacy predicates is the one
specified with `NoSchedPred`, which is based on `MCSchedPredicate`.
Preferably, `llvm-mca` should instead assume a reasonable default when a
variant transition is not based on `MCSchedPredicate` for a given processor.
This issue should be revisited in the future.
Differential revision: https://reviews.llvm.org/D54648
llvm-svn: 347504
...and use them to avoid creating obviously undef values as
discussed in the post-commit thread for r347478.
The diffs in vector div/rem show that we were missing real
optimizations by creating bogus shift nodes.
llvm-svn: 347502