Since 1725f28841, this should check
isFMADLegalForFAddFSub rather than the the plain isOperationLegal.
This would assert in a subset of cases due to an oddity in how FMAD is
selected. We will allow FMA formation pre-legalize, but not FMAD even
in cases where it would be valid.
The current hook requires passing in the root fadd/fsub. However, in
this distributed case, this would be far more complicated to pass in
the relevant operand. AMDGPU doesn't get any value from the node, and
only needs the type and is the only implementor, so I'm not sure why
we have this complexity. Just rename and expand the assert to avoid
the more complicated checks spread through the distribution logic.
This patch intends to fix incomplete relocation printing for
XCOFF (potentially for other targets).
Differential Revision: https://reviews.llvm.org/D77580
Summary:
Fold (shift (shift X, C2), C1) -> (shift X, (C1 + C2)) for logical as
well as arithmetic shifts. This is needed to prevent regressions from
an upcoming funnel shift expansion change.
While we're here, fold (VSRAI -1, C) -> -1 too.
Reviewers: RKSimon, craig.topper
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77300
Improve the chances of folding the writemask into the combined shuffle by scaling a wider shuffle mask to match the root's original type.
This creates a few minor issues with variable shuffles, preventing combines of shuffles because of the more limited support binary shuffle types. In most cases we're probably better off combining the shuffles and losing the writemask fold, but this isn't always going to be true.
Now that's D77928 landed we need to try harder to match shuffle and mask widths. This is a couple of tests showing where variable shuffle masks have been widened preventing them from folding with the mask.
Otherwise, depending on the lit location used to run the test, llvm-mc adds an
include_directories entry in the dwarf output, which breaks tests in some setup.
Differential Revision: https://reviews.llvm.org/D77876
* Reorganize tests and add coverage
* Improve diagnostic testing
* Make assert() tests more relevant
* Rename tests to macro-* or altmacro-*
This is not NFC because a (previously untested) diagnostic message is changed.
Sometimes LegalizeTypes knows about common subexpressions before SelectionDAG
does, leading to accidental SDValue removal before its reference count was
truly zero.
Differential Revision: https://reviews.llvm.org/D76994
Reviewed-By: bjope
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45049
Reverted in 3ce77142a6 because the previous patch
broke the expensive-checks bots. The new patch removes the broken check.
As discussed in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c13
...we can avoid the likely slow round-trip from XMM to GPR to XMM
by using the vector versions of the convert instructions.
Based on experimental results from recent Intel/AMD chips, we don't
need to worry about triggering denorm stalls while operating on
garbage data in the high lanes with convert instructions, so this is
expected to always be as good or better perf than the scalar
instruction equivalent. FP exceptions are also not a concern because
strict code should not be using the regular SDAG opcodes.
Differential Revision: https://reviews.llvm.org/D77895
A lot of vectorized code doesn't use masks so we shouldn't penalize them by not doing shuffle combining on avx512 targets.
I've added support for VALIGNQ/VALIGND and 512-bit SHUF128 to prevent some regressions. I also prevented recombining 256-bit SHUF128 to PERM2X128. We may not need to add 256-bit SHUF128 support, but I don't think I found any cases requiring that in my testing.
Differential Revision: https://reviews.llvm.org/D77928
Summary: A count profile may affect tail duplication's heuristic causing a block to be duplicated in only a part of its predecessors. This is not allowed in the Machine Block Placement pass where an assert will go off. I'm removing the assert and making the optimization bail out when such case happens.
Reviewers: wenlei, davidxl, Carrot
Reviewed By: Carrot
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77748
Selection would fail after the post legalize combiner put an illegal
zextload back together.
The base combiner has parameter to only allow legal operations, but
they appear to not be used. I also don't see a nice way to remove a
single entry from all_combines, so just hack around this.
Summary: We allow non-relaxable instructions emitted into relaxable Fragment when we prefix padding branch. So we need to check if the instruction need relaxation before relaxing it. Without this patch, it currently triggers a `report_fatal_error` in `llvm::MCAsmBackend::relaxInstruction` when we prefix padding branch along with `--mc-relax-all`.
Reviewers: LuoYuanke, reames, MaskRay
Reviewed By: MaskRay
Subscribers: MaskRay, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77851
There was another issue introduced by this commit that the OP
initially missed. Namely, for functions that are free to use
R2 as a callee-saved register, we emit a TOC expression based
on the address of the GEP label without emitting the GEP label.
Since we only emit such expressions for the large code model, this
issue only surfaced there.
I have confirmed that with this fix, the kernel build is successful
with target "all".
This reverts commit 60c642e74b.
This patch is making the TLI "closed" for a predefined set of VecLib
while at the moment it is extensible for anyone to customize when using
LLVM as a library.
Reverting while we figure out a way to re-land it without losing the
generality of the current API.
Differential Revision: https://reviews.llvm.org/D77925
This probably isn't ideal - the error was being printed specifically
inline with the dumping that was more legible - but then the error
wasn't reported to stderr and didn't produce a non-zero exit code.
Probably the error message could be improved by adding more context now
that it isn't printed in-situ of the DIE dumping as much.
Revision a1c05fe <https://reviews.llvm.org/rGa1c05fe20f3def1f1be9f50d2adefc6b6f1578ad>
removed bitcast from the list of problematic transformations, however:
%97 = fptrunc ppc_fp128 %2 to double // we need to check ppc_fp128 here to prevent the transformation
%98 = bitcast double %97 to i64 // a1c05fe checks ppc_fp128 at here
%99 = icmp slt i64 %98, 0
%100 = zext i1 %99 to i8
store i8 %100, i8* %7, align 1
so this patch does that. I'm also disabling it in the presence of extend just in case.
I verified separately that the hash of -std::infinity and std::infinity don't match now.
Differential Revision: https://reviews.llvm.org/D77911
Summary:
Don't attempt to analyze the decomposed GEP for scalable type.
GEP index scale is not compile-time constant for scalable type.
Be conservative, return MayAlias.
Explicitly call TypeSize::getFixedSize() to assert on places where
scalable type doesn't make sense.
Add unit tests to check functionality of -basicaa for scalable type.
This patch is needed for D76944.
Reviewers: sdesmalen, efriedma, spatel, bjope, ctetreau
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77828
Summary:
This allows us to test each backend pass under the presence
of debug info using pre-existing tests. The tests should not
fail as a result of this so long as it's true that debug info
does not affect CodeGen.
In practice, a few tests are sensitive to this:
* Tests that check the pass structure (e.g. O0-pipeline.ll)
* Tests that check --debug output. Specifically instruction
dumps containing MMO's (e.g. prelegalizercombiner-extends.ll)
* Tests that contain debugify metadata as mir-strip-debug will
remove it (e.g. fastisel-debugvalue-undef.ll)
* Tests with partial debug info (e.g.
patchable-function-entry-empty.mir had debug info but no
!llvm.dbg.cu)
* Tests that check optimization remarks overly strictly (e.g.
prologue-epilogue-remarks.mir)
* Tests that would inject the pass in an unsafe region (e.g.
seqpairspill.mir would inject between register alloc and
virt reg rewriter)
In all cases, the checks can either be updated or
--debugify-and-strip-all-safe=0 can be used to avoid being
affected by something like llvm-lit -Dllc='llc --debugify-and-strip-all-safe'
I tested this without the lost debug locations verifier to
confirm that AArch64 behaviour is unaffected (with the fixes
in this patch) and with it to confirm it finds the problems
without the additional RUN lines we had before.
Depends on D77886, D77887, D77747
Reviewers: aprantl, vsk, bogner
Subscribers: qcolombet, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77888
Summary:
A few tests start out with debug info and expect it to reach
the output. For these tests we shouldn't strip the debug info
Reviewers: aprantl, vsk, bogner
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77886
These will be widened in the DAG. In the meanwhile early
widening prevents otherwise possible vectorization of
such loads.
Differential Revision: https://reviews.llvm.org/D77835
Summary:
This patch establishes memory layout and adds instrumentation. It does
not add runtime support and does not enable MSan, which will be done
separately.
Memory layout is based on PPC64, with the exception that XorMask
is not used - low and high memory addresses are chosen in a way that
applying AndMask to low and high memory produces non-overlapping
results.
VarArgHelper is based on AMD64. It might be tempting to share some
code between the two implementations, but we need to keep in mind that
all the ABI similarities are coincidental, and therefore any such
sharing might backfire.
copyRegSaveArea() indiscriminately copies the entire register save area
shadow, however, fragments thereof not filled by the corresponding
visitCallSite() invocation contain irrelevant data. Whether or not this
can lead to practical problems is unclear, hence a simple TODO comment.
Note that the behavior of the related copyOverflowArea() is correct: it
copies only the vararg-related fragment of the overflow area shadow.
VarArgHelper test is based on the AArch64 one.
s390x ABI requires that arguments are zero-extended to 64 bits. This is
particularly important for __msan_maybe_warning_*() and
__msan_maybe_store_origin_*() shadow and origin arguments, since non
zeroed upper parts thereof confuse these functions. Therefore, add ZExt
attribute to the corresponding parameters.
Add ZExt attribute checks to msan-basic.ll. Since with
-msan-instrumentation-with-call-threshold=0 instrumentation looks quite
different, introduce the new CHECK-CALLS check prefix.
Reviewers: eugenis, vitalybuka, uweigand, jonpa
Reviewed By: eugenis
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits, stefansf, Andreas-Krebbel
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76624
If we're inserting into v2i8/v4i8/v8i8/v2i16/v4i16 style sub-128bit vectors ensure we don't use the SK_PermuteTwoSrc cost of the legalized value type - this is a followup to rG12c629ec6c59 which added equivalent sub-128bit shuffle costs
For non-integer constants/expressions and overdefined, I think we can
just use SimplifyBinOp to do common folds. By just passing a context
with the DL, SimplifyBinOp should not try to get additional information
from looking at definitions.
For overdefined values, it should be enough to just pass the original
operand.
Note: The comment before the `if (isconstant(V1State)...` was wrong
originally: isConstant() also matches integer ranges with a single
element. It is correct now.
Reviewers: efriedma, davide, mssimpso, aartbik
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D76459
Summary:
The patch D63957 is to avoid empty string when scrubbing loop comments,
it will replace loop comments to a `#`, that's correct.
But if the line has something else not only loop comments, we will get
a extra `#`.
The patch is to remove the extra `#`.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D77357
In LowerFP_TO_INTForReuse, when emitting `stfiwx`, alignment of 4 is
set for the `MachineMemOperand`, but RLI(ReuseLoadInfo)'s alignment is
not updated for following loads.
It's related to failed alignment check reported in
https://bugs.llvm.org/show_bug.cgi?id=45297
Differential Revision: https://reviews.llvm.org/D77624
Makes it easier to test "this doesn't produce an error" (& indeed makes
that the implied default so we don't accidentally write tests that have
silent/sneaky errors as well as the positive behavior we're testing for)
Though the support for applying relocations is patchy enough that a
bunch of tests treat lack of relocation application as more of a warning
than an error - so rather than me trying to figure out how to add
support for a bunch of relocation types, let's degrade that to a warning
to match the usage (& indeed, it's sort of more of a tool warning anyway
- it's not that the DWARF is wrong, just that the tool can't fully cope
with it - and it's not like the tool won't dump the DWARF, it just won't
follow/render certain relocations - I guess in the most general case it
might try to render an unrelocated value & instead render something
bogus... but mostly seems to be about interesting relocations used in
eh_frame (& honestly it might be nice if we were lazier about doing this
relocation resolution anyway - if you're not dumping eh_frame, should we
really be erroring about the relocations in it?))
The transformation currently does not differentiate between explicit
and implicit kills. However, it is not valid to later simply clear
an implicit kill flag since the kill could be due to a call or return.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45374
The change introduces the usage of physical registers for non-gc deopt values.
This require runtime support to know how to take a value from register.
By default usage is off and can be switched on by option.
The change also introduces additional fix-up patch which forces the spilling
of caller saved registers (clobbered after the call) and re-writes statepoint
to use spill slots instead of caller saved registers.
Reviewers: reames, danstrushin
Reviewed By: dantrushin
Subscribers: mgorny, hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D77797
Loop simplify form should always be checked because logic of
propagateStoredValueToLoadUsers relies on it (in particular, it
requires preheader).
Reviewed By: Fedor Sergeev, Florian Hahn
Differential Revision: https://reviews.llvm.org/D77775
Summary:
D74269 added debug info to newly created instructions, including calls
to `malloc` and `free`, by taking debug info from existing instructions
around, whose debug info may or may not be empty.
But there are cases debug info is required by the IR verifier: when both
the caller and the callee functions have DISubprograms, meaning we
already have declarations to `malloc` or `free` with a DISubprogram
attached, newly created calls to `malloc` and `free` should have
non-empty debug info. This patch creates a non-empty dummy debug info in
this case to those calls to make the IR verifier pass.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45461.
Reviewers: dschuff
Subscribers: aprantl, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77784
Summary:
Encode `-fveclib` setting as per-function attribute so it can threaded through to LTO backends. Accordingly per-function TLI now reads
the attributes and select available vector function list based on that. Now we also populate function list for all supported vector
libraries for the shared per-module `TargetLibraryInfoImpl`, so each function can select its available vector list independently but without
duplicating the vector function lists. Inlining between incompatbile vectlib attributed is also prohibited now.
Subscribers: hiraditya, dexonsmith, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77632
This is similar to what I recently did for getArithmeticReductionCost.
I'm trying to account for the narrowing from 512->256->128 as we go.
I've also added a new helper method getMinMaxCost that tries to
handle the cases where we have native min/max instructions and
fall back to cmp+select when we don't.
Differential Revision: https://reviews.llvm.org/D76634
When we try to select a SELECT_CC on Power9, we check if it can be matched to a
SETB instruction. In that function, we assert that the output type is i32/i64.
This is unnecessary as it is perfectly reasonable to have an i1 SELECT_CC.
Change that from an assert to an early exit condition.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45448
to store to a stack location for outgoing args.
During call arg lowering we shouldn't be modifying SP so cache the SP copy
vreg for subsequent uses.
Gives a 0.2% geomean code size improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D77838
Summary:
Removes:
* All LLVM-IR level debug info using StripDebugInfo()
* All debugify metadata
* 'Debug Info Version' module flag
* All (valid*) DEBUG_VALUE MachineInstrs
* All DebugLocs from MachineInstrs
This is a more complete solution than the previous MIRPrinter
option that just causes it to neglect to print debug-locations.
* The qualifier 'valid' is used here because AArch64 emits
an invalid one and tests depend on it
Reviewers: vsk, aprantl, bogner
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77747
v2i8/v4i8/v8i8 + v2i16/v4i16 all show up in vectorizer code and by just using the legalized types (v16i8/v8i16) we're highly exaggerating the actual cost of the shuffle.
This adds the instruction encoding and mnenomics for the proposed
RISC-V Bit Manipulation extension (version 0.92). It is implemented with
each category of instruction as its own target feature, with the 'b'
extension feature enabling all options. Since this extension is not yet
ratified, all target features are prefixed with 'experimental-' to note
their status.
Differential Revision: https://reviews.llvm.org/D65649
Summary:
This patch adds support for handling of variadic functions for AIX.
This includes ensuring that use and consume correct type of
va_list (char *va_list) for AIX.
Authored by: ZarkoCA
Reviewers: cebowleratibm, sfertile, jasonliu
Reviewed by: jasonliu
Differential Revision: https://reviews.llvm.org/D76130
Add initial support for PC Relative addressing for constant pool loads.
This includes adding a new relocation for @pcrel and adding a new PowerPC flag
to identify PC relative addressing.
Differential Revision: https://reviews.llvm.org/D74486
Summary:
VE has special operands to represent 0b000...000111...111 (`(m)0`) and
0b111...111000...000 (`(m)1`) bit sequences. This patch supports those
operands not only in machine instructions but also in DAG lowering.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D77769
Introduce a new VPWidenCanonicalIVRecipe to generate a canonical vector
induction for use in fold-tail-with-masking, if a primary induction is absent.
The canonical scalar IV having start = 0 and step = VF*UF, created during code
-gen to control the vector loop, is widened into a canonical vector IV having
start = {<Part*VF, Part*VF+1, ..., Part*VF+VF-1> for 0 <= Part < UF} and
step = <VF*UF, VF*UF, ..., VF*UF>.
Differential Revision: https://reviews.llvm.org/D77635
Use utils/update_llc_test_checks.py to add full CHECK directives to the
test for cxx_fast_tls calling convention. The calling convention is
arguably dead on PowerPC since dropping Darwin subtarget support in the PowerPC
backend. This test change helps show the atrocious code generation for
this lit test which was hidden by having few CHECK directives.
This implements the instruction analysis required to print branch
targets as part of llvm-objdump's disassembly.
Note, this only handles those branches which can be analyzed in a single
instruction, a future patch will handle multiple-instruction patterns,
such as AUIPC/LUI+JALR instruction pairs.
Differential Revision: https://reviews.llvm.org/D77567
The change introduces the usage of physical registers for non-gc deopt values.
This require runtime support to know how to take a value from register.
By default usage is off and can be switched on by option.
The change also introduces additional fix-up patch which forces the spilling
of caller saved registers (clobbered after the call) and re-writes statepoint
to use spill slots instead of caller saved registers.
Reviewers: reames, dantrushin
Reviewed By: reames, dantrushin
Subscribers: mgorny, hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D77371
Summary:
New SanitizerCoverage feature `inline-bool-flag` which inserts an
atomic store of `1` to a boolean (which is an 8bit integer in
practice) flag on every instrumented edge.
Implementation-wise it's very similar to `inline-8bit-counters`
features. So, much of wiring and test just follows the same pattern.
Reviewers: kcc, vitalybuka
Reviewed By: vitalybuka
Subscribers: llvm-commits, hiraditya, jfb, cfe-commits, #sanitizers
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D77244
When the Attributor was created the test update scripts were not well
suited to deal with the challenges of IR attribute checking. This
partially improved.
Since then we also added three additional configurations that need
testing; in total we now have the following four:
{ TUNIT, CGSCC } x { old pass manager (OPM), new pass manager (NPM) }
Finally, the number of developers and tests grew rapidly (partially due
to the addition of ArgumentPromotion and IPConstantProp tests), which
resulted in tests only being run in some configurations, different
prefixes being used, and different "styles" of checks being used.
Due to the above reasons I believed we needed to take another look at
the test update scripts. While we started to use them, via UTC_ARGS:
--enable/disable, the other problems remained. To improve the testing
situation for *all* configurations, to simplify future updates to the
test, and to help identify subtle effects of future changes, we now use
the test update scripts for (almost) all Attributor tests.
An exhaustive prefix list minimizes the number of check lines and makes
it easy to identify and compare configurations.
Tests have been adjusted in the process but we tried to keep their
intend unchanged.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D76588
This patch introduces the heat coloring of the Control Flow Graph which is based
on the relative "hotness" of each BB. The patch is a part of sequence of three
patches, related to graphs Heat Coloring.
Reviewers: rcorcs, apilipenko, davidxl, sfertile, fedor.sergeev, eraman, bollu
Differential Revision: https://reviews.llvm.org/D77161
Summary:
Preserve call site info for duplicated instructions. We copy over the
call site info in CloneMachineInstrBundle to avoid repeated calls to
copyCallSiteInfo in CloneMachineInstr.
(Alternatively, we could copy call site info higher up the stack, e.g.
into TargetInstrInfo::duplicate, or even into individual backend passes.
However, I don't see how that would be safer or more general than the
current approach.)
Reviewers: aprantl, djtodoro, dstenb
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77685
Any or all the argument registers can be used to pass a byval formal
argument, with the limitation that the argument must fit in the
available registers (ie: is not split between registers and stack).
Differential Revision: https://reviews.llvm.org/D76902
On PowerPC most functions require a valid TOC pointer.
This is the case because either the function itself needs to use this
pointer to access the TOC or because other functions that are called
from that function expect a valid TOC pointer in the register R2.
The main exception to this is leaf functions that do not access the TOC
since they are guaranteed not to need a valid TOC pointer.
This patch introduces a feature that will allow more functions to not
require a valid TOC pointer in R2.
Differential Revision: https://reviews.llvm.org/D73664
Based on the post-commit comments for rG0f56bbc, there might
be a problem with this transform:
(bitcast (fpext/fptrunc X)) to iX) < 0 --> (bitcast X to iY) < 0
...and the ppc_fp128 data type, so conservatively bypass if we
are bitcasting a ppc_fp128.
We might be able to account for endian or other differences to
enable this for PowerPC again if that is useful.
Differential Revision: https://reviews.llvm.org/D77642
The R_AARCH64_PLT32 relocation type will be documented in the next release
of ELF for the 64-bit Arm Architecture. It is being added in draft state
for the benefit of the position independent vtable feature.
R_AARCH64_PLT32 is very similar to R_AARCH64_PREL32. The intention is to
provide a signed 32-bit integer representing an offset from the place
to a function.
- It relocates 32-bit data
- The expression is S + A - P
- The overflow check for the expression is -2^31 <= X < 2^31
- The relocation generates Thunks/Veneers/Stubs and PLT entries as per
R_AArch64_CALL26
- If the symbol S is an undefined weak the ABI does not define its value.
The ABI defines a code for ilp32 for completeness, I have added the code
but have only added to the existing reloc-types-elf-aarch64.text as there
is no ilp32 equivalent.
Differential Revision: https://reviews.llvm.org/D77647
Summary:
Since D75300 has been landed, I want to support enhanced relaxation when we need to align branches and allow prefix padding. "Enhanced Relaxtion" means we allow an instruction that could not be traditionally relaxed to be emitted into RelaxableFragment so that we increase its length by adding prefixes for optimization.
The motivation is straightforward, RelaxFragment is mostly for relative jumps and we can not increase the length of jumps when we need to align them, so if we need to achieve D75300's purpose (reducing the bytes of nops) when need to align jumps, we have to make more instructions "relaxable".
Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight
Reviewed By: reames
Subscribers: hiraditya, llvm-commits, annita.zhang
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76286
Summary:
This fixes PR45302.
Previously the case
BB1
/ \
| |
TBB FBB
| |
\ /
BB2
was treated as a valid diamond also when TBB and FBB was the same basic
block. This then lead to a failed assertion in IfConvertDiamond.
Since TBB == FBB is quite a degenerated case of a diamond, we now
don't treat it as a valid diamond anymore, and thus we will avoid the
trouble of making IfConvertDiamond handle it correctly.
Reviewers: efriedma, kparzysz
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D77651
This patch adds an analysis of the offset addresses used by gathers
and scatters to the MVEGatherScatterLowering pass to find
multiplications and additions that are loop invariant and thus can
be moved into the loop preheader, avoiding to execute them each time.
Differential Revision: https://reviews.llvm.org/D76681
Summary:
Legalization can introduce the trunc(trunc) pattern. This can cause
problems if one of these intermediate truncs is not legal.
Combine truncs of this pattern, if the resulting trunc is legal.
Reviewers: arsenm, aemerson, dsanders
Reviewed By: arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76601
When two sections shared the same address, the disassembly code was
using pointer values when sorting (see the SectionRef less than
operator). Since those values aren't guaranteed to have a specific
order, this meant the disassembly code would sometimes change which
section to pick when finding symbols targeted by calls in fully linked
objects.
This change fixes the non-determinism, so that the same section is
always picked. This might have a negative impact in that now a section
without any symbol might be picked over a section with symbols, but this
will be addressed in a later commit.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45411.
Reviewed by: grimar, MaskRay
Differential Revision: https://reviews.llvm.org/D77640
Summary:
Combine sext(zext x) to (zext x) since the sign-bit is 0
after the zero-extension.
Combine sext(sext x) to (sext x) and ext(zext x) to (zext x)
since the intermediate step is not needed.
Reviewers: arsenm, volkan, aemerson, aditya_nandakumar
Reviewed By: arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77210
Summary:
When narrowing G_IMPLICIT_DEF where the original size is not a multiple
of the narrow size, emit a smaller G_IMPLICIT_DEF and use G_ANYEXT.
To prevent a potential endless loop in the legalizer, the condition
to combine G_ANYEXT(G_IMPLICIT_DEF) is changed from isInstUnsupported
to !isInstLegal, since in this case the combine is only valid if
consequent legalization of the newly combined G_IMPLICIT_DEF does not
introduce G_ANYEXT due to narrowing.
Although this legalization for G_IMPLICIT_DEF would also be valid for
the general case, it actually caused a lot of code regressions when
tried due to superfluous COPYs and combines not getting hit anymore.
Reviewers: dsanders, aemerson, volkan, arsenm, aditya_nandakumar
Reviewed By: arsenm
Subscribers: jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76598
In DWARFv5, type units are stored in .debug_info sections, along with
compilation units, and they are distinguished by the unit_type field
in the header, not by the name of the section. It is impossible to
associate the correct index section of a DWP file with the unit before
the unit's header is read. This patch fixes reading DWARFv5 type units
by parsing the header first and then applying the index entry according
to the actual unit type.
Differential Revision: https://reviews.llvm.org/D77552
Summary:
Re-used the IR-level debugify for the most part. The MIR-level code then
adds locations to the MachineInstrs afterwards based on the LLVM-IR debug
info.
It's worth mentioning that the resulting locations make little sense as
the range of line numbers used in a Function at the MIR level exceeds that
of the equivelent IR level function. As such, MachineInstrs can appear to
originate from outside the subprogram scope (and from other subprogram
scopes). However, it doesn't seem worth worrying about as the source is
imaginary anyway.
There's a few high level goals this pass works towards:
* We should be able to debugify our .ll/.mir in the lit tests without
changing the checks and still pass them. I.e. Debug info should not change
codegen. Combining this with a strip-debug pass should enable this. The
main issue I ran into without the strip-debug pass was instructions with MMO's and
checks on both the instruction and the MMO as the debug-location is
between them. I currently have a simple hack in the MIRPrinter to
resolve that but the more general solution is a proper strip-debug pass.
* We should be able to test that GlobalISel does not lose debug info. I
recently found that the legalizer can be unexpectedly lossy in seemingly
simple cases (e.g. expanding one instr into many). I have a verifier
(will be posted separately) that can be integrated with passes that use
the observer interface and will catch location loss (it does not verify
correctness, just that there's zero lossage). It is a little conservative
as the line-0 locations that arise from conflicts do not track the
conflicting locations but it can still catch a fair bit.
Depends on D77439, D77438
Reviewers: aprantl, bogner, vsk
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77446
dso_local leads to direct access even if the definition is not within this compilation unit (it is
still in the same linkage unit). On ELF, such a relocation (e.g. R_X86_64_PC32) referencing a
STB_GLOBAL STV_DEFAULT object can cause a linker error in a -shared link.
If the linkage is changed to available_externally, the dso_local flag should be dropped, so that no
direct access will be generated.
The current behavior is benign, because -fpic does not assume dso_local
(clang/lib/CodeGen/CodeGenModule.cpp:shouldAssumeDSOLocal).
If we do that for -fno-semantic-interposition (D73865), there will be an
R_X86_64_PC32 linker error without this patch.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D74751
Fix the error of show-prof-info.test on some platforms without zlib.
The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.
Differential Revision: https://reviews.llvm.org/D77426
The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.
Differential Revision: https://reviews.llvm.org/D77426
There are a few patterns where we use a superclass for inputs to this
instruction rather than the correct class. This can sometimes lead to
unncessary copies.
This patch extends existing constant folding in logical operations to
handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and
V_AND_OR_B32. Also added a couple of tests for existing folds.
7aecf232 fixed the bug where we would miscompile, but we still generate
a crazy amount of code. Turn off the expansion until someone implements
an appropriate heuristic.
Differential Revision: https://reviews.llvm.org/D77599
We can only collapse adjacent SI_END_CF if outer statement
belongs to a simple SI_IF, otherwise correct mask is not in the
register we expect, but is an argument of an S_XOR instruction.
Even if SI_IF is simple it might be lowered using S_XOR because
lowering is dependent on a basic block layout. It is not
considered simple if instruction consuming its output is
not an SI_END_CF. Since that SI_END_CF might have already been
lowered to an S_OR isSimpleIf() check may return false.
This situation is an opportunity for a further optimization
of SI_IF lowering, but that is a separate optimization. In the
meanwhile move SI_END_CF post the lowering when we already know
how the rest of the CFG was lowered since a non-simple SI_IF
case still needs to be handled.
Differential Revision: https://reviews.llvm.org/D77610
Summary:
- Remove the no longer used Darwin CalleeSavedRegs
- Combine the SVR464 callee saved regs and AIX64 since the two are (and should be) identical into PPC64
- Update tests for 64-bit CSR change
Reviewers: sfertile, ZarkoCA, cebowleratibm, jasonliu, #powerpc
Reviewed By: sfertile
Subscribers: wuzish, nemanjai, hiraditya, kbarton, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77235
Shuffle combining can insert zero byte sized elements into the shuffle mask, which combineX86ShufflesConstants will attempt to fold without taking into account whether the byte-sized type is legal (e.g. AVX512F only targets).
If we have a full-zeroable vector then we should just return a zero version of the root type, otherwise if the type isn't valid we should bail.
Fixes PR45443
Currently we have no dedicated warnings, but we return error message instead of a result.
It is generally not consistent with another warnings we have.
This change was suggested and discussed here:
https://reviews.llvm.org/D77216#1954873
This change refines error messages we report and also I had to update the API
to implement it.
Differential revision: https://reviews.llvm.org/D77399
If the stack pointer is altered for local variables and we are generating
Thumb2 execute-only code the .pad directive is missing.
Usually the size of the adjustment is stored in a PC-relative location
and loaded into a register which is then added to the stack pointer.
However when we are generating execute-only code code the size of the
adjustment is instead generated using the MOVW/MOVT instruction pair.
As a by product of handling the execute-only case this also fixes an
existing issue that in the none execute-only case the .pad directive was
generated against the load of the constant to a register instruction,
instead of the instruction which adds the register to the stack pointer.
Differential Revision: https://reviews.llvm.org/D76849
This patch updates the code that deals with conditions from predicate
info to make use of constant ranges.
For ssa_copy instructions inserted by PredicateInfo, we have 2 ranges:
1. The range of the original value.
2. The range imposed by the linked condition.
1. is known, 2. can be determined using makeAllowedICmpRegion. The
intersection of those ranges is the range for the copy.
With this patch, we get a nice increase in the number of instructions
eliminated by both SCCP and IPSCCP for some benchmarks:
For MultiSource, SPEC2000 & SPEC2006:
Tests: 237
Same hash: 170 (filtered out)
Remaining: 67
Metric: sccp.NumInstRemoved
Program base patch diff
test-suite...Source/Benchmarks/sim/sim.test 10.00 71.00 610.0%
test-suite...CFP2000/177.mesa/177.mesa.test 361.00 1626.00 350.4%
test-suite...encode/alacconvert-encode.test 141.00 602.00 327.0%
test-suite...decode/alacconvert-decode.test 141.00 602.00 327.0%
test-suite...CI_Purple/SMG2000/smg2000.test 1639.00 4093.00 149.7%
test-suite...peg2/mpeg2dec/mpeg2decode.test 75.00 163.00 117.3%
test-suite...T2006/401.bzip2/401.bzip2.test 358.00 513.00 43.3%
test-suite...rks/FreeBench/pifft/pifft.test 11.00 15.00 36.4%
test-suite...langs-C/unix-tbl/unix-tbl.test 4.00 5.00 25.0%
test-suite...lications/sqlite3/sqlite3.test 541.00 667.00 23.3%
test-suite.../CINT2000/254.gap/254.gap.test 243.00 299.00 23.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 25.00 29.00 16.0%
test-suite...marks/7zip/7zip-benchmark.test 1135.00 1304.00 14.9%
test-suite...lications/ClamAV/clamscan.test 1105.00 1268.00 14.8%
test-suite...urce/Applications/lua/lua.test 398.00 436.00 9.5%
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...C/CFP2000/179.art/179.art.test 1.00 3.00 200.0%
test-suite...006/447.dealII/447.dealII.test 429.00 1056.00 146.2%
test-suite...nch/fourinarow/fourinarow.test 3.00 7.00 133.3%
test-suite...CI_Purple/SMG2000/smg2000.test 818.00 1748.00 113.7%
test-suite...ks/McCat/04-bisect/bisect.test 3.00 5.00 66.7%
test-suite...CFP2000/177.mesa/177.mesa.test 165.00 255.00 54.5%
test-suite...ediabench/gsm/toast/toast.test 18.00 27.00 50.0%
test-suite...telecomm-gsm/telecomm-gsm.test 18.00 27.00 50.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 24.00 35.00 45.8%
test-suite...TimberWolfMC/timberwolfmc.test 43.00 62.00 44.2%
test-suite...encode/alacconvert-encode.test 46.00 66.00 43.5%
test-suite...decode/alacconvert-decode.test 46.00 66.00 43.5%
test-suite...langs-C/unix-tbl/unix-tbl.test 12.00 17.00 41.7%
test-suite...peg2/mpeg2dec/mpeg2decode.test 31.00 41.00 32.3%
test-suite.../CINT2000/254.gap/254.gap.test 117.00 154.00 31.6%
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D76611
From Arm v8 Architecture Reference Manual F5.1.84 LDREXD
The ldrexd instruction in Arm state has the following conditions:
t = UInt(Rt); t2 = t + 1; n = UInt(Rn);
if Rt<0> == '1' || t2 == 15 || n == 15 then UNPREDICTABLE;
In when Rt is odd or if Rt is 14 (making t2 15).
In the implementation when the pair is the UNPREDICTABLE R14_R15 we
would ideally return SOFT_FAIL. We can't because there is no R14_R15
value for us to return so we fail early returning FAIL.
The early return for registers outside the bounds of the table means
the check for Rt == 14 (0xE) redundant which causes a static analyzer
to flag the condition as never being true.
To fix the warning I've removed the check and replaced with a comment
explaining the difference with the specification.
Fixes pr41660
Differential Revision: https://reviews.llvm.org/D77463
Do not commit the llvm/test/ExecutionEngine/MCJIT/cet-code-model-lager.ll because it will
cause build bot fail(not suitable for window 32 target).
Summary:
This patch comes from H.J.'s 2bd54ce7fa
**This patch fix the failed llvm unit tests which running on CET machine. **(e.g. ExecutionEngine/MCJIT/MCJITTests)
The reason we enable IBT at "JIT compiled with CET" is mainly that: the JIT don't know the its caller program is CET enable or not.
If JIT's caller program is non-CET, it is no problem JIT generate CET code or not.
But if JIT's caller program is CET enabled, JIT must generate CET code or it will cause Control protection exceptions.
I have test the patch at llvm-unit-test and llvm-test-suite at CET machine. It passed.
and H.J. also test it at building and running VNCserver(Virtual Network Console), it works too.
(if not apply this patch, VNCserver will crash at CET machine.)
Reviewers: hjl.tools, craig.topper, LuoYuanke, annita.zhang, pengfei
Reviewed By: LuoYuanke
Subscribers: tstellar, efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76900
Summary:
Thanks to Bill Wendling (void) for the report and steps to reproduce. It looks
like this was missed during r350508's cleanup of the CallSite split into
CallBase, CallInst, and CallBrInst.
This was exposed by running pgo on a callbr, which was creating a ptrtoint to
the inline asm thinking it was an indirect call. The relevant callchain looks
like:
IndirectCallPromotionPlugin::run()
-> PGOIndirectCallVisitor::findIndirectCalls()
-> PGOIndirectCallVisitor::visitCallBase()
-> CallBase::isIndirectCall()
Reviewers: void, chandlerc
Reviewed By: void
Subscribers: hiraditya, llvm-commits, craig.topper, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77600
Summary:
In some cases, ASan may insert instrumentation before function arguments
have been stored into their allocas. This causes two issues:
1) The argument value must be spilled until it can be stored into the
reserved alloca, wasting a stack slot.
2) Until the store occurs in a later basic block, the debug location
will point to the wrong frame offset, and backtraces will show an
uninitialized value.
The proposed solution is to move instructions which initialize allocas
for arguments up into the entry block, before the position where ASan
starts inserting its instrumentation.
For the motivating test case, before the patch we see:
```
| 0033: movq %rdi, 0x68(%rbx) | | DW_TAG_formal_parameter |
| ... | | DW_AT_name ("a") |
| 00d1: movq 0x68(%rbx), %rsi | | DW_AT_location (RBX+0x90) |
| 00d5: movq %rsi, 0x90(%rbx) | | ^ not correct ... |
```
and after the patch we see:
```
| 002f: movq %rdi, 0x70(%rbx) | | DW_TAG_formal_parameter |
| | | DW_AT_name ("a") |
| | | DW_AT_location (RBX+0x70) |
```
rdar://61122691
Reviewers: aprantl, eugenis
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77182
Summary:
It can be helpful to test behaviour w.r.t locations without having DEBUG_VALUE
around. In particular, because DEBUG_VALUE has the potential to change CodeGen
behaviour (e.g. hasOneUse() vs hasOneNonDbgUse()) while locations generally
don't.
Reviewers: aprantl, bogner
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77438
A global symbol that is defined in a comdat should not generate an alias since
call sites that would've referred to that symbol will refer to their own
independent local aliases rather than the surviving global comdat one. This
could result in something that looks like:
```
ld.lld: error: relocation refers to a discarded section: .text._ZN3fbl8internal18NullFunctionTargetIvJjjPjEED1Ev.stub
>>> defined in user-x64-clang/obj/system/ulib/minfs/libminfs.a(minfs._sources.file.cc.o)
>>> section group signature: _ZN3fbl8internal18NullFunctionTargetIvJjjPjEED1Ev.stub
>>> prevailing definition is in user-x64-clang/obj/system/ulib/minfs/libminfs.a(minfs._sources.vnode.cc.o)
>>> referenced by function.h:169 (../../zircon/system/ulib/fbl/include/fbl/function.h:169)
>>> minfs._sources.file.cc.o:(minfs::File::AllocateAndCommitData(std::__2::unique_ptr<minfs::Transaction, std::__2::default_delete<minfs::Transaction> >)) in archive user-x64-clang/obj/system/ulib/minfs/libminfs.a
```
We ran into this when experimenting with a new C++ ABI for fuchsia
(refer to D72959) which takes relative offsets between comdat'd functions
which is why the normal C++ user wouldn't run into this.
Differential Revision: https://reviews.llvm.org/D77429
Summary:
A bug report mentioned that LLVM was producing jumps off the end of a
function when using "asm goto with outputs". Further digging pointed to
MachineBasicBlocks that had their address taken and were indirect
targets of INLINEASM_BR being removed by BranchFolder, because their
predecessor list was empty, so they appeared to have no entry.
This was a cascading failure caused earlier, during Pre-RA instruction
scheduling. We have a few special cases in Pre-RA instruction scheduling
where we split a MachineBasicBlock in two. This requires careful
handing of predecessor and successor lists for a MachineBasicBlock that
was split, and careful handing of PHI MachineInstrs that referred to the
MachineBasicBlock before it was split.
The clue that led to this fix was the observation that many callers of
MachineBasicBlock::splice() frequently call
MachineBasicBlock::transferSuccessorsAndUpdatePHIs() to update their PHI
nodes after a splice. We don't want to reuse that method, as we have
custom successor transferring logic for this block split.
This patch fixes 2 pre-existing bugs, and adds tests.
The first bug was that MachineBasicBlock::splice() correctly handles
updating most successors and predecessors; we don't need to do anything
more than removing the previous fallthrough block from the first half of
the split block post splice. Previously, we were updating the successor
list incorrectly (updating successors updates predecessors).
The second bug was that PHI nodes that needed registers from the first
half of the split block were not having entries populated. The register
live out information was correct, and the FuncInfo->PHINodesToUpdate was
correct. Specifically, the check in SelectionDAGISel::FinishBasicBlock:
for (unsigned i = 0, e = FuncInfo->PHINodesToUpdate.size(); i != e; ++i) {
MachineInstrBuilder PHI(*MF, FuncInfo->PHINodesToUpdate[i].first);
if (!FuncInfo->MBB->isSuccessor(PHI->getParent()))
continue;
PHI.addReg(FuncInfo->PHINodesToUpdate[i].second).addMBB(FuncInfo->MBB);
was `continue`ing because FuncInfo->MBB tracks the second half of
the post-split block; no one was updating PHI entries for the first half
of the post-split block.
SelectionDAGBuilder::UpdateSplitBlock() already expects to perform
special handling for MachineBasicBlocks that were split post calls to
ScheduleDAGSDNodes::EmitSchedule(), so I'm confident that it's both
correct for ScheduleDAGSDNodes::EmitSchedule() to return the second half
of the split block `CopyBB` which updates `FuncInfo->MBB` (ie. the
current MachineBasicBlock being processed), and perform special handling
for this in SelectionDAGBuilder::UpdateSplitBlock().
Reviewers: void, craig.topper, efriedma
Reviewed By: void, efriedma
Subscribers: hfinkel, fhahn, MatzeB, efriedma, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76961
This pass is created in d6de5f12d4 and tested
for new and legacy pass manager but never added to new pass manager pipeline.
I am adding it to new pass manager pipeline.
This pass is get used in Vector Function Database (VFDatabase) and without
this pass in new pass manager pipeline, none of the vector libraries are work
ing with new pass manager.
Related passes:
66c120f025https://reviews.llvm.org/D74944
Differential revision: https://reviews.llvm.org/D75354
The previous code used the type of the first field for the VT
passed to getNode for every field.
I've based the implementation here off what is done in visitSelect
as it removes the need to special case aggregates.
Differential Revision: https://reviews.llvm.org/D77093
So that constant expressions like the following are permitted:
and w0, w0, #~(0xfe<<24)
and w1, w1, #~(0xff<<24)
The behavior matches GNU as (opcodes/aarch64-opc.c:aarch64_logical_immediate_p).
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75885
The cmyk test is based on the known regression that resulted from:
rGf2fbdf76d8d0
This improves on the equivalent signed min/max change:
rG867f0c3c4d8c
The underlying icmp equivalence is:
~X pred ~Y --> Y pred X
For an icmp with constant, canonicalization results in a swapped pred:
~X < C --> X > ~C
SUMMARY:
For the llvm-objdump -D, the symbol name is used as a label in the disassembly for the specific address (when a symbol address is equal to the virtual address in the dump).
In XCOFF, multiple symbols may have the same name, being differentiated by their storage mapping class. It is helpful to print the QualName and not just the name when forming the output label for a csect symbol. The symbol index further removes any ambiguity caused by duplicate names.
To maintain compatibility with the binutils objdump, the XCOFF-specific --symbol-description option is added to enable the enhanced format.
Reviewers: hubert.reinterpretcast, James Henderson, Jason Liu ,daltenty
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D72973
llvm-readobj prints the file path as part of the output, so
--implicit-check-not patterns can accidentally match it. This patch
makes the test more robust by preventing failures if "foo" is in
somebody's path.
Reviewed by: grimar
Differential Revision: https://reviews.llvm.org/D77546
ExecutionEngine/MCJIT/cet-code-model-lager.ll is failing on 32-bit
windows, see llvm-commits thread for fef2dab.
This reverts commit 43f031d312
and the follow-ups fef2dab100 and
6a800f6f62.
It was failing in 32-bit Windows builds with:
$ ":" "RUN: at line 1"
$ "c:\src\llvm_package_944db8a4\build32_stage0\bin\lli.exe" "-mtriple=i686-pc-windows-msvc-elf" "-code-model=large" "C:\src\llvm_package_944db8a4\llvm-project\llvm\test\ExecutionEngine\MCJIT\cet-code-model-lager.ll"
# command stderr:
Assertion failed: Is64Bit && "Large code model is only legal in 64-bit mode.", file C:\src\llvm_package_944db8a4\llvm-project\llvm\lib\Target\X86\X86ISelLowering.cpp, line 4212
Let's see if this helps.
Summary:
This patch adds support for emission of following DWARFv5 macro forms
in .debug_macro section.
1. DW_MACRO_start_file
2. DW_MACRO_end_file
3. DW_MACRO_define_strp
4. DW_MACRO_undef_strp.
Reviewed By: dblaikie, ikudrin
Differential Revision: https://reviews.llvm.org/D72828
truncateVectorWithPACK has its own vector length controls, so we can rely on those directly. This helps some existing truncation to subvector tests, which were being combined later during shuffle lowering at which point the sign/zero bit detection had become obscured preventing lowerShuffleWithPACK working as well as it could.
Summary:
Modify lea/load/store instructions to accept `disp(index, base)`
style addressing mode (called ASX format). Also, uniform the
number of DAG nodes to have 3 operands for this ASX format
instructions, and update selectADDR functions to lower
appropriate MI.
Reviewers: arsenm, simoll, k-ishizaka
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D76822
This patch adds a -matrix-default-layout option which can be used to
set the default matrix layout to row-major or column-major (default).
The initial patch updates codegen for loads, stores, binary operators
and matrix multiply.
Reviewers: anemet, Gerolf, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D76325
This patch adds initial fusion for load/multiply/store chains of matrix
operations.
The patch contains roughly two parts:
1. Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused).
First, we ensure that both loads of the multiply operands do not alias the store.
If they do, we create new non-aliasing copies of the operands. Note that this
may introduce new basic block. Finally we process TileSize x TileSize blocks.
That is: load tiles from the input operands, multiply and store them.
2. Identify fusion candidates & matrix instructions.
As a first step, collect all instructions with shape info and fusion candidates
(currently @llvm.matrix.multiply calls). Next, try to fuse candidates and
collect instructions eliminated by fusion. Finally iterate over all matrix
instructions, skip the ones eliminated by fusion and lower the rest as usual.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D75566
llvm-dwp did not check section identifiers read from input files.
In the case of an unexpected identifier, the calculated index for
Contributions[] pointed outside the array. This fix avoids the issue
by skipping unsupported identifiers.
Differential Revision: https://reviews.llvm.org/D76543
In package files, the base offset provided by index sections should be
used to find the contribution of a unit. The patch adds that base
offset when reading range list tables.
Differential revision: https://reviews.llvm.org/D77401
This fixes the reading of location lists headers for compilation units
in package files by adjusting the reading offset according to the
corresponding record in the unit index. This is required for
DW_FORM_loclistx to work.
Differential revision: https://reviews.llvm.org/D77146
Without the patch, all version 5 compile units in a DWP file read
location tables from the beginning of a .debug_loclists.dwo section.
The patch fixes that by adjusting the reading offset the same way as
for pre-v5 units. The section identifier to find the contribution
entry corresponds to the version of the unit.
Differential revision: https://reviews.llvm.org/D77145
DWARFv5 defines index sections in package files in a slightly different
way than the pre-standard GNU proposal, see Section 7.3.5 in the DWARF
standard and https://gcc.gnu.org/wiki/DebugFissionDWP for GNU proposal.
The main concern here is values for section identifiers, which are
partially overlapped with changed meanings. The patch adds support for
v5 index sections and resolves that difficulty by defining a set of
identifiers for internal use which can represent and distinct values
of both standards.
Differential Revision: https://reviews.llvm.org/D75929
The new and old pass managers (PassManagerBuilder.cpp and
PassBuilder.cpp) are exposed to an `extern` declaration of
`attributor-disable` option which will guard the addition of the
attributor passes to the pass pipelines.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D76871
Add a new overload of StaticLibraryDefinitionGenerator::Load that takes a triple
argument and supports loading archives from MachO universal binaries in addition
to regular archives.
The LLI tool is updated to use this overload.
We had previously limited the shuffle(HORIZOP,HORIZOP) combine to binary shuffles, but we can often merge unary shuffles just as well, folding in UNDEF/ZERO values into the 64-bit half lanes.
For the (P)HADD/HSUB cases this is limited to fast-horizontal cases but PACKSS/PACKUS combines under all cases.
This patch builds upon D76140 by updating metadata on pointer typed
loads in inlined functions, when the load is the return value, and the
callsite contains return attributes which can be updated as metadata on
the load.
Added test cases show this for nonnull, dereferenceable,
dereferenceable_or_null
Reviewed-By: jdoerfert
Differential Revision: https://reviews.llvm.org/D76792
The newly-created constant zero will need an extra register to hold it
in the current statepoint lowering implementation. Remove it if there
exists one.
Summary:
This patch upstreams support the optional ARMv8.0 Data Gathering Hint (DGH)
extension, which adds the Data Gathering Hint instruction to the hint
space.
See ARMv8.0-DGH in the Arm Architecture Reference Manual Armv8 for more
information.
Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, danielkiss, samparker
Reviewed By: SjoerdMeijer
Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77097
Summary:
This patch upstreams support for the ARMv8.6A Enhanced Counter Virtualization
(ECV) extension, which adds 6 new system registers.
See ARMv8.6-ECV in the Arm Architecture Reference Manual Armv8 for more
information.
Reviewers: t.p.northover, rengolin, SjoerdMeijer, pcc, ab, chill
Reviewed By: SjoerdMeijer
Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77094
As discussed in D76983, that patch can turn a chain of insert/extract
with scalar trunc ops into bitcast+extract and existing instcombine
vector transforms end up creating a shuffle out of that (see the
PhaseOrdering test for an example). Currently, that process requires
at least this sequence: -instcombine -early-cse -instcombine.
Before D76983, the sequence of insert/extract would reach the SLP
vectorizer and become a vector trunc there.
Based on a small sampling of public targets/types, converting the
shuffle to a trunc is better for codegen in most cases (and a
regression of that form is the reason this was noticed). The trunc is
clearly better for IR-level analysis as well.
This means that we can induce "spontaneous vectorization" without
invoking any explicit vectorizer passes (at least a vector cast op
may be created out of scalar casts), but that seems to be the right
choice given that we started with a chain of insert/extract, and the
backend would expand back to that chain if a target does not support
the op.
Differential Revision: https://reviews.llvm.org/D77299
Summary:
This patch upstreams support for the ARMv8.6A Fine Grain Traps (FGT)
extension, which adds 5 new system registers.
See ARMv8.6-FGT in the Arm Architecture Reference Manual Armv8 for more
information.
Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, momchil.velikov
Reviewed By: SjoerdMeijer
Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76991
The cmyk tests are based on the known regression that resulted from:
rGf2fbdf76d8d0
So this improvement in analysis might be enough to restore that commit.
Summary:
This patch upstreams v8.6A activity monitors virtualization
assembler support, which consists of 32 new system
registers (two groups, each with 16 numbered registers).
See ARMv8.6-AMU in the Arm Architecture Reference Manual Armv8 for more
information.
Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, john.brawn, ostannard
Reviewed By: ostannard
Subscribers: LukeGeeson, dnsampaio, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76998
Our existing combine allows to merge the shuffle of 2 similar 64-bit wide 'horizontal ops' (HADD/PACK/etc.) if the shuffle was a UNPCK/MOVSD.
This patch generalizes this to decode any target shuffle mask that can be widened to a 128-bit repeating v2*64 mask, which helps us catch PBLENDW/PBLENDD cases.
If we're packing from 128-bits to 64-bits then we don't need the RHS argument. This helps with register allocation, especially as we avoid repeating a use of the input value.