Makes it easier to test "this doesn't produce an error" (& indeed makes
that the implied default so we don't accidentally write tests that have
silent/sneaky errors as well as the positive behavior we're testing for)
Though the support for applying relocations is patchy enough that a
bunch of tests treat lack of relocation application as more of a warning
than an error - so rather than me trying to figure out how to add
support for a bunch of relocation types, let's degrade that to a warning
to match the usage (& indeed, it's sort of more of a tool warning anyway
- it's not that the DWARF is wrong, just that the tool can't fully cope
with it - and it's not like the tool won't dump the DWARF, it just won't
follow/render certain relocations - I guess in the most general case it
might try to render an unrelocated value & instead render something
bogus... but mostly seems to be about interesting relocations used in
eh_frame (& honestly it might be nice if we were lazier about doing this
relocation resolution anyway - if you're not dumping eh_frame, should we
really be erroring about the relocations in it?))
The transformation currently does not differentiate between explicit
and implicit kills. However, it is not valid to later simply clear
an implicit kill flag since the kill could be due to a call or return.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45374
The change introduces the usage of physical registers for non-gc deopt values.
This require runtime support to know how to take a value from register.
By default usage is off and can be switched on by option.
The change also introduces additional fix-up patch which forces the spilling
of caller saved registers (clobbered after the call) and re-writes statepoint
to use spill slots instead of caller saved registers.
Reviewers: reames, danstrushin
Reviewed By: dantrushin
Subscribers: mgorny, hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D77797
Loop simplify form should always be checked because logic of
propagateStoredValueToLoadUsers relies on it (in particular, it
requires preheader).
Reviewed By: Fedor Sergeev, Florian Hahn
Differential Revision: https://reviews.llvm.org/D77775
Summary:
D74269 added debug info to newly created instructions, including calls
to `malloc` and `free`, by taking debug info from existing instructions
around, whose debug info may or may not be empty.
But there are cases debug info is required by the IR verifier: when both
the caller and the callee functions have DISubprograms, meaning we
already have declarations to `malloc` or `free` with a DISubprogram
attached, newly created calls to `malloc` and `free` should have
non-empty debug info. This patch creates a non-empty dummy debug info in
this case to those calls to make the IR verifier pass.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45461.
Reviewers: dschuff
Subscribers: aprantl, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77784
Summary:
Encode `-fveclib` setting as per-function attribute so it can threaded through to LTO backends. Accordingly per-function TLI now reads
the attributes and select available vector function list based on that. Now we also populate function list for all supported vector
libraries for the shared per-module `TargetLibraryInfoImpl`, so each function can select its available vector list independently but without
duplicating the vector function lists. Inlining between incompatbile vectlib attributed is also prohibited now.
Subscribers: hiraditya, dexonsmith, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77632
This is similar to what I recently did for getArithmeticReductionCost.
I'm trying to account for the narrowing from 512->256->128 as we go.
I've also added a new helper method getMinMaxCost that tries to
handle the cases where we have native min/max instructions and
fall back to cmp+select when we don't.
Differential Revision: https://reviews.llvm.org/D76634
When we try to select a SELECT_CC on Power9, we check if it can be matched to a
SETB instruction. In that function, we assert that the output type is i32/i64.
This is unnecessary as it is perfectly reasonable to have an i1 SELECT_CC.
Change that from an assert to an early exit condition.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45448
to store to a stack location for outgoing args.
During call arg lowering we shouldn't be modifying SP so cache the SP copy
vreg for subsequent uses.
Gives a 0.2% geomean code size improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D77838
Summary:
Removes:
* All LLVM-IR level debug info using StripDebugInfo()
* All debugify metadata
* 'Debug Info Version' module flag
* All (valid*) DEBUG_VALUE MachineInstrs
* All DebugLocs from MachineInstrs
This is a more complete solution than the previous MIRPrinter
option that just causes it to neglect to print debug-locations.
* The qualifier 'valid' is used here because AArch64 emits
an invalid one and tests depend on it
Reviewers: vsk, aprantl, bogner
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77747
v2i8/v4i8/v8i8 + v2i16/v4i16 all show up in vectorizer code and by just using the legalized types (v16i8/v8i16) we're highly exaggerating the actual cost of the shuffle.
This adds the instruction encoding and mnenomics for the proposed
RISC-V Bit Manipulation extension (version 0.92). It is implemented with
each category of instruction as its own target feature, with the 'b'
extension feature enabling all options. Since this extension is not yet
ratified, all target features are prefixed with 'experimental-' to note
their status.
Differential Revision: https://reviews.llvm.org/D65649
Summary:
This patch adds support for handling of variadic functions for AIX.
This includes ensuring that use and consume correct type of
va_list (char *va_list) for AIX.
Authored by: ZarkoCA
Reviewers: cebowleratibm, sfertile, jasonliu
Reviewed by: jasonliu
Differential Revision: https://reviews.llvm.org/D76130
Add initial support for PC Relative addressing for constant pool loads.
This includes adding a new relocation for @pcrel and adding a new PowerPC flag
to identify PC relative addressing.
Differential Revision: https://reviews.llvm.org/D74486
Summary:
VE has special operands to represent 0b000...000111...111 (`(m)0`) and
0b111...111000...000 (`(m)1`) bit sequences. This patch supports those
operands not only in machine instructions but also in DAG lowering.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D77769
Introduce a new VPWidenCanonicalIVRecipe to generate a canonical vector
induction for use in fold-tail-with-masking, if a primary induction is absent.
The canonical scalar IV having start = 0 and step = VF*UF, created during code
-gen to control the vector loop, is widened into a canonical vector IV having
start = {<Part*VF, Part*VF+1, ..., Part*VF+VF-1> for 0 <= Part < UF} and
step = <VF*UF, VF*UF, ..., VF*UF>.
Differential Revision: https://reviews.llvm.org/D77635
Use utils/update_llc_test_checks.py to add full CHECK directives to the
test for cxx_fast_tls calling convention. The calling convention is
arguably dead on PowerPC since dropping Darwin subtarget support in the PowerPC
backend. This test change helps show the atrocious code generation for
this lit test which was hidden by having few CHECK directives.
This implements the instruction analysis required to print branch
targets as part of llvm-objdump's disassembly.
Note, this only handles those branches which can be analyzed in a single
instruction, a future patch will handle multiple-instruction patterns,
such as AUIPC/LUI+JALR instruction pairs.
Differential Revision: https://reviews.llvm.org/D77567
The change introduces the usage of physical registers for non-gc deopt values.
This require runtime support to know how to take a value from register.
By default usage is off and can be switched on by option.
The change also introduces additional fix-up patch which forces the spilling
of caller saved registers (clobbered after the call) and re-writes statepoint
to use spill slots instead of caller saved registers.
Reviewers: reames, dantrushin
Reviewed By: reames, dantrushin
Subscribers: mgorny, hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D77371
Summary:
New SanitizerCoverage feature `inline-bool-flag` which inserts an
atomic store of `1` to a boolean (which is an 8bit integer in
practice) flag on every instrumented edge.
Implementation-wise it's very similar to `inline-8bit-counters`
features. So, much of wiring and test just follows the same pattern.
Reviewers: kcc, vitalybuka
Reviewed By: vitalybuka
Subscribers: llvm-commits, hiraditya, jfb, cfe-commits, #sanitizers
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D77244
When the Attributor was created the test update scripts were not well
suited to deal with the challenges of IR attribute checking. This
partially improved.
Since then we also added three additional configurations that need
testing; in total we now have the following four:
{ TUNIT, CGSCC } x { old pass manager (OPM), new pass manager (NPM) }
Finally, the number of developers and tests grew rapidly (partially due
to the addition of ArgumentPromotion and IPConstantProp tests), which
resulted in tests only being run in some configurations, different
prefixes being used, and different "styles" of checks being used.
Due to the above reasons I believed we needed to take another look at
the test update scripts. While we started to use them, via UTC_ARGS:
--enable/disable, the other problems remained. To improve the testing
situation for *all* configurations, to simplify future updates to the
test, and to help identify subtle effects of future changes, we now use
the test update scripts for (almost) all Attributor tests.
An exhaustive prefix list minimizes the number of check lines and makes
it easy to identify and compare configurations.
Tests have been adjusted in the process but we tried to keep their
intend unchanged.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D76588
This patch introduces the heat coloring of the Control Flow Graph which is based
on the relative "hotness" of each BB. The patch is a part of sequence of three
patches, related to graphs Heat Coloring.
Reviewers: rcorcs, apilipenko, davidxl, sfertile, fedor.sergeev, eraman, bollu
Differential Revision: https://reviews.llvm.org/D77161
Summary:
Preserve call site info for duplicated instructions. We copy over the
call site info in CloneMachineInstrBundle to avoid repeated calls to
copyCallSiteInfo in CloneMachineInstr.
(Alternatively, we could copy call site info higher up the stack, e.g.
into TargetInstrInfo::duplicate, or even into individual backend passes.
However, I don't see how that would be safer or more general than the
current approach.)
Reviewers: aprantl, djtodoro, dstenb
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77685
Any or all the argument registers can be used to pass a byval formal
argument, with the limitation that the argument must fit in the
available registers (ie: is not split between registers and stack).
Differential Revision: https://reviews.llvm.org/D76902
On PowerPC most functions require a valid TOC pointer.
This is the case because either the function itself needs to use this
pointer to access the TOC or because other functions that are called
from that function expect a valid TOC pointer in the register R2.
The main exception to this is leaf functions that do not access the TOC
since they are guaranteed not to need a valid TOC pointer.
This patch introduces a feature that will allow more functions to not
require a valid TOC pointer in R2.
Differential Revision: https://reviews.llvm.org/D73664
Based on the post-commit comments for rG0f56bbc, there might
be a problem with this transform:
(bitcast (fpext/fptrunc X)) to iX) < 0 --> (bitcast X to iY) < 0
...and the ppc_fp128 data type, so conservatively bypass if we
are bitcasting a ppc_fp128.
We might be able to account for endian or other differences to
enable this for PowerPC again if that is useful.
Differential Revision: https://reviews.llvm.org/D77642
The R_AARCH64_PLT32 relocation type will be documented in the next release
of ELF for the 64-bit Arm Architecture. It is being added in draft state
for the benefit of the position independent vtable feature.
R_AARCH64_PLT32 is very similar to R_AARCH64_PREL32. The intention is to
provide a signed 32-bit integer representing an offset from the place
to a function.
- It relocates 32-bit data
- The expression is S + A - P
- The overflow check for the expression is -2^31 <= X < 2^31
- The relocation generates Thunks/Veneers/Stubs and PLT entries as per
R_AArch64_CALL26
- If the symbol S is an undefined weak the ABI does not define its value.
The ABI defines a code for ilp32 for completeness, I have added the code
but have only added to the existing reloc-types-elf-aarch64.text as there
is no ilp32 equivalent.
Differential Revision: https://reviews.llvm.org/D77647
Summary:
Since D75300 has been landed, I want to support enhanced relaxation when we need to align branches and allow prefix padding. "Enhanced Relaxtion" means we allow an instruction that could not be traditionally relaxed to be emitted into RelaxableFragment so that we increase its length by adding prefixes for optimization.
The motivation is straightforward, RelaxFragment is mostly for relative jumps and we can not increase the length of jumps when we need to align them, so if we need to achieve D75300's purpose (reducing the bytes of nops) when need to align jumps, we have to make more instructions "relaxable".
Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight
Reviewed By: reames
Subscribers: hiraditya, llvm-commits, annita.zhang
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76286
Summary:
This fixes PR45302.
Previously the case
BB1
/ \
| |
TBB FBB
| |
\ /
BB2
was treated as a valid diamond also when TBB and FBB was the same basic
block. This then lead to a failed assertion in IfConvertDiamond.
Since TBB == FBB is quite a degenerated case of a diamond, we now
don't treat it as a valid diamond anymore, and thus we will avoid the
trouble of making IfConvertDiamond handle it correctly.
Reviewers: efriedma, kparzysz
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D77651
This patch adds an analysis of the offset addresses used by gathers
and scatters to the MVEGatherScatterLowering pass to find
multiplications and additions that are loop invariant and thus can
be moved into the loop preheader, avoiding to execute them each time.
Differential Revision: https://reviews.llvm.org/D76681
Summary:
Legalization can introduce the trunc(trunc) pattern. This can cause
problems if one of these intermediate truncs is not legal.
Combine truncs of this pattern, if the resulting trunc is legal.
Reviewers: arsenm, aemerson, dsanders
Reviewed By: arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76601
When two sections shared the same address, the disassembly code was
using pointer values when sorting (see the SectionRef less than
operator). Since those values aren't guaranteed to have a specific
order, this meant the disassembly code would sometimes change which
section to pick when finding symbols targeted by calls in fully linked
objects.
This change fixes the non-determinism, so that the same section is
always picked. This might have a negative impact in that now a section
without any symbol might be picked over a section with symbols, but this
will be addressed in a later commit.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45411.
Reviewed by: grimar, MaskRay
Differential Revision: https://reviews.llvm.org/D77640
Summary:
Combine sext(zext x) to (zext x) since the sign-bit is 0
after the zero-extension.
Combine sext(sext x) to (sext x) and ext(zext x) to (zext x)
since the intermediate step is not needed.
Reviewers: arsenm, volkan, aemerson, aditya_nandakumar
Reviewed By: arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77210
Summary:
When narrowing G_IMPLICIT_DEF where the original size is not a multiple
of the narrow size, emit a smaller G_IMPLICIT_DEF and use G_ANYEXT.
To prevent a potential endless loop in the legalizer, the condition
to combine G_ANYEXT(G_IMPLICIT_DEF) is changed from isInstUnsupported
to !isInstLegal, since in this case the combine is only valid if
consequent legalization of the newly combined G_IMPLICIT_DEF does not
introduce G_ANYEXT due to narrowing.
Although this legalization for G_IMPLICIT_DEF would also be valid for
the general case, it actually caused a lot of code regressions when
tried due to superfluous COPYs and combines not getting hit anymore.
Reviewers: dsanders, aemerson, volkan, arsenm, aditya_nandakumar
Reviewed By: arsenm
Subscribers: jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76598
In DWARFv5, type units are stored in .debug_info sections, along with
compilation units, and they are distinguished by the unit_type field
in the header, not by the name of the section. It is impossible to
associate the correct index section of a DWP file with the unit before
the unit's header is read. This patch fixes reading DWARFv5 type units
by parsing the header first and then applying the index entry according
to the actual unit type.
Differential Revision: https://reviews.llvm.org/D77552
Summary:
Re-used the IR-level debugify for the most part. The MIR-level code then
adds locations to the MachineInstrs afterwards based on the LLVM-IR debug
info.
It's worth mentioning that the resulting locations make little sense as
the range of line numbers used in a Function at the MIR level exceeds that
of the equivelent IR level function. As such, MachineInstrs can appear to
originate from outside the subprogram scope (and from other subprogram
scopes). However, it doesn't seem worth worrying about as the source is
imaginary anyway.
There's a few high level goals this pass works towards:
* We should be able to debugify our .ll/.mir in the lit tests without
changing the checks and still pass them. I.e. Debug info should not change
codegen. Combining this with a strip-debug pass should enable this. The
main issue I ran into without the strip-debug pass was instructions with MMO's and
checks on both the instruction and the MMO as the debug-location is
between them. I currently have a simple hack in the MIRPrinter to
resolve that but the more general solution is a proper strip-debug pass.
* We should be able to test that GlobalISel does not lose debug info. I
recently found that the legalizer can be unexpectedly lossy in seemingly
simple cases (e.g. expanding one instr into many). I have a verifier
(will be posted separately) that can be integrated with passes that use
the observer interface and will catch location loss (it does not verify
correctness, just that there's zero lossage). It is a little conservative
as the line-0 locations that arise from conflicts do not track the
conflicting locations but it can still catch a fair bit.
Depends on D77439, D77438
Reviewers: aprantl, bogner, vsk
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77446
dso_local leads to direct access even if the definition is not within this compilation unit (it is
still in the same linkage unit). On ELF, such a relocation (e.g. R_X86_64_PC32) referencing a
STB_GLOBAL STV_DEFAULT object can cause a linker error in a -shared link.
If the linkage is changed to available_externally, the dso_local flag should be dropped, so that no
direct access will be generated.
The current behavior is benign, because -fpic does not assume dso_local
(clang/lib/CodeGen/CodeGenModule.cpp:shouldAssumeDSOLocal).
If we do that for -fno-semantic-interposition (D73865), there will be an
R_X86_64_PC32 linker error without this patch.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D74751
Fix the error of show-prof-info.test on some platforms without zlib.
The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.
Differential Revision: https://reviews.llvm.org/D77426
The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.
Differential Revision: https://reviews.llvm.org/D77426
There are a few patterns where we use a superclass for inputs to this
instruction rather than the correct class. This can sometimes lead to
unncessary copies.
This patch extends existing constant folding in logical operations to
handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and
V_AND_OR_B32. Also added a couple of tests for existing folds.
7aecf232 fixed the bug where we would miscompile, but we still generate
a crazy amount of code. Turn off the expansion until someone implements
an appropriate heuristic.
Differential Revision: https://reviews.llvm.org/D77599
We can only collapse adjacent SI_END_CF if outer statement
belongs to a simple SI_IF, otherwise correct mask is not in the
register we expect, but is an argument of an S_XOR instruction.
Even if SI_IF is simple it might be lowered using S_XOR because
lowering is dependent on a basic block layout. It is not
considered simple if instruction consuming its output is
not an SI_END_CF. Since that SI_END_CF might have already been
lowered to an S_OR isSimpleIf() check may return false.
This situation is an opportunity for a further optimization
of SI_IF lowering, but that is a separate optimization. In the
meanwhile move SI_END_CF post the lowering when we already know
how the rest of the CFG was lowered since a non-simple SI_IF
case still needs to be handled.
Differential Revision: https://reviews.llvm.org/D77610
Summary:
- Remove the no longer used Darwin CalleeSavedRegs
- Combine the SVR464 callee saved regs and AIX64 since the two are (and should be) identical into PPC64
- Update tests for 64-bit CSR change
Reviewers: sfertile, ZarkoCA, cebowleratibm, jasonliu, #powerpc
Reviewed By: sfertile
Subscribers: wuzish, nemanjai, hiraditya, kbarton, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77235
Shuffle combining can insert zero byte sized elements into the shuffle mask, which combineX86ShufflesConstants will attempt to fold without taking into account whether the byte-sized type is legal (e.g. AVX512F only targets).
If we have a full-zeroable vector then we should just return a zero version of the root type, otherwise if the type isn't valid we should bail.
Fixes PR45443
Currently we have no dedicated warnings, but we return error message instead of a result.
It is generally not consistent with another warnings we have.
This change was suggested and discussed here:
https://reviews.llvm.org/D77216#1954873
This change refines error messages we report and also I had to update the API
to implement it.
Differential revision: https://reviews.llvm.org/D77399
If the stack pointer is altered for local variables and we are generating
Thumb2 execute-only code the .pad directive is missing.
Usually the size of the adjustment is stored in a PC-relative location
and loaded into a register which is then added to the stack pointer.
However when we are generating execute-only code code the size of the
adjustment is instead generated using the MOVW/MOVT instruction pair.
As a by product of handling the execute-only case this also fixes an
existing issue that in the none execute-only case the .pad directive was
generated against the load of the constant to a register instruction,
instead of the instruction which adds the register to the stack pointer.
Differential Revision: https://reviews.llvm.org/D76849
This patch updates the code that deals with conditions from predicate
info to make use of constant ranges.
For ssa_copy instructions inserted by PredicateInfo, we have 2 ranges:
1. The range of the original value.
2. The range imposed by the linked condition.
1. is known, 2. can be determined using makeAllowedICmpRegion. The
intersection of those ranges is the range for the copy.
With this patch, we get a nice increase in the number of instructions
eliminated by both SCCP and IPSCCP for some benchmarks:
For MultiSource, SPEC2000 & SPEC2006:
Tests: 237
Same hash: 170 (filtered out)
Remaining: 67
Metric: sccp.NumInstRemoved
Program base patch diff
test-suite...Source/Benchmarks/sim/sim.test 10.00 71.00 610.0%
test-suite...CFP2000/177.mesa/177.mesa.test 361.00 1626.00 350.4%
test-suite...encode/alacconvert-encode.test 141.00 602.00 327.0%
test-suite...decode/alacconvert-decode.test 141.00 602.00 327.0%
test-suite...CI_Purple/SMG2000/smg2000.test 1639.00 4093.00 149.7%
test-suite...peg2/mpeg2dec/mpeg2decode.test 75.00 163.00 117.3%
test-suite...T2006/401.bzip2/401.bzip2.test 358.00 513.00 43.3%
test-suite...rks/FreeBench/pifft/pifft.test 11.00 15.00 36.4%
test-suite...langs-C/unix-tbl/unix-tbl.test 4.00 5.00 25.0%
test-suite...lications/sqlite3/sqlite3.test 541.00 667.00 23.3%
test-suite.../CINT2000/254.gap/254.gap.test 243.00 299.00 23.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 25.00 29.00 16.0%
test-suite...marks/7zip/7zip-benchmark.test 1135.00 1304.00 14.9%
test-suite...lications/ClamAV/clamscan.test 1105.00 1268.00 14.8%
test-suite...urce/Applications/lua/lua.test 398.00 436.00 9.5%
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...C/CFP2000/179.art/179.art.test 1.00 3.00 200.0%
test-suite...006/447.dealII/447.dealII.test 429.00 1056.00 146.2%
test-suite...nch/fourinarow/fourinarow.test 3.00 7.00 133.3%
test-suite...CI_Purple/SMG2000/smg2000.test 818.00 1748.00 113.7%
test-suite...ks/McCat/04-bisect/bisect.test 3.00 5.00 66.7%
test-suite...CFP2000/177.mesa/177.mesa.test 165.00 255.00 54.5%
test-suite...ediabench/gsm/toast/toast.test 18.00 27.00 50.0%
test-suite...telecomm-gsm/telecomm-gsm.test 18.00 27.00 50.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 24.00 35.00 45.8%
test-suite...TimberWolfMC/timberwolfmc.test 43.00 62.00 44.2%
test-suite...encode/alacconvert-encode.test 46.00 66.00 43.5%
test-suite...decode/alacconvert-decode.test 46.00 66.00 43.5%
test-suite...langs-C/unix-tbl/unix-tbl.test 12.00 17.00 41.7%
test-suite...peg2/mpeg2dec/mpeg2decode.test 31.00 41.00 32.3%
test-suite.../CINT2000/254.gap/254.gap.test 117.00 154.00 31.6%
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D76611
From Arm v8 Architecture Reference Manual F5.1.84 LDREXD
The ldrexd instruction in Arm state has the following conditions:
t = UInt(Rt); t2 = t + 1; n = UInt(Rn);
if Rt<0> == '1' || t2 == 15 || n == 15 then UNPREDICTABLE;
In when Rt is odd or if Rt is 14 (making t2 15).
In the implementation when the pair is the UNPREDICTABLE R14_R15 we
would ideally return SOFT_FAIL. We can't because there is no R14_R15
value for us to return so we fail early returning FAIL.
The early return for registers outside the bounds of the table means
the check for Rt == 14 (0xE) redundant which causes a static analyzer
to flag the condition as never being true.
To fix the warning I've removed the check and replaced with a comment
explaining the difference with the specification.
Fixes pr41660
Differential Revision: https://reviews.llvm.org/D77463
Do not commit the llvm/test/ExecutionEngine/MCJIT/cet-code-model-lager.ll because it will
cause build bot fail(not suitable for window 32 target).
Summary:
This patch comes from H.J.'s 2bd54ce7fa
**This patch fix the failed llvm unit tests which running on CET machine. **(e.g. ExecutionEngine/MCJIT/MCJITTests)
The reason we enable IBT at "JIT compiled with CET" is mainly that: the JIT don't know the its caller program is CET enable or not.
If JIT's caller program is non-CET, it is no problem JIT generate CET code or not.
But if JIT's caller program is CET enabled, JIT must generate CET code or it will cause Control protection exceptions.
I have test the patch at llvm-unit-test and llvm-test-suite at CET machine. It passed.
and H.J. also test it at building and running VNCserver(Virtual Network Console), it works too.
(if not apply this patch, VNCserver will crash at CET machine.)
Reviewers: hjl.tools, craig.topper, LuoYuanke, annita.zhang, pengfei
Reviewed By: LuoYuanke
Subscribers: tstellar, efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76900
Summary:
Thanks to Bill Wendling (void) for the report and steps to reproduce. It looks
like this was missed during r350508's cleanup of the CallSite split into
CallBase, CallInst, and CallBrInst.
This was exposed by running pgo on a callbr, which was creating a ptrtoint to
the inline asm thinking it was an indirect call. The relevant callchain looks
like:
IndirectCallPromotionPlugin::run()
-> PGOIndirectCallVisitor::findIndirectCalls()
-> PGOIndirectCallVisitor::visitCallBase()
-> CallBase::isIndirectCall()
Reviewers: void, chandlerc
Reviewed By: void
Subscribers: hiraditya, llvm-commits, craig.topper, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77600
Summary:
In some cases, ASan may insert instrumentation before function arguments
have been stored into their allocas. This causes two issues:
1) The argument value must be spilled until it can be stored into the
reserved alloca, wasting a stack slot.
2) Until the store occurs in a later basic block, the debug location
will point to the wrong frame offset, and backtraces will show an
uninitialized value.
The proposed solution is to move instructions which initialize allocas
for arguments up into the entry block, before the position where ASan
starts inserting its instrumentation.
For the motivating test case, before the patch we see:
```
| 0033: movq %rdi, 0x68(%rbx) | | DW_TAG_formal_parameter |
| ... | | DW_AT_name ("a") |
| 00d1: movq 0x68(%rbx), %rsi | | DW_AT_location (RBX+0x90) |
| 00d5: movq %rsi, 0x90(%rbx) | | ^ not correct ... |
```
and after the patch we see:
```
| 002f: movq %rdi, 0x70(%rbx) | | DW_TAG_formal_parameter |
| | | DW_AT_name ("a") |
| | | DW_AT_location (RBX+0x70) |
```
rdar://61122691
Reviewers: aprantl, eugenis
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77182
Summary:
It can be helpful to test behaviour w.r.t locations without having DEBUG_VALUE
around. In particular, because DEBUG_VALUE has the potential to change CodeGen
behaviour (e.g. hasOneUse() vs hasOneNonDbgUse()) while locations generally
don't.
Reviewers: aprantl, bogner
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77438
A global symbol that is defined in a comdat should not generate an alias since
call sites that would've referred to that symbol will refer to their own
independent local aliases rather than the surviving global comdat one. This
could result in something that looks like:
```
ld.lld: error: relocation refers to a discarded section: .text._ZN3fbl8internal18NullFunctionTargetIvJjjPjEED1Ev.stub
>>> defined in user-x64-clang/obj/system/ulib/minfs/libminfs.a(minfs._sources.file.cc.o)
>>> section group signature: _ZN3fbl8internal18NullFunctionTargetIvJjjPjEED1Ev.stub
>>> prevailing definition is in user-x64-clang/obj/system/ulib/minfs/libminfs.a(minfs._sources.vnode.cc.o)
>>> referenced by function.h:169 (../../zircon/system/ulib/fbl/include/fbl/function.h:169)
>>> minfs._sources.file.cc.o:(minfs::File::AllocateAndCommitData(std::__2::unique_ptr<minfs::Transaction, std::__2::default_delete<minfs::Transaction> >)) in archive user-x64-clang/obj/system/ulib/minfs/libminfs.a
```
We ran into this when experimenting with a new C++ ABI for fuchsia
(refer to D72959) which takes relative offsets between comdat'd functions
which is why the normal C++ user wouldn't run into this.
Differential Revision: https://reviews.llvm.org/D77429
Summary:
A bug report mentioned that LLVM was producing jumps off the end of a
function when using "asm goto with outputs". Further digging pointed to
MachineBasicBlocks that had their address taken and were indirect
targets of INLINEASM_BR being removed by BranchFolder, because their
predecessor list was empty, so they appeared to have no entry.
This was a cascading failure caused earlier, during Pre-RA instruction
scheduling. We have a few special cases in Pre-RA instruction scheduling
where we split a MachineBasicBlock in two. This requires careful
handing of predecessor and successor lists for a MachineBasicBlock that
was split, and careful handing of PHI MachineInstrs that referred to the
MachineBasicBlock before it was split.
The clue that led to this fix was the observation that many callers of
MachineBasicBlock::splice() frequently call
MachineBasicBlock::transferSuccessorsAndUpdatePHIs() to update their PHI
nodes after a splice. We don't want to reuse that method, as we have
custom successor transferring logic for this block split.
This patch fixes 2 pre-existing bugs, and adds tests.
The first bug was that MachineBasicBlock::splice() correctly handles
updating most successors and predecessors; we don't need to do anything
more than removing the previous fallthrough block from the first half of
the split block post splice. Previously, we were updating the successor
list incorrectly (updating successors updates predecessors).
The second bug was that PHI nodes that needed registers from the first
half of the split block were not having entries populated. The register
live out information was correct, and the FuncInfo->PHINodesToUpdate was
correct. Specifically, the check in SelectionDAGISel::FinishBasicBlock:
for (unsigned i = 0, e = FuncInfo->PHINodesToUpdate.size(); i != e; ++i) {
MachineInstrBuilder PHI(*MF, FuncInfo->PHINodesToUpdate[i].first);
if (!FuncInfo->MBB->isSuccessor(PHI->getParent()))
continue;
PHI.addReg(FuncInfo->PHINodesToUpdate[i].second).addMBB(FuncInfo->MBB);
was `continue`ing because FuncInfo->MBB tracks the second half of
the post-split block; no one was updating PHI entries for the first half
of the post-split block.
SelectionDAGBuilder::UpdateSplitBlock() already expects to perform
special handling for MachineBasicBlocks that were split post calls to
ScheduleDAGSDNodes::EmitSchedule(), so I'm confident that it's both
correct for ScheduleDAGSDNodes::EmitSchedule() to return the second half
of the split block `CopyBB` which updates `FuncInfo->MBB` (ie. the
current MachineBasicBlock being processed), and perform special handling
for this in SelectionDAGBuilder::UpdateSplitBlock().
Reviewers: void, craig.topper, efriedma
Reviewed By: void, efriedma
Subscribers: hfinkel, fhahn, MatzeB, efriedma, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76961
This pass is created in d6de5f12d4 and tested
for new and legacy pass manager but never added to new pass manager pipeline.
I am adding it to new pass manager pipeline.
This pass is get used in Vector Function Database (VFDatabase) and without
this pass in new pass manager pipeline, none of the vector libraries are work
ing with new pass manager.
Related passes:
66c120f025https://reviews.llvm.org/D74944
Differential revision: https://reviews.llvm.org/D75354
The previous code used the type of the first field for the VT
passed to getNode for every field.
I've based the implementation here off what is done in visitSelect
as it removes the need to special case aggregates.
Differential Revision: https://reviews.llvm.org/D77093
So that constant expressions like the following are permitted:
and w0, w0, #~(0xfe<<24)
and w1, w1, #~(0xff<<24)
The behavior matches GNU as (opcodes/aarch64-opc.c:aarch64_logical_immediate_p).
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75885
The cmyk test is based on the known regression that resulted from:
rGf2fbdf76d8d0
This improves on the equivalent signed min/max change:
rG867f0c3c4d8c
The underlying icmp equivalence is:
~X pred ~Y --> Y pred X
For an icmp with constant, canonicalization results in a swapped pred:
~X < C --> X > ~C
SUMMARY:
For the llvm-objdump -D, the symbol name is used as a label in the disassembly for the specific address (when a symbol address is equal to the virtual address in the dump).
In XCOFF, multiple symbols may have the same name, being differentiated by their storage mapping class. It is helpful to print the QualName and not just the name when forming the output label for a csect symbol. The symbol index further removes any ambiguity caused by duplicate names.
To maintain compatibility with the binutils objdump, the XCOFF-specific --symbol-description option is added to enable the enhanced format.
Reviewers: hubert.reinterpretcast, James Henderson, Jason Liu ,daltenty
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D72973
llvm-readobj prints the file path as part of the output, so
--implicit-check-not patterns can accidentally match it. This patch
makes the test more robust by preventing failures if "foo" is in
somebody's path.
Reviewed by: grimar
Differential Revision: https://reviews.llvm.org/D77546
ExecutionEngine/MCJIT/cet-code-model-lager.ll is failing on 32-bit
windows, see llvm-commits thread for fef2dab.
This reverts commit 43f031d312
and the follow-ups fef2dab100 and
6a800f6f62.
It was failing in 32-bit Windows builds with:
$ ":" "RUN: at line 1"
$ "c:\src\llvm_package_944db8a4\build32_stage0\bin\lli.exe" "-mtriple=i686-pc-windows-msvc-elf" "-code-model=large" "C:\src\llvm_package_944db8a4\llvm-project\llvm\test\ExecutionEngine\MCJIT\cet-code-model-lager.ll"
# command stderr:
Assertion failed: Is64Bit && "Large code model is only legal in 64-bit mode.", file C:\src\llvm_package_944db8a4\llvm-project\llvm\lib\Target\X86\X86ISelLowering.cpp, line 4212
Let's see if this helps.
Summary:
This patch adds support for emission of following DWARFv5 macro forms
in .debug_macro section.
1. DW_MACRO_start_file
2. DW_MACRO_end_file
3. DW_MACRO_define_strp
4. DW_MACRO_undef_strp.
Reviewed By: dblaikie, ikudrin
Differential Revision: https://reviews.llvm.org/D72828
truncateVectorWithPACK has its own vector length controls, so we can rely on those directly. This helps some existing truncation to subvector tests, which were being combined later during shuffle lowering at which point the sign/zero bit detection had become obscured preventing lowerShuffleWithPACK working as well as it could.
Summary:
Modify lea/load/store instructions to accept `disp(index, base)`
style addressing mode (called ASX format). Also, uniform the
number of DAG nodes to have 3 operands for this ASX format
instructions, and update selectADDR functions to lower
appropriate MI.
Reviewers: arsenm, simoll, k-ishizaka
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D76822
This patch adds a -matrix-default-layout option which can be used to
set the default matrix layout to row-major or column-major (default).
The initial patch updates codegen for loads, stores, binary operators
and matrix multiply.
Reviewers: anemet, Gerolf, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D76325
This patch adds initial fusion for load/multiply/store chains of matrix
operations.
The patch contains roughly two parts:
1. Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused).
First, we ensure that both loads of the multiply operands do not alias the store.
If they do, we create new non-aliasing copies of the operands. Note that this
may introduce new basic block. Finally we process TileSize x TileSize blocks.
That is: load tiles from the input operands, multiply and store them.
2. Identify fusion candidates & matrix instructions.
As a first step, collect all instructions with shape info and fusion candidates
(currently @llvm.matrix.multiply calls). Next, try to fuse candidates and
collect instructions eliminated by fusion. Finally iterate over all matrix
instructions, skip the ones eliminated by fusion and lower the rest as usual.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D75566
llvm-dwp did not check section identifiers read from input files.
In the case of an unexpected identifier, the calculated index for
Contributions[] pointed outside the array. This fix avoids the issue
by skipping unsupported identifiers.
Differential Revision: https://reviews.llvm.org/D76543
In package files, the base offset provided by index sections should be
used to find the contribution of a unit. The patch adds that base
offset when reading range list tables.
Differential revision: https://reviews.llvm.org/D77401
This fixes the reading of location lists headers for compilation units
in package files by adjusting the reading offset according to the
corresponding record in the unit index. This is required for
DW_FORM_loclistx to work.
Differential revision: https://reviews.llvm.org/D77146
Without the patch, all version 5 compile units in a DWP file read
location tables from the beginning of a .debug_loclists.dwo section.
The patch fixes that by adjusting the reading offset the same way as
for pre-v5 units. The section identifier to find the contribution
entry corresponds to the version of the unit.
Differential revision: https://reviews.llvm.org/D77145
DWARFv5 defines index sections in package files in a slightly different
way than the pre-standard GNU proposal, see Section 7.3.5 in the DWARF
standard and https://gcc.gnu.org/wiki/DebugFissionDWP for GNU proposal.
The main concern here is values for section identifiers, which are
partially overlapped with changed meanings. The patch adds support for
v5 index sections and resolves that difficulty by defining a set of
identifiers for internal use which can represent and distinct values
of both standards.
Differential Revision: https://reviews.llvm.org/D75929
The new and old pass managers (PassManagerBuilder.cpp and
PassBuilder.cpp) are exposed to an `extern` declaration of
`attributor-disable` option which will guard the addition of the
attributor passes to the pass pipelines.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D76871
Add a new overload of StaticLibraryDefinitionGenerator::Load that takes a triple
argument and supports loading archives from MachO universal binaries in addition
to regular archives.
The LLI tool is updated to use this overload.
We had previously limited the shuffle(HORIZOP,HORIZOP) combine to binary shuffles, but we can often merge unary shuffles just as well, folding in UNDEF/ZERO values into the 64-bit half lanes.
For the (P)HADD/HSUB cases this is limited to fast-horizontal cases but PACKSS/PACKUS combines under all cases.
This patch builds upon D76140 by updating metadata on pointer typed
loads in inlined functions, when the load is the return value, and the
callsite contains return attributes which can be updated as metadata on
the load.
Added test cases show this for nonnull, dereferenceable,
dereferenceable_or_null
Reviewed-By: jdoerfert
Differential Revision: https://reviews.llvm.org/D76792
The newly-created constant zero will need an extra register to hold it
in the current statepoint lowering implementation. Remove it if there
exists one.
Summary:
This patch upstreams support the optional ARMv8.0 Data Gathering Hint (DGH)
extension, which adds the Data Gathering Hint instruction to the hint
space.
See ARMv8.0-DGH in the Arm Architecture Reference Manual Armv8 for more
information.
Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, danielkiss, samparker
Reviewed By: SjoerdMeijer
Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77097
Summary:
This patch upstreams support for the ARMv8.6A Enhanced Counter Virtualization
(ECV) extension, which adds 6 new system registers.
See ARMv8.6-ECV in the Arm Architecture Reference Manual Armv8 for more
information.
Reviewers: t.p.northover, rengolin, SjoerdMeijer, pcc, ab, chill
Reviewed By: SjoerdMeijer
Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77094
As discussed in D76983, that patch can turn a chain of insert/extract
with scalar trunc ops into bitcast+extract and existing instcombine
vector transforms end up creating a shuffle out of that (see the
PhaseOrdering test for an example). Currently, that process requires
at least this sequence: -instcombine -early-cse -instcombine.
Before D76983, the sequence of insert/extract would reach the SLP
vectorizer and become a vector trunc there.
Based on a small sampling of public targets/types, converting the
shuffle to a trunc is better for codegen in most cases (and a
regression of that form is the reason this was noticed). The trunc is
clearly better for IR-level analysis as well.
This means that we can induce "spontaneous vectorization" without
invoking any explicit vectorizer passes (at least a vector cast op
may be created out of scalar casts), but that seems to be the right
choice given that we started with a chain of insert/extract, and the
backend would expand back to that chain if a target does not support
the op.
Differential Revision: https://reviews.llvm.org/D77299
Summary:
This patch upstreams support for the ARMv8.6A Fine Grain Traps (FGT)
extension, which adds 5 new system registers.
See ARMv8.6-FGT in the Arm Architecture Reference Manual Armv8 for more
information.
Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, momchil.velikov
Reviewed By: SjoerdMeijer
Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76991
The cmyk tests are based on the known regression that resulted from:
rGf2fbdf76d8d0
So this improvement in analysis might be enough to restore that commit.
Summary:
This patch upstreams v8.6A activity monitors virtualization
assembler support, which consists of 32 new system
registers (two groups, each with 16 numbered registers).
See ARMv8.6-AMU in the Arm Architecture Reference Manual Armv8 for more
information.
Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, john.brawn, ostannard
Reviewed By: ostannard
Subscribers: LukeGeeson, dnsampaio, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76998
Our existing combine allows to merge the shuffle of 2 similar 64-bit wide 'horizontal ops' (HADD/PACK/etc.) if the shuffle was a UNPCK/MOVSD.
This patch generalizes this to decode any target shuffle mask that can be widened to a 128-bit repeating v2*64 mask, which helps us catch PBLENDW/PBLENDD cases.
If we're packing from 128-bits to 64-bits then we don't need the RHS argument. This helps with register allocation, especially as we avoid repeating a use of the input value.
Summary:
This patch is to teach `llvm-objdump` dump dynamic symbols (`-T` and `--dynamic-syms`). Currently, this patch is not fully compatible with `gnu-objdump`, but I would like to continue working on this in next few patches. It has two issues.
1. Some symbols shouldn't be marked as global(g). (`-t/--syms` has same issue as well) (Fixed by D75659)
2. `gnu-objdump` can dump version information and *dynamically* insert before symbol name field.
`objdump -T a.out` gives:
```
DYNAMIC SYMBOL TABLE:
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 printf
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __libc_start_main
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 w DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_finalize
```
`llvm-objdump -T a.out` gives:
```
DYNAMIC SYMBOL TABLE:
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 g DF *UND* 0000000000000000 printf
0000000000000000 g DF *UND* 0000000000000000 __libc_start_main
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 w DF *UND* 0000000000000000 __cxa_finalize
```
Reviewers: jhenderson, grimar, MaskRay, espindola
Reviewed By: jhenderson, grimar
Subscribers: emaste, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75756
`isKnownReachable` had only interface (always returns true).
Changed it to call `isPotentiallyReachable`.
This change enables deductions of other Abstract Attributes depending on
AAReachability to use reachability information obtained from CFG, and it
can make them stronger.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D76210
This commit was made to settle [[ https://github.com/llvm/llvm-project/issues/175 | this issue on GitHub ]].
I added analysis getters for LoopInfo, DominatorTree, and
PostDominatorTree. And I added a test to show an improvement of the
deduction of `dereferenceable` attribute.
Reviewed By: jdoerfert, uenoku
Differential Revision: https://reviews.llvm.org/D76378
Query AAValueSimplify on pointers in memory accessing instructions to take
advantage of the constant propagation (or any other value simplification) of such values.
Summary:
When we insert a call to the personality function wrapper
(`_Unwind_CallPersonality`) for a catch pad, we store some necessary
info in `__wasm_lpad_context` struct and pass it. One of the info is the
LSDA address for the function. For this, we insert a call to
`wasm.lsda()`, which will be lowered down to the address of LSDA, and
store it in a field in `__wasm_lpad_context`.
There are exceptions to this personality call insertion: catchpads for
`catch (...)` and cleanuppads (for destructors) don't need personality
function calls, because we don't need to figure out whether the current
exception should be caught or not. (They always should.)
There was a little optimization to `wasm.lsda()` call insertion. Because
the LSDA address is the same throughout a function, we don't need to
insert a store of `wasm.lsda()` return value in every catchpad. For
example:
```
try {
foo();
} catch (int) {
// wasm.lsda() call and a store are inserted here, like, in
// pseudocode,
// %lsda = wasm.lsda();
// store %lsda to a field in __wasm_lpad_context
try {
foo();
} catch (int) {
// We don't need to insert the wasm.lsda() and store again, because
// to arrive here, we have already stored the LSDA address to
// __wasm_lpad_context in the outer catch.
}
}
```
So the previous algorithm checked if the current catch has a parent EH
pad, we didn't insert a call to `wasm.lsda()` and its store.
But this was incorrect, because what if the outer catch is `catch (...)`
or a cleanuppad?
```
try {
foo();
} catch (...) {
// wasm.lsda() call and a store are NOT inserted here
try {
foo();
} catch (int) {
// We need wasm.lsda() here!
}
}
```
In this case we need to insert `wasm.lsda()` in the inner catchpad,
because the outer catchpad does not have one.
To minimize the number of inserted `wasm.lsda()` calls and stores, we
need a way to figure out whether we have encountered `wasm.lsda()` call
in any of EH pads that dominates the current EH pad. To figure that
out, we now visit EH pads in BFS order in the dominator tree so that we
visit parent BBs first before visiting its child BBs in the domtree.
We keep a set named `ExecutedLSDA`, which basically means "Do we have
`wasm.lsda()` either in the current EH pad or any of its parent EH
pads in the dominator tree?". This is to prevent scanning the domtree up
to the root in the worst case every time we examine an EH pad: each EH
pad only needs to examine its immediate parent EH pad.
- If any of its parent EH pads in the domtree has `wasm.lsda()`, this
means we don't need `wasm.lsda()` in the current EH pad. We also insert
the current EH pad in `ExecutedLSDA` set.
- If none of its parent EH pad has `wasm.lsda()`
- If the current EH pad is a `catch (...)` or a cleanuppad, done.
- If the current EH pad is neither a `catch (...)` nor a cleanuppad,
add `wasm.lsda()` and the store in the current EH pad, and add the
current EH pad to `ExecutedLSDA` set.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77423
Similar to the lowerV16I8Shuffle implementation, for binary compaction v8i16 shuffles we can avoid the PUNPCKLDQ(PSHUFB,PSHUFB) pattern on SSE41+ targets by using PACKUSDW and PBLENDW. Before SSE41 we would need to use PACKSSDW but that requires sign extension that seems to destroy any gains, even on targets without PSHUFB.
This is a bigger gain on AMD than Intel targets but should never be a regression, and avoiding the shuffle mask load(s) is always useful.
Noticed in codegen while dealing with PR31443.
Summary:
Note: This revision is very similar to D62296.
In D75756, we need `getDynamicSymbolIterators()` to skip first NULL symbol in `.dynsym`. And I believe it might be worth pointing this out in a separate patch to gather you experts' opinions.
I have checked that current code base will not be affected by this change.
```
dynamic_symbol_begin()
|- dynamic_symbol_end(): Ok
`- getDynamicSymbolIterators()
|- addDynamicElfSymbols(): llvm/tools/llvm-objdump/llvm-objdump.cpp, Line 934
| Ok, NULL symbol will be omitted by Line 945-947
| StringRef Name = unwrapOrError(Symbol.getName(), Obj->getName());
| if (Name.empty()) continue;
|- dumpSymbolNameFromObject(): llvm/tools/llvm-nm/llvm-nm.cpp, Line 1192
| There's no test for dumping dynamic debugging symbol. This patch helps improve llvm-nm behavior. (we should add test for this later)
`- computeSymbolSizes(): llvm/lib/Object/SymbolSize.cpp, Line 52
|- OProfileJITEventListener::notifyObjectLoaded(): llvm/lib/ExecutionEngine/OProfileJIT/OProfileJITEventListener.cpp, Line 92
| Ok, NULL symbol will be omitted by Line 94-95
| if (!Sym.getType() || *Sym.getType() != SF_Function) continue;
|- IntelJITEventListener::notifyObjectLoaded(): llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp, Line 98
| Ok, NULL symbol will be omitted by Line 124-126 (same as previous one)
|- PerfJITEventListener::notifyObjectLoaded(): llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp, Line 244
| Ok, NULL symbol will be omitted by Line 254-256, (same as previous one)
|- SymbolizableObjectFile::create(): llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp, Line 73
| Ok, NULL symbol will be omitted by Line 75
| res->addSymbol()
| In addSymbol(), Line 167-168
| if (!Sec || (Obj && Obj->section_end() == *Sec)) return std::error_code();
|- dumpCXXData(): llvm/tools/llvm-cxxdump/llvm-cxxdump.cpp, Line 189
| Ok, NULL symbol will be omitted by Line 199-202
| object::section_iterator SecI = *SecIOrErr;
| // Skip external symbols.
| if (SecI == Obj->section_end())
| continue;
`- printLineInfoForInput(): llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp, Line 418
Ok, NULL symbol will be omitted by Line 430-477
if (Type == object::SymbolRef::ST_Function) {
...
}
```
Reviewers: grimar, jhenderson, MaskRay
Reviewed By: jhenderson, MaskRay
Subscribers: rupprecht, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76081
After finding all such gadgets in a given function, the pass minimally inserts
LFENCE instructions in such a manner that the following property is satisfied:
for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at
least one LFENCE instruction. The algorithm that implements this minimal
insertion is influenced by an academic paper that minimally inserts memory
fences for high-performance concurrent programs:
http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf
The algorithm implemented in this pass is as follows:
1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components:
-SOURCE instructions (also includes function arguments)
-SINK instructions
-Basic block entry points
-Basic block terminators
-LFENCE instructions
2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6.
3. Use a heuristic or plugin to approximate minimal LFENCE insertion.
4. Insert one LFENCE along each CFG edge that was cut in step 3.
5. Go to step 2.
6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified.
By default, the heuristic used in Step 3 is a greedy heuristic that avoids
inserting LFENCEs into loops unless absolutely necessary. There is also a
CLI option to load a plugin that can provide even better optimization,
inserting fewer fences, while still mitigating all of the LVI gadgets.
The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin,
and a description of the pass's behavior with the plugin can be found here:
https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection.
Differential Revision: https://reviews.llvm.org/D75937
Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph.
More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK).
Also adds a new target feature to X86: +lvi-load-hardening
The feature can be added via the clang CLI using -mlvi-hardening.
Differential Revision: https://reviews.llvm.org/D75936
Adding a pass that replaces every ret instruction with the sequence:
pop <scratch-reg>
lfence
jmp *<scratch-reg>
where <scratch-reg> is some available scratch register, according to the
calling convention of the function being mitigated.
Differential Revision: https://reviews.llvm.org/D75935
We can fix register class of PHI based on its all AGPR uses.
That leaves behind all PHIs which were already processed
earlier. Propagate RC back to PHI operands of a PHI.
Differential Revision: https://reviews.llvm.org/D77344
Extracting to the same index that we are going to insert back into
allows forming select ("blend") shuffles and enables further transforms.
Admittedly, this is a quick-fix for a more general problem that I'm
hoping to solve by adding transforms for patterns that start with an
insertelement.
But this might resolve some regressions known to be caused by the
extract-extract transform (although I have not gotten more details on
those yet).
In the motivating case from PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724
The combination of subsequent instcombine and codegen transforms gets us this improvement:
vmovshdup %xmm0, %xmm2 ## xmm2 = xmm0[1,1,3,3]
vhaddps %xmm1, %xmm1, %xmm4
vmovshdup %xmm1, %xmm3 ## xmm3 = xmm1[1,1,3,3]
vaddps %xmm0, %xmm2, %xmm0
vaddps %xmm1, %xmm3, %xmm1
vshufps $200, %xmm4, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm4[0,3]
vinsertps $177, %xmm1, %xmm0, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm1[2]
-->
vmovshdup %xmm0, %xmm2 ## xmm2 = xmm0[1,1,3,3]
vhaddps %xmm1, %xmm1, %xmm1
vaddps %xmm0, %xmm2, %xmm0
vshufps $200, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm1[0,3]
Differential Revision: https://reviews.llvm.org/D76623
Extend lowerShuffleWithPACK/matchShuffleWithPACK/createPackShuffleMask to handle compaction style shuffle masks that can be lowered to chains of PACKSS/PACKUS if their inputs are suitably sign/zero extended.
This helps avoid PSHUFB (and its mask load) for short shuffle chains, shuffle combining will still replace with a PSHUFB if we have enough shuffles as getFauxShuffleMask should recognise the PACKSS/PACKUS chains.
As discussed in post-commit review in https://reviews.llvm.org/D73501
if the goal of this is to help vectorizer, then we should actually
be teaching vectorizer to do this, because right now this rewrite
is still budget-limited, which isn't what we'd want.
Additionally, while the rest of the patch series was universally profitable,
this particular patch is reportedly (https://reviews.llvm.org/D73501#1905171)
exposing cost-modeling issues on ARM.
So let's just back this particular patch out. Once there's an undo transform,
this could be considered for reintegration.
This reverts commit 44edc6fd2c.
We got some of the potential optimizations with D76727 and D76844.
There are 2 likely enhancements that we could add to -vector-combine
to get most of the remaining cases:
1. Allow bitcasted shuffle mask narrowing (widen the elements).
2. Combine shuffle-of-shuffle into a single shuffle.
This is already partly handled by the x86 backend, but the
tests here show that we still miss some of the potential
combines.
Currently when the target is big-endian vmov.i64 reverses the order of the two
words of the vector. This is correct only when the underlying element type is
32-bit, as actually what it should be doing is considering it a vector of the
underlying type and reversing the elements of that.
Differential Revision: https://reviews.llvm.org/D76515
If we have an element-wise vmov immediate instruction then a subsequent vrev
with width greater or equal to the vmov element width, then that vrev won't do
anything. Add a DAG combine to convert bitcasts that would become such vrevs
into vector_reg_casts instead.
Differential Revision: https://reviews.llvm.org/D76514
1. It's redundant to explicitly check that string does not contain `PAUSE_MM`
token if check that the `PAUSE` token is followed by an angle bracket.
2. The removed `CHECK-NOT` directives do not have trailing colons.
This pass replaces each indirect call/jump with a direct call to a thunk that looks like:
lfence
jmpq *%r11
This ensures that if the value in register %r11 was loaded from memory, then
the value in %r11 is (architecturally) correct prior to the jump.
Also adds a new target feature to X86: +lvi-cfi
("cfi" meaning control-flow integrity)
The feature can be added via clang CLI using -mlvi-cfi.
This is an alternate implementation to https://reviews.llvm.org/D75934 That merges the thunk insertion functionality with the existing X86 retpoline code.
Differential Revision: https://reviews.llvm.org/D76812
Summary:
This patch adds parsing and dumping DWARFv5 .debug_macro section in llvm-dwarfdump,
it does not introduce any new switch. Existing switch "--debug-macro"
should be used to dump macinfo or macro section.
Reviewed By: dblaikie, ikudrin, jhenderson
Differential Revision: https://reviews.llvm.org/D73086
Summary:
Add mapping from exp2 math functions
to corresponding SVML calls.
This is a follow up and extension for llvm diff
https://reviews.llvm.org/D19544
Test Plan:
- update test case and run ninja check.
- run tests locally
Reviewers: wenlei, hoyFB, mmasten, mzolotukhin, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77114
Summary:
A recent change in the instruction simplifier enables a call to a function that just returns one of its parameter to be simplified as simply loading the parameter. This exposes a bug in the inliner where double inlining may be involved which in turn may cause compiler ICE when an already-inlined callsite is reused for further inlining.
To put it simply, in the following-like C program, when the function call second(t) is inlined, its code t = third(t) will be reduced to just loading the return value of the callsite first(). This causes the inliner internal data structure to register the first() callsite for the call edge representing the third() call, therefore incurs a double inlining when both call edges are considered an inline candidate. I'm making a fix to break the inliner from reusing a callsite for new call edges.
```
void top()
{
int t = first();
second(t);
}
void second(int t)
{
t = third(t);
fourth(t);
}
void third(int t)
{
return t;
}
```
The actual failing case is much trickier than the example here and is only reproducible with the legacy inliner. The way the legacy inliner works is to process each SCC in a bottom-up order. That means in reality function first may be already inlined into top, or function third is either inlined to second or is folded into nothing. To repro the failure seen from building a large application, we need to figure out a way to confuse the inliner so that the bottom-up inlining is not fulfilled. I'm doing this by making the second call indirect so that the alias analyzer fails to figure out the right call graph edge from top to second and top can be processed before second during the bottom-up. We also need to tweak the test code so that when the inlining of top happens, the function body of second is not that optimized, by delaying the pass of function attribute deducer (i.e, which tells function third has no side effect and just returns its parameter). Since the CGSCC pass is iterative, additional calls are added to top to postpone the inlining of second to the second round right after the first function attribute deducing pass is done. I haven't been able to repro the failure with the new pass manager since the processing order of ininlined callsites is a bit different, but in theory the issue could happen there too.
Note that this fix could introduce a side effect that blocks the simplification of inlined code, specifically for a call site that can be folded to another call site. I hope this can probably be complemented by subsequent inlining or folding, as shown in the attached unit test. The ideal fix should be to separate the use of VMap. However, in reality this failing pattern shouldn't happen often. And even if it happens, there should be a good chance that the non-folded call site will be refolded by iterative inlining or subsequent simplification.
Reviewers: wenlei, davidxl, tejohnson
Reviewed By: wenlei, davidxl
Subscribers: eraman, nikic, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76248
Summary:
This patch comes from H.J.'s 2bd54ce7fa
**This patch fix the failed llvm unit tests which running on CET machine. **(e.g. ExecutionEngine/MCJIT/MCJITTests)
The reason we enable IBT at "JIT compiled with CET" is mainly that: the JIT don't know the its caller program is CET enable or not.
If JIT's caller program is non-CET, it is no problem JIT generate CET code or not.
But if JIT's caller program is CET enabled, JIT must generate CET code or it will cause Control protection exceptions.
I have test the patch at llvm-unit-test and llvm-test-suite at CET machine. It passed.
and H.J. also test it at building and running VNCserver(Virtual Network Console), it works too.
(if not apply this patch, VNCserver will crash at CET machine.)
Reviewers: hjl.tools, craig.topper, LuoYuanke, annita.zhang, pengfei
Subscribers: tstellar, efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76900
This was causing a machine verifier failure on the test suite.
Make sure that we don't end up with a weird register class here.
Failure for reference:
*** Bad machine code: Illegal virtual register for instruction ***
- function: check_constrain
- basic block: %bb.1 (0x7f8b70839f80)
- instruction: early-clobber %6:gpr64, early-clobber %7:gpr64sp =
JumpTableDest32 %5:gpr64, %1:gpr64sp, %jump-table.0
- operand 3: %1:gpr64sp
Expected a GPR64 register, but got a GPR64sp register
Differential Revision: https://reviews.llvm.org/D77349
MI peephole will remove unnecessary FRSP instructions. This patch
removes such unnecessary XSRSP.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D77208
Summary:
This fixes a few issues related to SMRD offsets. On gfx9 and gfx10 we have a
signed byte offset immediate, however we can overflow into a negative since we
treat it as unsigned.
Also, the SMRD SOFFSET sgpr is an unsigned offset on all subtargets. We
sometimes tried to use negative values here.
Third, S_BUFFER instructions should never use a signed offset immediate.
Differential Revision: https://reviews.llvm.org/D77082
This will likely introduce catastrophic performance regressions on
older subtargets, but should be correct. A follow up change will
remove the old fp32-denormals subtarget features, and switch to using
the new denormal-fp-math/denormal-fp-math-f32 attributes. Frontends
should be making sure to add the denormal-fp-math-f32 attribute when
appropriate to avoid performance regressions.
The problem on Windows was that the \b in "..\bin" was interpreted
as an escape sequence. Use r"" strings to prevent that.
This reverts commit ab11b9eefa,
with raw strings in the lit.site.cfg.py.in files.
Differential Revision: https://reviews.llvm.org/D77184
Consider a callee function that has a call (C) within it which feeds
into the return. When we inline that callee into a callsite that has
return attributes, we can backward propagate valid attributes to the
call (C) within that inlined callee body.
This is safe to do so only if we can guarantee transfer of execution to
successor in the window of instructions between return value (i.e. the
call C) and the return instruction.
Also, this is valid only for attributes which are a property of a
callsite and not those that are not dependent on the ABI, or a property
of the call itself.
Reviewed-By: reames, jdoerfert
Differential Revision: https://reviews.llvm.org/D76140
This is a workaround for clang adding noinline to all functions at
-O0. Previously, we would just add alwaysinline, and the verifier
would complain about having both noinline and alwaysinline. We
currently can't truly codegen this case as a freestanding function, so
override the user forcing noinline.
Currently, all generated lit.site.cfg files contain absolute paths.
This makes it impossible to build on one machine, and then transfer the
build output to another machine for test execution. Being able to do
this is useful for several use cases:
1. When running tests on an ARM machine, it would be possible to build
on a fast x86 machine and then copy build artifacts over after building.
2. It allows running several test suites (clang, llvm, lld) on 3
different machines, reducing test time from sum(each test suite time) to
max(each test suite time).
This patch makes it possible to pass a list of variables that should be
relative in the generated lit.site.cfg.py file to
configure_lit_site_cfg(). The lit.site.cfg.py.in file needs to call
`path()` on these variables, so that the paths are converted to absolute
form at lit start time.
The testers would have to have an LLVM checkout at the same revision,
and the build dir would have to be at the same relative path as on the
builder.
This does not yet cover how to figure out which files to copy from the
builder machine to the tester machines. (One idea is to look at the
`--graphviz=test.dot` output and copy all inputs of the `check-llvm`
target.)
Differential Revision: https://reviews.llvm.org/D77184
bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'
We do not attempt this in InstCombine because we do not want to change
types and create new shuffle ops that are potentially not lowered as
well as the original code. Here, we can check the cost model to see if
it is worthwhile.
I've aggressively enabled this transform even if the types are the same
size and/or equal cost because moving the bitcast allows InstCombine to
make further simplifications.
In the motivating cases from PR35454:
https://bugs.llvm.org/show_bug.cgi?id=35454
...this is enough to let instcombine and the backend eliminate the
redundant shuffles, but we probably want to extend VectorCombine to
handle the inverse pattern (shuffle-of-bitcast) to get that
simplification directly in IR.
Differential Revision: https://reviews.llvm.org/D76727
SILoadStoreOptimizer::checkAndPrepareMerge() expects base and
paired instruction to come in order and scans MBB from base to
the paired instruction. An original order can be changed if
there were a dependent instruction in between and base instruction
was moved.
Fixed by bailing the optimization. In theory it might be possible
still to perform a merge by swapping instructions, but on practice
it bails anyway because it finds dependency on that same instruction
which has resulted in the base move.
Differential Revision: https://reviews.llvm.org/D77245
These are versions of a function that regressed with:
rGf2fbdf76d8d0
That particular problem occurs with an instcombine-simplifycfg-instcombine
sequence, but we can show that it exists within instcombine only with
other variations of the pattern.
This reverts commit f2fbdf76d8.
As noted in the post-commit thread:
https://reviews.llvm.org/rGf2fbdf76d8d0
...this can obscure a min/max pattern where the components
have extra uses. We can show that the problem is independent
of this change with a slightly modified source example, so
this revert just delays/reduces the need to fix the real
problem.
We need to improve our analysis of negation or -- more
generally -- subtraction using patches like D77230 or D68408.
Summary:
Splitting Knowledge retention into Queries in Analysis and Builder into Transform/Utils
allows Queries and Transform/Utils to use Analysis.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77171
This patch adds
- New arguments to getMinPrefetchStride() to let the target decide on a
per-loop basis if software prefetching should be done even with a stride
within the limit of the hw prefetcher.
- New TTI hook enableWritePrefetching() to let a target do write prefetching
by default (defaults to false).
- In LoopDataPrefetch:
- A search through the whole loop to gather information before emitting any
prefetches. This way the target can get information via new arguments to
getMinPrefetchStride() and emit prefetches more selectively. Collected
information includes: Does the loop have a call, how many memory
accesses, how many of them are strided, how many prefetches will cover
them. This is NFC to before as long as the target does not change its
definition of getMinPrefetchStride().
- If a previous access to the same exact address was 'read', and the
current one is 'write', make it a 'write' prefetch.
- If two accesses that are covered by the same prefetch do not dominate
each other, put the prefetch in a block that dominates both of them.
- If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.
- A SystemZ implementation of getMinPrefetchStride().
Review: Ulrich Weigand, Michael Kruse
Differential Revision: https://reviews.llvm.org/D70228
Add an option to llvm-dwarfdump to calculate the bytes within
the debug sections. Dump this numbers when using --statistics
option as well.
This is an initial patch (e.g. we should support other units,
since we only support 'bytes' now).
Differential Revision: https://reviews.llvm.org/D74205
This adds MVE vmull patterns, which are conceptually the same as
mul(vmovl, vmovl), and so the tablegen patterns follow the same
structure.
For i8 and i16 this is simple enough, but in the i32 version the
multiply (in 64bits) is illegal, meaning we need to catch the pattern
earlier in a dag fold. Because bitcasts are involved in the zext
versions and the patterns are a little different in little and big
endian. I have only added little endian support in this patch.
Differential Revision: https://reviews.llvm.org/D76740
The unpredictable/hasSideEffects flag is usually inferred by tablegen
from whether the instruction has a tablegen pattern (and that pattern
only has a single output instruction). Now that the MVE intrinsics are
all committed and producing code, the remaining instructions still
marked as unpredictable need to be specially handled. This adds the flag
directly to instructions that need it, notably the V*MLAL instructions
and some of the MOV's.
Differential Revision: https://reviews.llvm.org/D76910
Summary:
This allows doing `memcmp(p, q, 7)` with 2 loads instead of a call to
memcmp.
This fixes part of PR45147.
Reviewers: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76133
As pointed out by @thakis, currently CallSiteSplitting bails out after
checking the first PHI node. We should check all PHI nodes, until we
find one where call site splitting is beneficial.
This patch also slightly simplifies the code using BasicBlock::phis().
Reviewers: davidxl, junbuml, thakis
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D77089
There was a TODO in genericValueTraversal to provide the context
instruction and due to the lack of it users that wanted one just used
something available. Unfortunately, using a fixed instruction is wrong
in the presence of PHIs so we need to update the context instruction
properly.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D76870