Commit Graph

70581 Commits

Author SHA1 Message Date
Simon Pilgrim 7e62684251 [AMDGPU] Regenerate vector-extract-insert test checks to fix issue reported on D77354 2020-04-08 13:18:32 +01:00
Simon Pilgrim 98181a1f98 [AMDGPU] Regenerate si-annotate-cfg-loop-assert test checks to fix issue reported on D77354 2020-04-08 13:09:08 +01:00
Simon Pilgrim 66c18c729d [X86][SSE] Combine PTEST(AND(X,Y),AND(X,Y)) -> PTEST(X,Y) and ANDN equivalents
Tests derived from PR42035 examples
2020-04-08 12:42:22 +01:00
Peter Smith 02cd80e68e [ELF][AArch64] Add R_AARCH64_PLT32 relocation type.
The R_AARCH64_PLT32 relocation type will be documented in the next release
of ELF for the 64-bit Arm Architecture. It is being added in draft state
for the benefit of the position independent vtable feature.

R_AARCH64_PLT32 is very similar to R_AARCH64_PREL32. The intention is to
provide a signed 32-bit integer representing an offset from the place
to a function.
- It relocates 32-bit data
- The expression is S + A - P
- The overflow check for the expression is -2^31 <= X < 2^31
- The relocation generates Thunks/Veneers/Stubs and PLT entries as per
  R_AArch64_CALL26
- If the symbol S is an undefined weak the ABI does not define its value.

The ABI defines a code for ilp32 for completeness, I have added the code
but have only added to the existing reloc-types-elf-aarch64.text as there
is no ilp32 equivalent.

Differential Revision: https://reviews.llvm.org/D77647
2020-04-08 12:19:35 +01:00
Shengchen Kan 916044d819 [X86][MC] Support enhanced relaxation for branch align
Summary:
Since D75300 has been landed, I want to support enhanced relaxation when we need to align branches and allow prefix padding. "Enhanced Relaxtion" means we allow an instruction that could not be traditionally relaxed to be emitted into RelaxableFragment so that we increase its length by adding prefixes for optimization.

The motivation is straightforward, RelaxFragment is mostly for relative jumps and we can not increase the length of jumps when we need to align them, so if we need to achieve D75300's purpose (reducing the bytes of nops) when need to align jumps, we have to make more instructions "relaxable".

Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76286
2020-04-08 19:08:19 +08:00
Mikael Holmen 893df2032d [IfConversion] Disallow TrueBB == FalseBB for valid diamonds
Summary:
This fixes PR45302.

Previously the case

     BB1
     / \
    |   |
   TBB FBB
    |   |
     \ /
     BB2

was treated as a valid diamond also when TBB and FBB was the same basic
block. This then lead to a failed assertion in IfConvertDiamond.

Since TBB == FBB is quite a degenerated case of a diamond, we now
don't treat it as a valid diamond anymore, and thus we will avoid the
trouble of making IfConvertDiamond handle it correctly.

Reviewers: efriedma, kparzysz

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D77651
2020-04-08 12:50:36 +02:00
Anna Welker 89e1248d7b [ARM][MVE] Optimise offset addresses of gathers/scatters
This patch adds an analysis of the offset addresses used by gathers
and scatters to the MVEGatherScatterLowering pass to find
multiplications and additions that are loop invariant and thus can
be moved into the loop preheader, avoiding to execute them each time.

Differential Revision: https://reviews.llvm.org/D76681
2020-04-08 11:46:57 +01:00
Max Kazantsev 7adb9e06fd [LoopLoadElim] Add test showing that LoopLoadElim doesn't work correctly with new PM 2020-04-08 17:32:03 +07:00
Dominik Montada c8393240ab [GlobalISel] combine trunc(trunc) pattern
Summary:
Legalization can introduce the trunc(trunc) pattern. This can cause
problems if one of these intermediate truncs is not legal.
Combine truncs of this pattern, if the resulting trunc is legal.

Reviewers: arsenm, aemerson, dsanders

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76601
2020-04-08 11:58:28 +02:00
James Henderson abd335a339 [llvm-objdump] Fix unstable disassembly output for sections with same address
When two sections shared the same address, the disassembly code was
using pointer values when sorting (see the SectionRef less than
operator). Since those values aren't guaranteed to have a specific
order, this meant the disassembly code would sometimes change which
section to pick when finding symbols targeted by calls in fully linked
objects.

This change fixes the non-determinism, so that the same section is
always picked. This might have a negative impact in that now a section
without any symbol might be picked over a section with symbols, but this
will be addressed in a later commit.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45411.

Reviewed by: grimar, MaskRay

Differential Revision: https://reviews.llvm.org/D77640
2020-04-08 10:57:12 +01:00
Dominik Montada 432720f1c4 [GlobalISel] Combine sext([sz]ext) -> [sz]ext, zext(zext) -> zext
Summary:
Combine sext(zext x) to (zext x) since the sign-bit is 0
after the zero-extension.

Combine sext(sext x) to (sext x) and ext(zext x) to (zext x)
since the intermediate step is not needed.

Reviewers: arsenm, volkan, aemerson, aditya_nandakumar

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77210
2020-04-08 11:24:29 +02:00
Dominik Montada 35950fea8d [GlobalISel] support narrow G_IMPLICIT_DEF for DstSize % NarrowSize != 0
Summary:
When narrowing G_IMPLICIT_DEF where the original size is not a multiple
of the narrow size, emit a smaller G_IMPLICIT_DEF and use G_ANYEXT.

To prevent a potential endless loop in the legalizer, the condition
to combine G_ANYEXT(G_IMPLICIT_DEF) is changed from isInstUnsupported
to !isInstLegal, since in this case the combine is only valid if
consequent legalization of the newly combined G_IMPLICIT_DEF does not
introduce G_ANYEXT due to narrowing.

Although this legalization for G_IMPLICIT_DEF would also be valid for
the general case, it actually caused a lot of code regressions when
tried due to superfluous COPYs and combines not getting hit anymore.

Reviewers: dsanders, aemerson, volkan, arsenm, aditya_nandakumar

Reviewed By: arsenm

Subscribers: jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76598
2020-04-08 11:00:07 +02:00
Igor Kudrin af11c556db [DebugInfo] Fix reading DWARFv5 type units in DWP.
In DWARFv5, type units are stored in .debug_info sections, along with
compilation units, and they are distinguished by the unit_type field
in the header, not by the name of the section. It is impossible to
associate the correct index section of a DWP file with the unit before
the unit's header is read. This patch fixes reading DWARFv5 type units
by parsing the header first and then applying the index entry according
to the actual unit type.

Differential Revision: https://reviews.llvm.org/D77552
2020-04-08 12:50:58 +07:00
Stanislav Mekhanoshin f96810ff34 [AMDGPU] Expand vector trunc stores from i16 to i8
Differential Revision: https://reviews.llvm.org/D77693
2020-04-07 21:47:45 -07:00
Daniel Sanders 1adeeabb79 Add MIR-level debugify with only locations support for now
Summary:
Re-used the IR-level debugify for the most part. The MIR-level code then
adds locations to the MachineInstrs afterwards based on the LLVM-IR debug
info.

It's worth mentioning that the resulting locations make little sense as
the range of line numbers used in a Function at the MIR level exceeds that
of the equivelent IR level function. As such, MachineInstrs can appear to
originate from outside the subprogram scope (and from other subprogram
scopes). However, it doesn't seem worth worrying about as the source is
imaginary anyway.

There's a few high level goals this pass works towards:
* We should be able to debugify our .ll/.mir in the lit tests without
  changing the checks and still pass them. I.e. Debug info should not change
  codegen. Combining this with a strip-debug pass should enable this. The
  main issue I ran into without the strip-debug pass was instructions with MMO's and
  checks on both the instruction and the MMO as the debug-location is
  between them. I currently have a simple hack in the MIRPrinter to
  resolve that but the more general solution is a proper strip-debug pass.
* We should be able to test that GlobalISel does not lose debug info. I
  recently found that the legalizer can be unexpectedly lossy in seemingly
  simple cases (e.g. expanding one instr into many). I have a verifier
  (will be posted separately) that can be integrated with passes that use
  the observer interface and will catch location loss (it does not verify
  correctness, just that there's zero lossage). It is a little conservative
  as the line-0 locations that arise from conflicts do not track the
  conflicting locations but it can still catch a fair bit.

Depends on D77439, D77438

Reviewers: aprantl, bogner, vsk

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77446
2020-04-07 16:25:13 -07:00
Fangrui Song d2ef8c1f2c [ThinLTO] Drop dso_local if a GlobalVariable satisfies isDeclarationForLinker()
dso_local leads to direct access even if the definition is not within this compilation unit (it is
still in the same linkage unit). On ELF, such a relocation (e.g. R_X86_64_PC32) referencing a
STB_GLOBAL STV_DEFAULT object can cause a linker error in a -shared link.

If the linkage is changed to available_externally, the dso_local flag should be dropped, so that no
direct access will be generated.

The current behavior is benign, because -fpic does not assume dso_local
(clang/lib/CodeGen/CodeGenModule.cpp:shouldAssumeDSOLocal).
If we do that for -fno-semantic-interposition (D73865), there will be an
R_X86_64_PC32 linker error without this patch.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D74751
2020-04-07 15:46:01 -07:00
Wei Mi b49eac71ad Recommit [SampleFDO] Add flag for partial profile.
Fix the error of show-prof-info.test on some platforms without zlib.

The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.

Differential Revision: https://reviews.llvm.org/D77426
2020-04-07 14:28:25 -07:00
Stanislav Mekhanoshin 96e51ed005 [AMDGPU] Implement copyPhysReg for 16 bit subregs
Differential Revision: https://reviews.llvm.org/D74937
2020-04-07 14:22:46 -07:00
Wei Mi c5da949ae8 Revert "[SampleFDO] Add flag for partial profile." show-prof-info.test breaks on some platforms.
This reverts commit e3ba652a14.
2020-04-07 12:54:51 -07:00
Wei Mi e3ba652a14 [SampleFDO] Add flag for partial profile.
The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.

Differential Revision: https://reviews.llvm.org/D77426
2020-04-07 12:17:56 -07:00
Nemanja Ivanovic ecd8435483 [NFC][PowerPC] Fix register class for patterns using XXPERMDIs
There are a few patterns where we use a superclass for inputs to this
instruction rather than the correct class. This can sometimes lead to
unncessary copies.
2020-04-07 14:06:08 -05:00
Graham Sellers a19a56f6a1 [AMDGPU] Extend constant folding for logical operations
This patch extends existing constant folding in logical operations to
handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and
V_AND_OR_B32. Also added a couple of tests for existing folds.
2020-04-07 14:37:16 -04:00
Matt Arsenault f524194ffd AMDGPU: Cleanup test MIR 2020-04-07 14:09:25 -04:00
Eli Friedman e9ac757f79 [AArch64] Don't expand memcmp in strict align mode.
7aecf232 fixed the bug where we would miscompile, but we still generate
a crazy amount of code. Turn off the expansion until someone implements
an appropriate heuristic.

Differential Revision: https://reviews.llvm.org/D77599
2020-04-07 10:53:36 -07:00
Stanislav Mekhanoshin 12a324393d [AMDGPU] Limit endcf-collapase to simple if
We can only collapse adjacent SI_END_CF if outer statement
belongs to a simple SI_IF, otherwise correct mask is not in the
register we expect, but is an argument of an S_XOR instruction.

Even if SI_IF is simple it might be lowered using S_XOR because
lowering is dependent on a basic block layout. It is not
considered simple if instruction consuming its output is
not an SI_END_CF. Since that SI_END_CF might have already been
lowered to an S_OR isSimpleIf() check may return false.

This situation is an opportunity for a further optimization
of SI_IF lowering, but that is a separate optimization. In the
meanwhile move SI_END_CF post the lowering when we already know
how the rest of the CFG was lowered since a non-simple SI_IF
case still needs to be handled.

Differential Revision: https://reviews.llvm.org/D77610
2020-04-07 10:27:23 -07:00
Simon Pilgrim 6f46e9af8a [X86][SSE] Add PTEST(AND(X,Y),AND(X,Y)) tests derived from PR42035 examples 2020-04-07 17:58:54 +01:00
David Tenty b9245f14b7 [NFC][PowerPC] Cleanup 64-bit and Darwin CalleeSavedRegs
Summary:
- Remove the no longer used Darwin CalleeSavedRegs
- Combine the SVR464 callee saved regs and AIX64 since the two are (and should be) identical into PPC64
- Update tests for 64-bit CSR change

Reviewers: sfertile, ZarkoCA, cebowleratibm, jasonliu, #powerpc

Reviewed By: sfertile

Subscribers: wuzish, nemanjai, hiraditya, kbarton, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77235
2020-04-07 11:49:10 -04:00
Simon Pilgrim e3b6059776 [X86][SSE] combineX86ShufflesConstants - early out for zeroable vectors (PR45443)
Shuffle combining can insert zero byte sized elements into the shuffle mask, which combineX86ShufflesConstants will attempt to fold without taking into account whether the byte-sized type is legal (e.g. AVX512F only targets).

If we have a full-zeroable vector then we should just return a zero version of the root type, otherwise if the type isn't valid we should bail.

Fixes PR45443
2020-04-07 14:45:29 +01:00
Georgii Rymar 7fc599ceb0 [llvm-readobj] - Introduce warnings for cases when unable to read strings from string tables.
Currently we have no dedicated warnings, but we return error message instead of a result.
It is generally not consistent with another warnings we have.

This change was suggested and discussed here:
https://reviews.llvm.org/D77216#1954873

This change refines error messages we report and also I had to update the API
to implement it.

Differential revision: https://reviews.llvm.org/D77399
2020-04-07 14:40:32 +03:00
Sanjay Patel e268ec8e0d [InstCombine] add icmp+cast tests for ppc_fp128; NFC
See post-commit comments for rG0f56bbc.
2020-04-07 07:35:01 -04:00
Keith Walker 01dc10774e [ARM] unwinding .pad instructions missing in execute-only prologue
If the stack pointer is altered for local variables and we are generating
Thumb2 execute-only code the .pad directive is missing.

Usually the size of the adjustment is stored in a PC-relative location
and loaded into a register which is then added to the stack pointer.
However when we are generating execute-only code code the size of the
adjustment is instead generated using the MOVW/MOVT instruction pair.

As a by product of handling the execute-only case this also fixes an
existing issue that in the none execute-only case the .pad directive was
generated against the load of the constant to a register instruction,
instead of the instruction which adds the register to the stack pointer.

Differential Revision: https://reviews.llvm.org/D76849
2020-04-07 11:51:59 +01:00
Florian Hahn 6aabb109be [SCCP] Use ranges for predicate info conditions.
This patch updates the code that deals with conditions from predicate
info to make use of constant ranges.

For ssa_copy instructions inserted by PredicateInfo, we have 2 ranges:
1. The range of the original value.
2. The range imposed by the linked condition.

1. is known, 2. can be determined using makeAllowedICmpRegion. The
intersection of those ranges is the range for the copy.

With this patch, we get a nice increase in the number of instructions
eliminated by both SCCP and IPSCCP for some benchmarks:

For MultiSource, SPEC2000 & SPEC2006:

Tests: 237
Same hash: 170 (filtered out)
Remaining: 67
Metric: sccp.NumInstRemoved
Program                                        base    patch   diff
 test-suite...Source/Benchmarks/sim/sim.test    10.00   71.00  610.0%
 test-suite...CFP2000/177.mesa/177.mesa.test   361.00  1626.00 350.4%
 test-suite...encode/alacconvert-encode.test   141.00  602.00  327.0%
 test-suite...decode/alacconvert-decode.test   141.00  602.00  327.0%
 test-suite...CI_Purple/SMG2000/smg2000.test   1639.00 4093.00 149.7%
 test-suite...peg2/mpeg2dec/mpeg2decode.test    75.00  163.00  117.3%
 test-suite...T2006/401.bzip2/401.bzip2.test   358.00  513.00  43.3%
 test-suite...rks/FreeBench/pifft/pifft.test    11.00   15.00  36.4%
 test-suite...langs-C/unix-tbl/unix-tbl.test     4.00    5.00  25.0%
 test-suite...lications/sqlite3/sqlite3.test   541.00  667.00  23.3%
 test-suite.../CINT2000/254.gap/254.gap.test   243.00  299.00  23.0%
 test-suite...ks/Prolangs-C/agrep/agrep.test    25.00   29.00  16.0%
 test-suite...marks/7zip/7zip-benchmark.test   1135.00 1304.00 14.9%
 test-suite...lications/ClamAV/clamscan.test   1105.00 1268.00 14.8%
 test-suite...urce/Applications/lua/lua.test   398.00  436.00   9.5%

Metric: sccp.IPNumInstRemoved
Program                                        base   patch   diff
 test-suite...C/CFP2000/179.art/179.art.test     1.00   3.00  200.0%
 test-suite...006/447.dealII/447.dealII.test   429.00 1056.00 146.2%
 test-suite...nch/fourinarow/fourinarow.test     3.00   7.00  133.3%
 test-suite...CI_Purple/SMG2000/smg2000.test   818.00 1748.00 113.7%
 test-suite...ks/McCat/04-bisect/bisect.test     3.00   5.00  66.7%
 test-suite...CFP2000/177.mesa/177.mesa.test   165.00 255.00  54.5%
 test-suite...ediabench/gsm/toast/toast.test    18.00  27.00  50.0%
 test-suite...telecomm-gsm/telecomm-gsm.test    18.00  27.00  50.0%
 test-suite...ks/Prolangs-C/agrep/agrep.test    24.00  35.00  45.8%
 test-suite...TimberWolfMC/timberwolfmc.test    43.00  62.00  44.2%
 test-suite...encode/alacconvert-encode.test    46.00  66.00  43.5%
 test-suite...decode/alacconvert-decode.test    46.00  66.00  43.5%
 test-suite...langs-C/unix-tbl/unix-tbl.test    12.00  17.00  41.7%
 test-suite...peg2/mpeg2dec/mpeg2decode.test    31.00  41.00  32.3%
 test-suite.../CINT2000/254.gap/254.gap.test   117.00 154.00  31.6%

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76611
2020-04-07 11:09:18 +01:00
Peter Smith 14c1e98754 [ARM] Remove condition that could never be true
From Arm v8 Architecture Reference Manual F5.1.84 LDREXD
The ldrexd instruction in Arm state has the following conditions:

t = UInt(Rt); t2 = t + 1; n = UInt(Rn);
if Rt<0> == '1' || t2 == 15 || n == 15 then UNPREDICTABLE;

In when Rt is odd or if Rt is 14 (making t2 15).

In the implementation when the pair is the UNPREDICTABLE R14_R15 we
would ideally return SOFT_FAIL. We can't because there is no R14_R15
value for us to return so we fail early returning FAIL.

The early return for registers outside the bounds of the table means
the check for Rt == 14 (0xE) redundant which causes a static analyzer
to flag the condition as never being true.

To fix the warning I've removed the check and replaced with a comment
explaining the difference with the specification.

Fixes pr41660

Differential Revision: https://reviews.llvm.org/D77463
2020-04-07 09:50:56 +01:00
Pierre-vh 4fc59a468f Revert "[CodeGen][SelectionDAG] Flip Booleans More Often"
This reverts commit 23342bdcc8.
2020-04-07 09:09:10 +01:00
Pierre-vh 23342bdcc8 [CodeGen][SelectionDAG] Flip Booleans More Often
Differential Revision: https://reviews.llvm.org/D77201
2020-04-07 08:19:57 +01:00
Awanish Pandey 0d43e1688a [DWARF5]: Added a left over test case from D73462
Unfortunately this test case never made it to the trunk. This
was part of https://reviews.llvm.org/D73462 revision.
2020-04-07 10:32:56 +05:30
Sam Clegg f0bbf3d086 [WebAssembly] EmscriptenEHSjLj: Mark more functions as imported
These should have been part of https://reviews.llvm.org/D77192

Differential Revision: https://reviews.llvm.org/D77358
2020-04-06 21:27:31 -07:00
Kai Luo 68ef0b6a49 [PowerPC] Pre-commit test case of float rounding in kernel build. NFC. 2020-04-07 02:07:58 +00:00
Xiang1 Zhang 01a32f2bd3 Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology)
Do not commit the llvm/test/ExecutionEngine/MCJIT/cet-code-model-lager.ll because it will
cause build bot fail(not suitable for window 32 target).

Summary:
This patch comes from H.J.'s 2bd54ce7fa

**This patch fix the failed llvm unit tests which running on CET machine. **(e.g. ExecutionEngine/MCJIT/MCJITTests)

The reason we enable IBT at "JIT compiled with CET" is mainly that:  the JIT don't know the its caller program is CET enable or not.
If JIT's caller program is non-CET, it is no problem JIT generate CET code or not.
But if JIT's caller program is CET enabled,  JIT must generate CET code or it will cause Control protection exceptions.

I have test the patch at llvm-unit-test and llvm-test-suite at CET machine. It passed.
and H.J. also test it at building and running VNCserver(Virtual Network Console), it works too.
(if not apply this patch, VNCserver will crash at CET machine.)

Reviewers: hjl.tools, craig.topper, LuoYuanke, annita.zhang, pengfei

Reviewed By: LuoYuanke

Subscribers: tstellar, efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76900
2020-04-07 09:48:47 +08:00
Jun Ma 46bff786bc [Coroutines] Remove alignment check in shouldBeMustTail
Differential Revision: https://reviews.llvm.org/D77362
2020-04-07 09:07:34 +08:00
Davide Italiano 8115e08b05 [MachineCSE] Don't carry the wrong location when hoisting
PR: 45425
<rdar://problem/61359768>

Differential Revision:  https://reviews.llvm.org/D77604
2020-04-06 16:36:22 -07:00
Stanislav Mekhanoshin 9f09550c50 [AMDGPU] Remove clutter from endcf test. NFC. 2020-04-06 16:32:21 -07:00
Nick Desaulniers 41ba80182c [CallSite Removal] a CallBase is never an IndirectCall for isInlineAsm
Summary:
Thanks to Bill Wendling (void) for the report and steps to reproduce.  It looks
like this was missed during r350508's cleanup of the CallSite split into
CallBase, CallInst, and CallBrInst.

This was exposed by running pgo on a callbr, which was creating a ptrtoint to
the inline asm thinking it was an indirect call. The relevant callchain looks
like:

    IndirectCallPromotionPlugin::run()
    -> PGOIndirectCallVisitor::findIndirectCalls()
      -> PGOIndirectCallVisitor::visitCallBase()
        -> CallBase::isIndirectCall()

Reviewers: void, chandlerc

Reviewed By: void

Subscribers: hiraditya, llvm-commits, craig.topper, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77600
2020-04-06 16:14:46 -07:00
Vedant Kumar 5f185a8999 [AddressSanitizer] Fix for wrong argument values appearing in backtraces
Summary:
In some cases, ASan may insert instrumentation before function arguments
have been stored into their allocas. This causes two issues:

1) The argument value must be spilled until it can be stored into the
   reserved alloca, wasting a stack slot.

2) Until the store occurs in a later basic block, the debug location
   will point to the wrong frame offset, and backtraces will show an
   uninitialized value.

The proposed solution is to move instructions which initialize allocas
for arguments up into the entry block, before the position where ASan
starts inserting its instrumentation.

For the motivating test case, before the patch we see:

```
 | 0033: movq %rdi, 0x68(%rbx)  |   | DW_TAG_formal_parameter     |
 | ...                          |   |   DW_AT_name ("a")          |
 | 00d1: movq 0x68(%rbx), %rsi  |   |   DW_AT_location (RBX+0x90) |
 | 00d5: movq %rsi, 0x90(%rbx)  |   |       ^ not correct ...     |
```

and after the patch we see:

```
 | 002f: movq %rdi, 0x70(%rbx)  |   | DW_TAG_formal_parameter     |
 |                              |   |   DW_AT_name ("a")          |
 |                              |   |   DW_AT_location (RBX+0x70) |
```

rdar://61122691

Reviewers: aprantl, eugenis

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77182
2020-04-06 15:59:25 -07:00
Sid Manning 5002863ab0 Support bfdname "elf32-hexagon".
Add support and update testcases.

Differential Revision: https://reviews.llvm.org/D77579
2020-04-06 17:22:56 -05:00
Daniel Sanders 15f7bc7857 Add option to limit Debugify to locations (omitting variables)
Summary:
It can be helpful to test behaviour w.r.t locations without having DEBUG_VALUE
around. In particular, because DEBUG_VALUE has the potential to change CodeGen
behaviour (e.g. hasOneUse() vs hasOneNonDbgUse()) while locations generally
don't.

Reviewers: aprantl, bogner

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77438
2020-04-06 15:04:55 -07:00
Konstantin Pyzhov 72e8754916 [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228
2020-04-06 09:05:58 -04:00
Leonard Chan a0222ac1f9 [AsmPrinter] Do not define local aliases for global objects in a comdat
A global symbol that is defined in a comdat should not generate an alias since
call sites that would've referred to that symbol will refer to their own
independent local aliases rather than the surviving global comdat one. This
could result in something that looks like:

```
ld.lld: error: relocation refers to a discarded section: .text._ZN3fbl8internal18NullFunctionTargetIvJjjPjEED1Ev.stub
>>> defined in user-x64-clang/obj/system/ulib/minfs/libminfs.a(minfs._sources.file.cc.o)
>>> section group signature: _ZN3fbl8internal18NullFunctionTargetIvJjjPjEED1Ev.stub
>>> prevailing definition is in user-x64-clang/obj/system/ulib/minfs/libminfs.a(minfs._sources.vnode.cc.o)
>>> referenced by function.h:169 (../../zircon/system/ulib/fbl/include/fbl/function.h:169)
>>>               minfs._sources.file.cc.o:(minfs::File::AllocateAndCommitData(std::__2::unique_ptr<minfs::Transaction, std::__2::default_delete<minfs::Transaction> >)) in archive user-x64-clang/obj/system/ulib/minfs/libminfs.a
```

We ran into this when experimenting with a new C++ ABI for fuchsia
(refer to D72959) which takes relative offsets between comdat'd functions
which is why the normal C++ user wouldn't run into this.

Differential Revision: https://reviews.llvm.org/D77429
2020-04-06 13:48:05 -07:00
Nick Desaulniers 5bc291be71 [SelectionDAG] fix predecessor list for INLINEASM_BRs' parent
Summary:
A bug report mentioned that LLVM was producing jumps off the end of a
function when using "asm goto with outputs". Further digging pointed to
MachineBasicBlocks that had their address taken and were indirect
targets of INLINEASM_BR being removed by BranchFolder, because their
 predecessor list was empty, so they appeared to have no entry.

This was a cascading failure caused earlier, during Pre-RA instruction
scheduling. We have a few special cases in Pre-RA instruction scheduling
where we split a MachineBasicBlock in two.  This requires careful
handing of predecessor and successor lists for a MachineBasicBlock that
was split, and careful handing of PHI MachineInstrs that referred to the
MachineBasicBlock before it was split.

The clue that led to this fix was the observation that many callers of
MachineBasicBlock::splice() frequently call
MachineBasicBlock::transferSuccessorsAndUpdatePHIs() to update their PHI
nodes after a splice. We don't want to reuse that method, as we have
custom successor transferring logic for this block split.

This patch fixes 2 pre-existing bugs, and adds tests.

The first bug was that MachineBasicBlock::splice() correctly handles
updating most successors and predecessors; we don't need to do anything
more than removing the previous fallthrough block from the first half of
the split block post splice. Previously, we were updating the successor
list incorrectly (updating successors updates predecessors).

The second bug was that PHI nodes that needed registers from the first
half of the split block were not having entries populated.  The register
live out information was correct, and the FuncInfo->PHINodesToUpdate was
correct. Specifically, the check in SelectionDAGISel::FinishBasicBlock:

    for (unsigned i = 0, e = FuncInfo->PHINodesToUpdate.size(); i != e; ++i) {
      MachineInstrBuilder PHI(*MF, FuncInfo->PHINodesToUpdate[i].first);
      if (!FuncInfo->MBB->isSuccessor(PHI->getParent()))
        continue;
      PHI.addReg(FuncInfo->PHINodesToUpdate[i].second).addMBB(FuncInfo->MBB);

was `continue`ing because FuncInfo->MBB tracks the second half of
the post-split block; no one was updating PHI entries for the first half
of the post-split block.

SelectionDAGBuilder::UpdateSplitBlock() already expects to perform
special handling for MachineBasicBlocks that were split post calls to
ScheduleDAGSDNodes::EmitSchedule(), so I'm confident that it's both
correct for ScheduleDAGSDNodes::EmitSchedule() to return the second half
of the split block `CopyBB` which updates `FuncInfo->MBB` (ie. the
current MachineBasicBlock being processed), and perform special handling
for this in SelectionDAGBuilder::UpdateSplitBlock().

Reviewers: void, craig.topper, efriedma

Reviewed By: void, efriedma

Subscribers: hfinkel, fhahn, MatzeB, efriedma, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76961
2020-04-06 13:46:39 -07:00
Masoud Ataei jaliseh 9ed0612cca Add InjectTLIMappings pass to new pass manager
This pass is created in d6de5f12d4 and tested
for new and legacy pass manager but never added to new pass manager pipeline.
I am adding it to new pass manager pipeline.

This pass is get used in Vector Function Database (VFDatabase) and without
this pass in new pass manager pipeline, none of the vector libraries are work
ing with new pass manager.

Related passes:
66c120f025
https://reviews.llvm.org/D74944

Differential revision: https://reviews.llvm.org/D75354
2020-04-06 13:16:48 -05:00
Craig Topper 07ed1fb597 [SelectionDAGBuilder] Fix ISD::FREEZE creation for structs with fields of different types.
The previous code used the type of the first field for the VT
passed to getNode for every field.

I've based the implementation here off what is done in visitSelect
as it removes the need to special case aggregates.

Differential Revision: https://reviews.llvm.org/D77093
2020-04-06 11:03:40 -07:00
Konstantin Pyzhov 51dc028314 Revert e1730cfeb3 2020-04-06 05:56:11 -04:00
Konstantin Pyzhov e1730cfeb3 [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228
2020-04-06 05:10:37 -04:00
Fangrui Song a5d375e0cb [AArch64] Allow logical immediates to have all-1 in top bits
So that constant expressions like the following are permitted:

and w0, w0, #~(0xfe<<24)
and w1, w1, #~(0xff<<24)

The behavior matches GNU as (opcodes/aarch64-opc.c:aarch64_logical_immediate_p).

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D75885
2020-04-06 09:56:04 -07:00
Sanjay Patel a2bb19ca42 [x86] add size cost tests for casts and binops; NFC
Shows bugs for div/rem/fdiv and possibly others.
2020-04-06 12:38:15 -04:00
Jonathan Roelofs 7c5d2bec76 [llvm] Fix missing FileCheck directive colons
https://reviews.llvm.org/D77352
2020-04-06 09:59:08 -06:00
Sanjay Patel fbb1b43f13 [ValueTracking] enhance matching of umin/umax with 'not' operands
The cmyk test is based on the known regression that resulted from:
rGf2fbdf76d8d0

This improves on the equivalent signed min/max change:
rG867f0c3c4d8c

The underlying icmp equivalence is:
  ~X pred ~Y --> Y pred X

For an icmp with constant, canonicalization results in a swapped pred:
  ~X < C -->  X > ~C
2020-04-06 11:51:59 -04:00
Matt Arsenault 8a5f0dafd4 AMDGPU/GlobalISel: Select llvm.amdgcn.div.scale 2020-04-06 11:50:19 -04:00
Matt Arsenault e87ec66762 AMDGPU/GlobalISel: Fix llvm.amdgcn.div.fmas.ll 2020-04-06 11:50:16 -04:00
Chris Bowler d6ea82d11c [AIX][PPC] Implement by-val caller arguments in multiple registers
Differential Revision: https://reviews.llvm.org/D76380
2020-04-06 11:06:51 -04:00
Matt Arsenault 08772f1742 AMDGPU/GlobalISel: Add unmerge of concat tests 2020-04-06 11:03:55 -04:00
Chris Bowler 982202408b [NFC][PPC][AIX] Test updates for byval args that fit in a single register
Differential Revision: https://reviews.llvm.org/D76875
2020-04-06 10:31:11 -04:00
diggerlin a26a441b99 [llvm-objdump][XCOFF] Use symbol index+symbol name + storage mapping class as label for -D
SUMMARY:

For the llvm-objdump -D, the symbol name is used as a label in the disassembly for the specific address (when a symbol address is equal to the virtual address in the dump).

In XCOFF, multiple symbols may have the same name, being differentiated by their storage mapping class. It is helpful to print the QualName and not just the name when forming the output label for a csect symbol. The symbol index further removes any ambiguity caused by duplicate names.

To maintain compatibility with the binutils objdump, the XCOFF-specific --symbol-description option is added to enable the enhanced format.

Reviewers: hubert.reinterpretcast, James Henderson, Jason Liu ,daltenty
Subscribers: wuzish, nemanjai, hiraditya

Differential Revision: https://reviews.llvm.org/D72973
2020-04-06 10:10:10 -04:00
James Henderson ccf16c4d38 Make test more robust
llvm-readobj prints the file path as part of the output, so
--implicit-check-not patterns can accidentally match it. This patch
makes the test more robust by preventing failures if "foo" is in
somebody's path.

Reviewed by: grimar

Differential Revision: https://reviews.llvm.org/D77546
2020-04-06 14:47:05 +01:00
Chris Bowler 4477343993 test commit 2020-04-06 09:36:59 -04:00
Matt Arsenault 70726cec5b DAG: Combine extract_vector_elt of concat_vectors
Fixes extra canonicalize regressions when legalizing
vector fminnum/fmaxnum.
2020-04-06 09:26:29 -04:00
Hans Wennborg 64c2312750 Revert 43f031d312 "Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology)"
ExecutionEngine/MCJIT/cet-code-model-lager.ll is failing on 32-bit
windows, see llvm-commits thread for fef2dab.

This reverts commit 43f031d312
and the follow-ups fef2dab100 and
6a800f6f62.
2020-04-06 15:05:25 +02:00
Hans Wennborg 6a800f6f62 Add a triple to test/ExecutionEngine/MCJIT/cet-code-model-lager.ll
It was failing in 32-bit Windows builds with:

  $ ":" "RUN: at line 1"
  $ "c:\src\llvm_package_944db8a4\build32_stage0\bin\lli.exe" "-mtriple=i686-pc-windows-msvc-elf" "-code-model=large" "C:\src\llvm_package_944db8a4\llvm-project\llvm\test\ExecutionEngine\MCJIT\cet-code-model-lager.ll"
  # command stderr:
  Assertion failed: Is64Bit && "Large code model is only legal in 64-bit mode.", file C:\src\llvm_package_944db8a4\llvm-project\llvm\lib\Target\X86\X86ISelLowering.cpp, line 4212

Let's see if this helps.
2020-04-06 14:19:59 +02:00
Sourabh Singh Tomar 5d7e9adce2 [DWARF5] Added support for emission of debug_macro section.
Summary:
This patch adds support for emission of following DWARFv5 macro forms
in .debug_macro section.

1. DW_MACRO_start_file
2. DW_MACRO_end_file
3. DW_MACRO_define_strp
4. DW_MACRO_undef_strp.

Reviewed By: dblaikie, ikudrin

Differential Revision: https://reviews.llvm.org/D72828
2020-04-06 17:45:10 +05:30
Simon Pilgrim 9bc5b1a489 [X86][SSE] combineVectorSignBitsTruncation - remove minimum vector length limitations
truncateVectorWithPACK has its own vector length controls, so we can rely on those directly. This helps some existing truncation to subvector tests, which were being combined later during shuffle lowering at which point the sign/zero bit detection had become obscured preventing lowerShuffleWithPACK working as well as it could.
2020-04-06 12:45:23 +01:00
David Green 9fa38c985f [ARM] MVE vqmovn tests. NFC. 2020-04-06 11:13:02 +01:00
Kazushi (Jam) Marukawa e981a46a77 [VE] Update lea/load/store instructions
Summary:
Modify lea/load/store instructions to accept `disp(index, base)`
style addressing mode (called ASX format).  Also, uniform the
number of DAG nodes to have 3 operands for this ASX format
instructions, and update selectADDR functions to lower
appropriate MI.

Reviewers: arsenm, simoll, k-ishizaka

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D76822
2020-04-06 11:49:46 +02:00
Oliver Stannard a294d9eb21 Revert "[IPRA][ARM] Spill extra registers at -Oz"
Reverting because this is causing failures on bots with expensive checks
enabled.

This reverts commit 73cea83a6f.
2020-04-06 10:34:59 +01:00
Kerry McLaughlin 944e322f88 [AArch64][SVE] Add SVE intrinsics for saturating add & subtract
Summary:
Adds the following intrinsics:
  - @llvm.aarch64.sve.[s|u]qadd.x
  - @llvm.aarch64.sve.[s|u]qsub.x

Reviewers: sdesmalen, c-rhodes, dancgr, efriedma, cameron.mcinally, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77054
2020-04-06 10:07:08 +01:00
Florian Hahn 39f2d9aa81 [Matrix] Add option to use row-major matrix layout as default.
This patch adds a -matrix-default-layout option which can be used to
set the default matrix layout to row-major or column-major (default).

The initial patch updates codegen for loads, stores, binary operators
and matrix multiply.

Reviewers: anemet, Gerolf, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D76325
2020-04-06 10:00:56 +01:00
Florian Hahn d1fed7081d [Matrix] Add initial tiling for load/multiply/store chains.
This patch adds initial fusion for load/multiply/store chains of matrix
operations.

The patch contains roughly two parts:

1. Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused).
First, we ensure that both loads of the multiply operands do not alias the store.
If they do, we create new non-aliasing copies of the operands. Note that this
may introduce new basic block. Finally we process TileSize x TileSize blocks.
That is: load tiles from the input operands, multiply and store them.

2. Identify fusion candidates & matrix instructions.
As a first step, collect all instructions with shape info and fusion candidates
(currently @llvm.matrix.multiply calls). Next, try to fuse candidates and
collect instructions eliminated by fusion. Finally iterate over all matrix
instructions, skip the ones eliminated by fusion and lower the rest as usual.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D75566
2020-04-06 09:28:15 +01:00
Igor Kudrin 5125685e91 [llvm-dwp] Fix a possible out of bound access.
llvm-dwp did not check section identifiers read from input files.
In the case of an unexpected identifier, the calculated index for
Contributions[] pointed outside the array. This fix avoids the issue
by skipping unsupported identifiers.

Differential Revision: https://reviews.llvm.org/D76543
2020-04-06 14:31:00 +07:00
Igor Kudrin 35819ff3cf [DebugInfo] Fix reading range lists of v5 units in DWP.
In package files, the base offset provided by index sections should be
used to find the contribution of a unit. The patch adds that base
offset when reading range list tables.

Differential revision: https://reviews.llvm.org/D77401
2020-04-06 13:28:06 +07:00
Igor Kudrin a93b77b97f [DebugInfo] Fix reading location tables headers of v5 units in DWP.
This fixes the reading of location lists headers for compilation units
in package files by adjusting the reading offset according to the
corresponding record in the unit index. This is required for
DW_FORM_loclistx to work.

Differential revision: https://reviews.llvm.org/D77146
2020-04-06 13:28:06 +07:00
Igor Kudrin 49737df767 [DebugInfo] Fix reading location tables of v5 units in DWP.
Without the patch, all version 5 compile units in a DWP file read
location tables from the beginning of a .debug_loclists.dwo section.
The patch fixes that by adjusting the reading offset the same way as
for pre-v5 units. The section identifier to find the contribution
entry corresponds to the version of the unit.

Differential revision: https://reviews.llvm.org/D77145
2020-04-06 13:28:06 +07:00
Igor Kudrin 714324b79a [DebugInfo] Support DWARFv5 index sections.
DWARFv5 defines index sections in package files in a slightly different
way than the pre-standard GNU proposal, see Section 7.3.5 in the DWARF
standard and https://gcc.gnu.org/wiki/DebugFissionDWP for GNU proposal.
The main concern here is values for section identifiers, which are
partially overlapped with changed meanings. The patch adds support for
v5 index sections and resolves that difficulty by defining a set of
identifiers for internal use which can represent and distinct values
of both standards.

Differential Revision: https://reviews.llvm.org/D75929
2020-04-06 13:28:06 +07:00
Tarindu Jayatilaka b43b59fcc0 Expose `attributor-disable` to the new and old pass managers
The new and old pass managers (PassManagerBuilder.cpp and
PassBuilder.cpp) are exposed to an `extern` declaration of
`attributor-disable` option which will guard the addition of the
attributor passes to the pass pipelines.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76871
2020-04-05 22:29:34 -05:00
Lang Hames 1b39c6f62c [ORC] Add MachO universal binary support to StaticLibraryDefinitionGenerator.
Add a new overload of StaticLibraryDefinitionGenerator::Load that takes a triple
argument and supports loading archives from MachO universal binaries in addition
to regular archives.

The LLI tool is updated to use this overload.
2020-04-05 20:21:05 -07:00
Simon Pilgrim 4431a29c60 [X86][SSE] Combine unary shuffle(HORIZOP,HORIZOP) -> HORIZOP
We had previously limited the shuffle(HORIZOP,HORIZOP) combine to binary shuffles, but we can often merge unary shuffles just as well, folding in UNDEF/ZERO values into the 64-bit half lanes.

For the (P)HADD/HSUB cases this is limited to fast-horizontal cases but PACKSS/PACKUS combines under all cases.
2020-04-05 22:49:46 +01:00
Anna Thomas 1d0f757904 [InlineFunction] Update metadata on loads that are return values
This patch builds upon D76140 by updating metadata on pointer typed
loads in inlined functions, when the load is the return value, and the
callsite contains return attributes which can be updated as metadata on
the load.
Added test cases show this for nonnull, dereferenceable,
dereferenceable_or_null

Reviewed-By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76792
2020-04-05 14:50:10 -04:00
Zuojian Lin a58c8a7866 Remove the additional constant which requires an extra register for statepoint lowering.
The newly-created constant zero will need an extra register to hold it
in the current statepoint lowering implementation. Remove it if there
exists one.
2020-04-05 11:22:09 -04:00
Matt Arsenault 9620fe02df AMDGPU/GlobalISel: Add some G_INSERT/G_EXTRACT select tests 2020-04-05 10:54:24 -04:00
Oliver Stannard cb6aeb2239 [ARM] Add data gathering hint instruction
Summary:
This patch upstreams support the optional ARMv8.0 Data Gathering Hint (DGH)
extension, which adds the Data Gathering Hint instruction to the hint
space.

See ARMv8.0-DGH in the Arm Architecture Reference Manual Armv8 for more
information.

Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, danielkiss, samparker

Reviewed By: SjoerdMeijer

Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77097
2020-04-05 15:21:00 +01:00
Oliver Stannard 6f60eb4a3c [ARM] Add enhanced counter virtualization system registers
Summary:
This patch upstreams support for the ARMv8.6A Enhanced Counter Virtualization
(ECV) extension, which adds 6 new system registers.

See ARMv8.6-ECV in the Arm Architecture Reference Manual Armv8 for more
information.

Reviewers: t.p.northover, rengolin, SjoerdMeijer, pcc, ab, chill

Reviewed By: SjoerdMeijer

Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77094
2020-04-05 15:18:35 +01:00
Sanjay Patel 538a8f0227 [InstCombine] convert bitcast-shuffle to vector trunc
As discussed in D76983, that patch can turn a chain of insert/extract
with scalar trunc ops into bitcast+extract and existing instcombine
vector transforms end up creating a shuffle out of that (see the
PhaseOrdering test for an example). Currently, that process requires
at least this sequence: -instcombine -early-cse -instcombine.

Before D76983, the sequence of insert/extract would reach the SLP
vectorizer and become a vector trunc there.

Based on a small sampling of public targets/types, converting the
shuffle to a trunc is better for codegen in most cases (and a
regression of that form is the reason this was noticed). The trunc is
clearly better for IR-level analysis as well.

This means that we can induce "spontaneous vectorization" without
invoking any explicit vectorizer passes (at least a vector cast op
may be created out of scalar casts), but that seems to be the right
choice given that we started with a chain of insert/extract, and the
backend would expand back to that chain if a target does not support
the op.

Differential Revision: https://reviews.llvm.org/D77299
2020-04-05 09:48:02 -04:00
Oliver Stannard 9e1455dc23 [ARM] Add ARMv8.6 Fine Grain Traps system registers
Summary:
This patch upstreams support for the ARMv8.6A Fine Grain Traps (FGT)
extension, which adds 5 new system registers.

See ARMv8.6-FGT in the Arm Architecture Reference Manual Armv8 for more
information.

Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, momchil.velikov

Reviewed By: SjoerdMeijer

Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76991
2020-04-05 14:28:18 +01:00
Sanjay Patel 4036a0af24 [InstCombine] enhance freelyNegateValue() by handling 'not'
This patch extends D77230. If we have a 'not' instruction inside a
negated expression, we can ignore extra uses of that op because the
negation has a one-to-one replacement: negate becomes increment.

Alive2 examples of the test cases:
http://volta.cs.utah.edu:8080/z/T5-u9P
http://volta.cs.utah.edu:8080/z/eT89L6

Differential Revision: https://reviews.llvm.org/D77459
2020-04-05 09:16:19 -04:00
Sanjay Patel 867f0c3c4d [ValueTracking] enhance matching of smin/smax with 'not' operands
The cmyk tests are based on the known regression that resulted from:
rGf2fbdf76d8d0

So this improvement in analysis might be enough to restore that commit.
2020-04-05 08:54:12 -04:00
Diogo Sampaio 59d10dc703 [ARM] add ARMv8.6-A Activity monitors virtualization extension
Summary:
This patch upstreams v8.6A activity monitors virtualization
assembler support, which consists of 32 new system
registers (two groups, each with 16 numbered registers).

See ARMv8.6-AMU in the Arm Architecture Reference Manual Armv8 for more
information.

Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, john.brawn, ostannard

Reviewed By: ostannard

Subscribers: LukeGeeson, dnsampaio, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76998
2020-04-05 13:31:06 +01:00
David Zarzycki 74ab5d98d0 Revert "Test had incorrect check for nonzero count"
This reverts commit 210f40fe9a.
2020-04-05 07:16:47 -04:00
Simon Pilgrim 3079e51858 [X86][SSE] Generalize shuffle(HORIZOP,HORIZOP) -> HORIZOP combine
Our existing combine allows to merge the shuffle of 2 similar 64-bit wide 'horizontal ops' (HADD/PACK/etc.) if the shuffle was a UNPCK/MOVSD.

This patch generalizes this to decode any target shuffle mask that can be widened to a 128-bit repeating v2*64 mask, which helps us catch PBLENDW/PBLENDD cases.
2020-04-05 12:09:19 +01:00
Simon Pilgrim a17de6b91c [X86][SSE] truncateVectorWithPACK - upper undef for 128->64 packing
If we're packing from 128-bits to 64-bits then we don't need the RHS argument. This helps with register allocation, especially as we avoid repeating a use of the input value.
2020-04-05 11:47:36 +01:00
vgxbj 688fe2d03d [llvm-nm] Add test for `--debug-syms --dynamic`
Summary: This test ensures that `llvm-nm` will omit NULL symbol.

Reviewers: jhenderson, MaskRay, grimar

Reviewed By: jhenderson, grimar

Subscribers: rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76213
2020-04-05 12:15:30 +08:00
vgxbj dc4c8a3c9c [llvm-objdump][test] Recommit unimplemented-features.test
Recommit test case that removed by rG685bf42e9e0cc79cff6d26cf44749db98f148270
2020-04-05 11:47:27 +08:00
vgxbj 685bf42e9e [llvm-objdump][test] Remove unimplemented-features.test
Seems that this test breaks build bots.
http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/41418

Commit it later.
2020-04-05 11:03:34 +08:00
vgxbj 948ef5b1a6 [llvm-objdump] Teach `llvm-objdump` dump dynamic symbols.
Summary:
This patch is to teach `llvm-objdump` dump dynamic symbols (`-T` and `--dynamic-syms`). Currently, this patch is not fully compatible with `gnu-objdump`, but I would like to continue working on this in next few patches. It has two issues.

1. Some symbols shouldn't be marked as global(g). (`-t/--syms` has same issue as well) (Fixed by D75659)
2. `gnu-objdump` can dump version information and *dynamically* insert before symbol name field.

`objdump -T a.out` gives:

```
DYNAMIC SYMBOL TABLE:
0000000000000000  w   D  *UND*  0000000000000000              _ITM_deregisterTMCloneTable
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 printf
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 __libc_start_main
0000000000000000  w   D  *UND*  0000000000000000              __gmon_start__
0000000000000000  w   D  *UND*  0000000000000000              _ITM_registerTMCloneTable
0000000000000000  w   DF *UND*  0000000000000000  GLIBC_2.2.5 __cxa_finalize
```

`llvm-objdump -T a.out` gives:

```
DYNAMIC SYMBOL TABLE:
0000000000000000  w   D  *UND*  0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 g    DF *UND*  0000000000000000 printf
0000000000000000 g    DF *UND*  0000000000000000 __libc_start_main
0000000000000000  w   D  *UND*  0000000000000000 __gmon_start__
0000000000000000  w   D  *UND*  0000000000000000 _ITM_registerTMCloneTable
0000000000000000  w   DF *UND*  0000000000000000 __cxa_finalize
```

Reviewers: jhenderson, grimar, MaskRay, espindola

Reviewed By: jhenderson, grimar

Subscribers: emaste, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75756
2020-04-05 10:46:59 +08:00
Matt Arsenault 6bfe28e92f AMDGPU: Fix annotate kernel features through casted calls
I thought I was testing this before, but the workitem id x
case isn't great since it's mandatory in the parent kernel.
2020-04-04 20:44:44 -04:00
Shinji Okumura c80cf48801 [Attributor] AAReachability : use isPotentiallyReachable in isKnownReachable
`isKnownReachable` had only interface (always returns true).
 Changed it to call `isPotentiallyReachable`.
This change enables deductions of other Abstract Attributes depending on
AAReachability to use reachability information obtained from CFG, and it
can make them stronger.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76210
2020-04-04 19:16:17 -05:00
Shinji Okumura 475abe14a5 [Attributor] Make use of analysis in the MustBeExecutedExplorer
This commit was made to settle [[ https://github.com/llvm/llvm-project/issues/175 | this issue on GitHub ]].
I added analysis getters for LoopInfo, DominatorTree, and
PostDominatorTree. And I added a test to show an improvement of the
deduction of `dereferenceable` attribute.

Reviewed By: jdoerfert, uenoku

Differential Revision: https://reviews.llvm.org/D76378
2020-04-04 19:08:44 -05:00
Stefanos Baziotis f3dd3a66d3 [Attributor] AAUndefinedBehavior: Use AAValueSimplify in memory accessing instructions.
Query AAValueSimplify on pointers in memory accessing instructions to take
advantage of the constant propagation (or any other value simplification) of such values.
2020-04-05 02:46:26 +03:00
Simon Pilgrim be84d2b5b7 [CostModel][X86] Add some insert subvector cost tests for vXf32/vXi32/vXi16/vXi8 types 2020-04-04 22:46:57 +01:00
Simon Pilgrim 3380d4d75e [X86] Cleanup vectorcall test checks
PR32397 still needs to be fixed for the update script to process this
2020-04-04 22:46:56 +01:00
Jonathan Roelofs 3ce77142a6 Revert "[DAG] Fix PR45049: LegalizeTypes crash"
This reverts commit 17673ae0b2.
2020-04-04 13:47:22 -06:00
Jonathan Roelofs 17673ae0b2 [DAG] Fix PR45049: LegalizeTypes crash
Sometimes LegalizeTypes knows about common subexpressions before SelectionDAG
does, leading to accidental SDValue removal before its reference count was
truly zero.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45049

https://reviews.llvm.org/D76994
2020-04-04 13:36:22 -06:00
Sanjay Patel 28202dd35c [InstCombine] add more tests for min/max folding; NFC 2020-04-04 13:44:06 -04:00
Luofan Chen eec6d87626 [Attributor] Deduce attributes for non-exact functions
This patch is based on D63312 and D63319. For now we create shallow wrappers for all functions that are IPO amendable.
See also [this github issue](https://github.com/llvm/llvm-project/issues/172).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76404
2020-04-04 11:34:58 -05:00
Heejin Ahn 2e9839729d [WebAssembly] Fix wasm.lsda() optimization in WasmEHPrepare
Summary:
When we insert a call to the personality function wrapper
(`_Unwind_CallPersonality`) for a catch pad, we store some necessary
info in `__wasm_lpad_context` struct and pass it. One of the info is the
LSDA address for the function. For this, we insert a call to
`wasm.lsda()`, which will be lowered down to the address of LSDA, and
store it in a field in `__wasm_lpad_context`.

There are exceptions to this personality call insertion: catchpads for
`catch (...)` and cleanuppads (for destructors) don't need personality
function calls, because we don't need to figure out whether the current
exception should be caught or not. (They always should.)

There was a little optimization to `wasm.lsda()` call insertion. Because
the LSDA address is the same throughout a function, we don't need to
insert a store of `wasm.lsda()` return value in every catchpad. For
example:
```
try {
  foo();
} catch (int) {
  // wasm.lsda() call and a store are inserted here, like, in
  // pseudocode,
  // %lsda = wasm.lsda();
  // store %lsda to a field in __wasm_lpad_context
  try {
    foo();
  } catch (int) {
    // We don't need to insert the wasm.lsda() and store again, because
    // to arrive here, we have already stored the LSDA address to
    // __wasm_lpad_context in the outer catch.
  }
}
```
So the previous algorithm checked if the current catch has a parent EH
pad, we didn't insert a call to `wasm.lsda()` and its store.

But this was incorrect, because what if the outer catch is `catch (...)`
or a cleanuppad?
```
try {
  foo();
} catch (...) {
  // wasm.lsda() call and a store are NOT inserted here
  try {
    foo();
  } catch (int) {
    // We need wasm.lsda() here!
  }
}
```
In this case we need to insert `wasm.lsda()` in the inner catchpad,
because the outer catchpad does not have one.

To minimize the number of inserted `wasm.lsda()` calls and stores, we
need a way to figure out whether we have encountered `wasm.lsda()` call
in any of EH pads that dominates the current EH pad. To figure that
out, we now visit EH pads in BFS order in the dominator tree so that we
visit parent BBs first before visiting its child BBs in the domtree.

We keep a set named `ExecutedLSDA`, which basically means "Do we have
`wasm.lsda()` either in the current EH pad or any of its parent EH
pads in the dominator tree?". This is to prevent scanning the domtree up
to the root in the worst case every time we examine an EH pad: each EH
pad only needs to examine its immediate parent EH pad.

- If any of its parent EH pads in the domtree has `wasm.lsda()`, this
  means we don't need `wasm.lsda()` in the current EH pad. We also insert
  the current EH pad in `ExecutedLSDA` set.
- If none of its parent EH pad has `wasm.lsda()`
  - If the current EH pad is a `catch (...)` or a cleanuppad, done.
  - If the current EH pad is neither a `catch (...)` nor a cleanuppad,
    add `wasm.lsda()` and the store in the current EH pad, and add the
    current EH pad to `ExecutedLSDA` set.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77423
2020-04-04 07:02:50 -07:00
Simon Pilgrim 6a57ba17c0 [CostModel][X86] Add shuffle cost tests for sub-128bit vectors 2020-04-04 13:08:25 +01:00
Simon Pilgrim 87fd686f6f [CostModel][X86] Add insert/extract cost tests for sub-128bit vXi8/vXi16 vectors 2020-04-04 13:08:25 +01:00
Simon Pilgrim e5e719d885 [X86][SSE] lowerV8I16Shuffle - lower compaction shuffles using PACKUSDW(PBLENDW,PBLENDW) on SSE41+
Similar to the lowerV16I8Shuffle implementation, for binary compaction v8i16 shuffles we can avoid the PUNPCKLDQ(PSHUFB,PSHUFB) pattern on SSE41+ targets by using PACKUSDW and PBLENDW. Before SSE41 we would need to use PACKSSDW but that requires sign extension that seems to destroy any gains, even on targets without PSHUFB.

This is a bigger gain on AMD than Intel targets but should never be a regression, and avoiding the shuffle mask load(s) is always useful.

Noticed in codegen while dealing with PR31443.
2020-04-04 13:08:25 +01:00
vgxbj 541bead8b4 [Object] object::ELFObjectFile::dynamic_symbol_begin(): skip symbol index 0
Summary:
Note: This revision is very similar to D62296.

In D75756, we need `getDynamicSymbolIterators()` to skip first NULL symbol in `.dynsym`. And I believe it might be worth pointing this out in a separate patch to gather you experts' opinions.

I have checked that current code base will not be affected by this change.

```
dynamic_symbol_begin()
|- dynamic_symbol_end(): Ok
`- getDynamicSymbolIterators()
   |- addDynamicElfSymbols(): llvm/tools/llvm-objdump/llvm-objdump.cpp, Line 934
   |                          Ok, NULL symbol will be omitted by Line 945-947
   |                          StringRef Name = unwrapOrError(Symbol.getName(), Obj->getName());
   |                          if (Name.empty()) continue;
   |- dumpSymbolNameFromObject(): llvm/tools/llvm-nm/llvm-nm.cpp, Line 1192
   |                          There's no test for dumping dynamic debugging symbol. This patch helps improve llvm-nm behavior. (we should add test for this later)
   `- computeSymbolSizes(): llvm/lib/Object/SymbolSize.cpp, Line 52
      |- OProfileJITEventListener::notifyObjectLoaded(): llvm/lib/ExecutionEngine/OProfileJIT/OProfileJITEventListener.cpp, Line 92
      |                                                  Ok, NULL symbol will be omitted by Line 94-95
      |                                                  if (!Sym.getType() || *Sym.getType() != SF_Function) continue;
      |- IntelJITEventListener::notifyObjectLoaded(): llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp, Line 98
      |                                               Ok, NULL symbol will be omitted by Line 124-126 (same as previous one)
      |- PerfJITEventListener::notifyObjectLoaded(): llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp, Line 244
      |                                              Ok, NULL symbol will be omitted by Line 254-256, (same as previous one)
      |- SymbolizableObjectFile::create(): llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp, Line 73
      |                                    Ok, NULL symbol will be omitted by Line 75
      |                                    res->addSymbol()
      |                                    In addSymbol(), Line 167-168
      |                                    if (!Sec || (Obj && Obj->section_end() == *Sec)) return std::error_code();
      |- dumpCXXData(): llvm/tools/llvm-cxxdump/llvm-cxxdump.cpp, Line 189
      |                 Ok, NULL symbol will be omitted by Line 199-202
      |                 object::section_iterator SecI = *SecIOrErr;
      |                 // Skip external symbols.
      |                 if (SecI == Obj->section_end())
      |                   continue;
      `- printLineInfoForInput(): llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp, Line 418
                                  Ok, NULL symbol will be omitted by Line 430-477
                                  if (Type == object::SymbolRef::ST_Function) {
                                    ...
                                  }
```

Reviewers: grimar, jhenderson, MaskRay

Reviewed By: jhenderson, MaskRay

Subscribers: rupprecht, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76081
2020-04-04 18:45:52 +08:00
Matt Arsenault b801577c59 AMDGPU: Fix a few more tests with old denormal subtarget features 2020-04-03 23:42:13 -04:00
Nemanja Ivanovic 56246b241e [NFC][PowerPC] Pre-commit a test case for D77448
Pre-committing the new test case so the review shows only the diffs.
2020-04-03 20:43:04 -05:00
Craig Topper 1d42c0db9a Revert "[X86] Add a Pass that builds a Condensed CFG for Load Value Injection (LVI) Gadgets"
This reverts commit c74dd640fd.

Reverting to address coding standard issues raised in post-commit
review.
2020-04-03 16:56:08 -07:00
Craig Topper a505ad58cf Revert "[X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)"
This reverts commit 62c42e29ba

Reverting to address coding standard issues raised in post-commit
review.
2020-04-03 16:55:53 -07:00
Sanjay Patel b7397e81fe [InstCombine] add tests for freelyNegateValue with 'not'; NFC 2020-04-03 17:28:29 -04:00
Nick Desaulniers 9d9b8a20a8 [test] preformat test with update_llc_test_checks.py NFC
Summary:
Prior to landing D76961, preprocess via:
    $ llvm/utils/update_llc_test_checks.py \
      llvm/test/CodeGen/X86/callbr-asm-outputs.ll

Reviewers: void, MaskRay

Reviewed By: void, MaskRay

Subscribers: MaskRay, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77356
2020-04-03 14:07:21 -07:00
Scott Constable 62c42e29ba [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)
After finding all such gadgets in a given function, the pass minimally inserts
LFENCE instructions in such a manner that the following property is satisfied:
for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at
least one LFENCE instruction. The algorithm that implements this minimal
insertion is influenced by an academic paper that minimally inserts memory
fences for high-performance concurrent programs:

http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf

The algorithm implemented in this pass is as follows:

1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components:
  -SOURCE instructions (also includes function arguments)
  -SINK instructions
  -Basic block entry points
  -Basic block terminators
  -LFENCE instructions
2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6.
3. Use a heuristic or plugin to approximate minimal LFENCE insertion.
4. Insert one LFENCE along each CFG edge that was cut in step 3.
5. Go to step 2.
6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified.

By default, the heuristic used in Step 3 is a greedy heuristic that avoids
inserting LFENCEs into loops unless absolutely necessary. There is also a
CLI option to load a plugin that can provide even better optimization,
inserting fewer fences, while still mitigating all of the LVI gadgets.
The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin,
and a description of the pass's behavior with the plugin can be found here:
https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection.

Differential Revision: https://reviews.llvm.org/D75937
2020-04-03 13:45:50 -07:00
Scott Constable c74dd640fd [X86] Add a Pass that builds a Condensed CFG for Load Value Injection (LVI) Gadgets
Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph.

More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK).

Also adds a new target feature to X86: +lvi-load-hardening

The feature can be added via the clang CLI using -mlvi-hardening.

Differential Revision: https://reviews.llvm.org/D75936
2020-04-03 13:02:04 -07:00
Paul Robinson 210f40fe9a Test had incorrect check for nonzero count 2020-04-03 12:37:13 -07:00
Scott Constable f95a67d8b8 [X86] Add RET-hardening Support to mitigate Load Value Injection (LVI)
Adding a pass that replaces every ret instruction with the sequence:

pop <scratch-reg>
lfence
jmp *<scratch-reg>

where <scratch-reg> is some available scratch register, according to the
calling convention of the function being mitigated.

Differential Revision: https://reviews.llvm.org/D75935
2020-04-03 12:08:34 -07:00
Stanislav Mekhanoshin 8c5dc084e5 [AMDGPU] Added label to test. NFC. 2020-04-03 11:36:32 -07:00
Stanislav Mekhanoshin 0462795095 [AMDGPU] Propagate AGPR RC from PHI to its PHI operands
We can fix register class of PHI based on its all AGPR uses.
That leaves behind all PHIs which were already processed
earlier. Propagate RC back to PHI operands of a PHI.

Differential Revision: https://reviews.llvm.org/D77344
2020-04-03 11:23:02 -07:00
Sanjay Patel ce97ce3a5d [VectorCombine] try to form a better extractelement
Extracting to the same index that we are going to insert back into
allows forming select ("blend") shuffles and enables further transforms.

Admittedly, this is a quick-fix for a more general problem that I'm
hoping to solve by adding transforms for patterns that start with an
insertelement.

But this might resolve some regressions known to be caused by the
extract-extract transform (although I have not gotten more details on
those yet).

In the motivating case from PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724

The combination of subsequent instcombine and codegen transforms gets us this improvement:

  vmovshdup	%xmm0, %xmm2    ## xmm2 = xmm0[1,1,3,3]
  vhaddps	%xmm1, %xmm1, %xmm4
  vmovshdup	%xmm1, %xmm3    ## xmm3 = xmm1[1,1,3,3]
  vaddps	%xmm0, %xmm2, %xmm0
  vaddps	%xmm1, %xmm3, %xmm1
  vshufps	$200, %xmm4, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm4[0,3]
  vinsertps	$177, %xmm1, %xmm0, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm1[2]

  -->

  vmovshdup	%xmm0, %xmm2    ## xmm2 = xmm0[1,1,3,3]
  vhaddps	%xmm1, %xmm1, %xmm1
  vaddps	%xmm0, %xmm2, %xmm0
  vshufps	$200, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm1[0,3]

Differential Revision: https://reviews.llvm.org/D76623
2020-04-03 13:55:13 -04:00
Simon Pilgrim 34a497b765 [X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations
Extend lowerShuffleWithPACK/matchShuffleWithPACK/createPackShuffleMask to handle compaction style shuffle masks that can be lowered to chains of PACKSS/PACKUS if their inputs are suitably sign/zero extended.

This helps avoid PSHUFB (and its mask load) for short shuffle chains, shuffle combining will still replace with a PSHUFB if we have enough shuffles as getFauxShuffleMask should recognise the PACKSS/PACKUS chains.
2020-04-03 18:26:10 +01:00
Roman Lebedev 7d572ef2dd
Revert "[SCEV] rewriteLoopExitValues(): even if have hard uses, still rewrite if cheap (PR44668)"
As discussed in post-commit review in https://reviews.llvm.org/D73501
if the goal of this is to help vectorizer, then we should actually
be teaching vectorizer to do this, because right now this rewrite
is still budget-limited, which isn't what we'd want.

Additionally, while the rest of the patch series was universally profitable,
this particular patch is reportedly (https://reviews.llvm.org/D73501#1905171)
exposing cost-modeling issues on ARM.

So let's just back this particular patch out. Once there's an undo transform,
this could be considered for reintegration.

This reverts commit 44edc6fd2c.
2020-04-03 20:15:04 +03:00
Roman Lebedev 8e7b25bb40
[NFC] Move ARM `opt -indvars` test from Codegen into Transforms
They are really not codegen tests.
2020-04-03 20:15:03 +03:00
Simon Pilgrim 396b1ee0e0 [LoopStrengthReduce] Fix test checks to fix issue reported on D77227 2020-04-03 18:10:33 +01:00
Simon Pilgrim 30053c842c [AArch64] Fix swap-compare-operands test names to fix issue reported on D77354
Load of copy+paste errors in the label checks that needed fixing before the missing ":" could be added
2020-04-03 17:48:18 +01:00
Sanjay Patel 389704cc60 [PhaseOrdering] add shuffle tests based on D40633; NFC
We got some of the potential optimizations with D76727 and D76844.
There are 2 likely enhancements that we could add to -vector-combine
to get most of the remaining cases:
1. Allow bitcasted shuffle mask narrowing (widen the elements).
2. Combine shuffle-of-shuffle into a single shuffle.

This is already partly handled by the x86 backend, but the
tests here show that we still miss some of the potential
combines.
2020-04-03 12:44:49 -04:00
John Brawn 4ad9ca0f9e [ARM] Fix incorrect handling of big-endian vmov.i64
Currently when the target is big-endian vmov.i64 reverses the order of the two
words of the vector. This is correct only when the underlying element type is
32-bit, as actually what it should be doing is considering it a vector of the
underlying type and reversing the elements of that.

Differential Revision: https://reviews.llvm.org/D76515
2020-04-03 17:36:50 +01:00
John Brawn cd58fb6325 [ARM] Avoid pointless vrev of element-wise vmov
If we have an element-wise vmov immediate instruction then a subsequent vrev
with width greater or equal to the vmov element width, then that vrev won't do
anything. Add a DAG combine to convert bitcasts that would become such vrevs
into vector_reg_casts instead.

Differential Revision: https://reviews.llvm.org/D76514
2020-04-03 17:36:50 +01:00
John Brawn 966ae76222 Run update_llc_test on test/CodeGen/ARM/vmov.ll
This is in preparation for D76514
2020-04-03 17:36:50 +01:00
Simon Pilgrim 6d24dd7ed1 [InstSimplify] Regenerate compares tests to fix issue reported on D77354 2020-04-03 17:34:56 +01:00
Simon Pilgrim 43d2fc7ed7 [LoopRotate] Cleanup test checks to fix issue reported on D77354 2020-04-03 17:21:37 +01:00
Simon Pilgrim 5f47f613de [PowerPC] Regenerate f128 test to fix issue reported on D77354
I had to manually edit the file as the update script won't strip checks that don't have the ":" immediately after the prefix
2020-04-03 17:01:28 +01:00
Matt Arsenault 57a55313c3 InstCombine: Reduce minnum/maxnum if inputs are casted 2020-04-03 11:57:25 -04:00
Simon Pilgrim 80d4df5be6 [X86] Remove defunct section checks from emulated TLS tests to fix issue reported on D77354 2020-04-03 16:46:09 +01:00
Simon Pilgrim 40fc3de369 [X86] Fix weak global label issue reported on D77354 2020-04-03 16:15:24 +01:00
Simon Pilgrim 58c242e7b8 [X86] Fix gisel copy tests to fix issue reported on D77354 2020-04-03 16:15:24 +01:00
Simon Pilgrim e9511c0206 [X86] Fix strong local function/global label issue reported on D77354 2020-04-03 16:15:23 +01:00
Simon Pilgrim aa8434fa3d [X86] Cleanup emulated TLS test checks. NFC 2020-04-03 16:15:23 +01:00
Simon Pilgrim 7521f3c2f0 [X86] Regenerate soft fp legalization test to fix issue reported on D77354 2020-04-03 14:56:07 +01:00
Simon Pilgrim 74f00c66dd [X86] Regenerate stack clash test to fix issue reported on D77354 2020-04-03 14:56:06 +01:00
Simon Atanasyan f22445bf57 [mips][test] Remove redundant and invalid `CHECK-NOT` directives. NFC
1. It's redundant to explicitly check that string does not contain `PAUSE_MM`
   token if check that the `PAUSE` token is followed by an angle bracket.
2. The removed `CHECK-NOT` directives do not have trailing colons.
2020-04-03 16:29:39 +03:00
Simon Pilgrim 5e426363ba [X86][AVX] Add tests showing failure to use chained PACKSS/PACKUS for multi-stage compaction shuffles
The sign/zero extended top bits mean that we could use chained PACK*S ops here
2020-04-03 14:16:16 +01:00
Jay Foad c7e1fc8496 [AMDGPU] Fix CHECK lines 2020-04-03 10:07:21 +01:00
Guillaume Chatelet ca11c480e7 [Alignment][NFC] Convert MachineIRBuilder::buildDynStackAlloc to Align
Summary:
The change in IRTranslator is not trivial but is NFC as far as I can tell.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77292
2020-04-03 09:05:19 +00:00
OCHyams 9b56cc9361 [DebugInfo] Salvage debug info when sinking loop invariant instructions
Reviewed By: vsk, aprantl, djtodoro

Differential Revision: https://reviews.llvm.org/D77318
2020-04-03 09:19:26 +01:00
Scott Constable 5b519cf1fc [X86] Add Indirect Thunk Support to X86 to mitigate Load Value Injection (LVI)
This pass replaces each indirect call/jump with a direct call to a thunk that looks like:

lfence
jmpq *%r11

This ensures that if the value in register %r11 was loaded from memory, then
the value in %r11 is (architecturally) correct prior to the jump.
Also adds a new target feature to X86: +lvi-cfi
("cfi" meaning control-flow integrity)
The feature can be added via clang CLI using -mlvi-cfi.

This is an alternate implementation to https://reviews.llvm.org/D75934 That merges the thunk insertion functionality with the existing X86 retpoline code.

Differential Revision: https://reviews.llvm.org/D76812
2020-04-03 00:34:39 -07:00
Sourabh Singh Tomar 69c8fb1c65 [DWARF5] Added support for debug_macro section parsing and dumping in llvm-dwarfdump.
Summary:
This patch adds parsing and dumping DWARFv5 .debug_macro section in llvm-dwarfdump,
it does not introduce any new switch. Existing switch "--debug-macro"
should be used to dump macinfo or macro section.

Reviewed By: dblaikie, ikudrin, jhenderson

Differential Revision: https://reviews.llvm.org/D73086
2020-04-03 12:23:51 +05:30
Xiang1 Zhang fef2dab100 Bugix for buildbot failure at commit 43f031d312
Author: Xiang1 Zhang <xiang1.zhang@intel.com>
Date:   Fri Apr 3 11:25:38 2020 +0800

    Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology)
2020-04-03 13:25:35 +08:00
Scott Constable 71e8021d82 [X86][NFC] Generalize the naming of "Retpoline Thunks" and related code to "Indirect Thunks"
There are applications for indirect call/branch thunks other than retpoline for Spectre v2, e.g.,

https://software.intel.com/security-software-guidance/software-guidance/load-value-injection

Therefore it makes sense to refactor X86RetpolineThunks as a more general capability.

Differential Revision: https://reviews.llvm.org/D76810
2020-04-02 21:55:13 -07:00
laith sakka a0983ed3d2 Handle exp2 with proper vectorization and lowering to SVML calls
Summary:
Add mapping from exp2 math functions
to corresponding SVML calls.

This is a follow up and extension for llvm diff
https://reviews.llvm.org/D19544

Test Plan:
- update test case and run ninja check.
- run tests locally

Reviewers: wenlei, hoyFB, mmasten, mzolotukhin, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77114
2020-04-02 21:11:13 -07:00
Hongtao Yu 88da019977 Fix a bug in the inliner that causes subsequent double inlining
Summary:
A recent change in the instruction simplifier enables a call to a function that just returns one of its parameter to be simplified as simply loading the parameter. This exposes a bug in the inliner where double inlining may be involved which in turn may cause compiler ICE when an already-inlined callsite is reused for further inlining.
To put it simply, in the following-like C program, when the function call second(t) is inlined, its code t = third(t) will be reduced to just loading the return value of the callsite first(). This causes the inliner internal data structure to register the first() callsite for the call edge representing the third() call, therefore incurs a double inlining when both call edges are considered an inline candidate. I'm making a fix to break the inliner from reusing a callsite for new call edges.

```
void top()
{
    int t = first();
    second(t);
}

void second(int t)
{
   t = third(t);
   fourth(t);
}

void third(int t)
{
   return t;
}
```
The actual failing case is much trickier than the example here and is only reproducible with the legacy inliner. The way the legacy inliner works is to process each SCC in a bottom-up order. That means in reality function first may be already inlined into top, or function third is either inlined to second or is folded into nothing. To repro the failure seen from building a large application, we need to figure out a way to confuse the inliner so that the bottom-up inlining is not fulfilled. I'm doing this by making the second call indirect so that the alias analyzer fails to figure out the right call graph edge from top to second and top can be processed before second during the bottom-up.  We also need to tweak the test code so that when the inlining of top happens, the function body of second is not that optimized, by delaying the pass of function attribute deducer (i.e, which tells function third has no side effect and just returns its parameter). Since the CGSCC pass is iterative, additional calls are added to top to postpone the inlining of second to the second round right after the first function attribute deducing pass is done. I haven't been able to repro the failure with the new pass manager since the processing order of ininlined callsites is a bit different, but in theory the issue could happen there too.

Note that this fix could introduce a side effect that blocks the simplification of inlined code, specifically for a call site that can be folded to another call site. I hope this can probably be complemented by subsequent inlining or folding, as shown in the attached unit test. The ideal fix should be to separate the use of VMap. However, in reality this failing pattern shouldn't happen often. And even if it happens, there should be a good chance that the non-folded call site will be refolded by iterative inlining or subsequent simplification.

Reviewers: wenlei, davidxl, tejohnson

Reviewed By: wenlei, davidxl

Subscribers: eraman, nikic, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76248
2020-04-02 21:08:05 -07:00
Xiang1 Zhang 43f031d312 Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology)
Summary:
This patch comes from H.J.'s 2bd54ce7fa

**This patch fix the failed llvm unit tests which running on CET machine. **(e.g. ExecutionEngine/MCJIT/MCJITTests)

The reason we enable IBT at "JIT compiled with CET" is mainly that:  the JIT don't know the its caller program is CET enable or not.
If JIT's caller program is non-CET, it is no problem JIT generate CET code or not.
But if JIT's caller program is CET enabled,  JIT must generate CET code or it will cause Control protection exceptions.

I have test the patch at llvm-unit-test and llvm-test-suite at CET machine. It passed.
and H.J. also test it at building and running VNCserver(Virtual Network Console), it works too.
(if not apply this patch, VNCserver will crash at CET machine.)

Reviewers: hjl.tools, craig.topper, LuoYuanke, annita.zhang, pengfei

Subscribers: tstellar, efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76900
2020-04-03 11:44:07 +08:00
Jessica Paquette 71947ed927 [AArch64][GlobalISel] Constrain reg operands in selectBrJT
This was causing a machine verifier failure on the test suite.

Make sure that we don't end up with a weird register class here.

Failure for reference:

*** Bad machine code: Illegal virtual register for instruction ***
- function:    check_constrain
- basic block: %bb.1  (0x7f8b70839f80)
- instruction: early-clobber %6:gpr64, early-clobber %7:gpr64sp =
  JumpTableDest32 %5:gpr64, %1:gpr64sp, %jump-table.0
- operand 3:   %1:gpr64sp
Expected a GPR64 register, but got a GPR64sp register

Differential Revision: https://reviews.llvm.org/D77349
2020-04-02 20:34:11 -07:00
Wenju He fe8ac0fe51 [x86] Fix Intel OpenCL builtin CalleeSavedRegs on skx
Summary: Align with AVX512 builtins implementations, some of which don't preserve rdi.

Reviewers: yubing, tianqing, craig.topper

Reviewed By: craig.topper

Subscribers: yaxunl, Anastasia, hiraditya

Differential Revision: https://reviews.llvm.org/D77032
2020-04-03 11:27:40 +08:00
Qiu Chaofan 71f1ab5354 [PowerPC] Remove unnecessary XSRSP instruction
MI peephole will remove unnecessary FRSP instructions. This patch
removes such unnecessary XSRSP.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D77208
2020-04-03 11:05:14 +08:00
Austin Kerbow 30f18ed387 [AMDGPU] Handle SMRD signed offset immediate
Summary:
This fixes a few issues related to SMRD offsets. On gfx9 and gfx10 we have a
signed byte offset immediate, however we can overflow into a negative since we
treat it as unsigned.

Also, the SMRD SOFFSET sgpr is an unsigned offset on all subtargets. We
sometimes tried to use negative values here.

Third, S_BUFFER instructions should never use a signed offset immediate.

Differential Revision: https://reviews.llvm.org/D77082
2020-04-02 17:41:52 -07:00
Adrian Prantl 93fe58c9cf Teach the stripNonLineTableDebugInfo pass about the llvm.dbg.label intrinsic.
Debug info for labels is not generated at -gline-tables-only, so this
pass should remove them.

Differential Revision: https://reviews.llvm.org/D77345
2020-04-02 17:39:33 -07:00
Adrian Prantl c024f3ebdc Teach the stripNonLineTableDebugInfo pass about the llvm.dbg.addr intrinsic.
This patch also strips llvm.dbg.addr intrinsics when downgrading debug
info to linetables-only.

Differential Revision: https://reviews.llvm.org/D77343
2020-04-02 17:39:33 -07:00
Nico Weber e875ba1509 Try again to get tests passing again on Windows.
Things pass locally, but some tests on some bots are still unhappy.
I'm not sure why. See if using forward slashes as before helps.
2020-04-02 20:00:38 -04:00
Lang Hames 05598441de Re-apply 0071eaaf08, "[ORC] Export __cxa_atexit ...", with fixes.
Forgot to include part of the testcase. Thank to Nico for spotting that and
reverting!
2020-04-02 16:03:35 -07:00
Matt Arsenault 2680e88069 AMDGPU: Fix broken check lines 2020-04-02 18:52:49 -04:00
Matt Arsenault f68cc2a7ed AMDGPU: Use 128-bit DS operations by default 2020-04-02 17:17:47 -04:00
Matt Arsenault 192cccb152 AMDGPU: Add some tests for exotic denormal mode combinations 2020-04-02 17:17:12 -04:00
Matt Arsenault 5660bb6bc9 AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
2020-04-02 17:17:12 -04:00
Matt Arsenault 75cf30918f AMDGPU: Assume f32 denormals are enabled by default
This will likely introduce catastrophic performance regressions on
older subtargets, but should be correct. A follow up change will
remove the old fp32-denormals subtarget features, and switch to using
the new denormal-fp-math/denormal-fp-math-f32 attributes. Frontends
should be making sure to add the denormal-fp-math-f32 attribute when
appropriate to avoid performance regressions.
2020-04-02 17:17:12 -04:00
Nico Weber a16ba6fea2 Reland "Make it possible for lit.site.cfg to contain relative paths, and use it for llvm and clang"
The problem on Windows was that the \b in "..\bin" was interpreted
as an escape sequence. Use r"" strings to prevent that.

This reverts commit ab11b9eefa,
with raw strings in the lit.site.cfg.py.in files.

Differential Revision: https://reviews.llvm.org/D77184
2020-04-02 16:12:03 -04:00
Craig Topper 4fdb63bbf0 [X86] Enable combineExtSetcc for vectors larger than 256 bits when we've disabled 512 bit vectors.
The compares are going to be type legalized to 256 bits so we
might as well fold the extend.
2020-04-02 12:44:27 -07:00
Nico Weber ab11b9eefa Revert "Make it possible for lit.site.cfg to contain relative paths, and use it for llvm and clang"
This reverts commit fb80b6b2d5 and
follow-up 631ee8b24a.

Seems to not work on Windows:
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/31684
http://lab.llvm.org:8011/builders/llvm-clang-win-x-aarch64/builds/6512

Let's revert while I investigate.
2020-04-02 15:00:09 -04:00
Anna Thomas bf7a16a768 [InlineFunction] Update valid return attributes at callsite within callee body
Consider a callee function that has a call (C) within it which feeds
into the return. When we inline that callee into a callsite that has
return attributes, we can backward propagate valid attributes to the
call (C) within that inlined callee body.

This is safe to do so only if we can guarantee transfer of execution to
successor in the window of instructions between return value (i.e. the
call C) and the return instruction.

Also, this is valid only for attributes which are a property of a
callsite and not those that are not dependent on the ABI, or a property
of the call itself.

Reviewed-By: reames, jdoerfert

Differential Revision: https://reviews.llvm.org/D76140
2020-04-02 14:13:12 -04:00
Matt Arsenault c3d3c22a58 AMDGPU: Hack out noinline on functions using LDS globals
This is a workaround for clang adding noinline to all functions at
-O0. Previously, we would just add alwaysinline, and the verifier
would complain about having both noinline and alwaysinline. We
currently can't truly codegen this case as a freestanding function, so
override the user forcing noinline.
2020-04-02 14:12:07 -04:00
Nico Weber fb80b6b2d5 Make it possible for lit.site.cfg to contain relative paths, and use it for llvm and clang
Currently, all generated lit.site.cfg files contain absolute paths.

This makes it impossible to build on one machine, and then transfer the
build output to another machine for test execution. Being able to do
this is useful for several use cases:

1. When running tests on an ARM machine, it would be possible to build
   on a fast x86 machine and then copy build artifacts over after building.

2. It allows running several test suites (clang, llvm, lld) on 3
   different machines, reducing test time from sum(each test suite time) to
   max(each test suite time).

This patch makes it possible to pass a list of variables that should be
relative in the generated lit.site.cfg.py file to
configure_lit_site_cfg(). The lit.site.cfg.py.in file needs to call
`path()` on these variables, so that the paths are converted to absolute
form at lit start time.

The testers would have to have an LLVM checkout at the same revision,
and the build dir would have to be at the same relative path as on the
builder.

This does not yet cover how to figure out which files to copy from the
builder machine to the tester machines. (One idea is to look at the
`--graphviz=test.dot` output and copy all inputs of the `check-llvm`
target.)

Differential Revision: https://reviews.llvm.org/D77184
2020-04-02 13:53:16 -04:00
Sanjay Patel f4448063cc [InstCombine] try to reduce shuffle with bitcasted operand
shuf (bitcast X), undef, Mask --> bitcast X'

The 'inverse shuffles' test (shuf_bitcast_operand) is a pattern
in the motivating examples from PR35454:
https://bugs.llvm.org/show_bug.cgi?id=35454
(see also D76727)

We can deal with this class of patterns in generic instcombine
because we are not creating any new shuffles, just a bitcast.

Alive2 proof:
http://volta.cs.utah.edu:8080/z/mwDUZf

Differential Revision: https://reviews.llvm.org/D76844
2020-04-02 13:44:50 -04:00
Sanjay Patel b6050ca181 [VectorCombine] transform bitcasted shuffle to narrower elements
bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'

We do not attempt this in InstCombine because we do not want to change
types and create new shuffle ops that are potentially not lowered as
well as the original code. Here, we can check the cost model to see if
it is worthwhile.

I've aggressively enabled this transform even if the types are the same
size and/or equal cost because moving the bitcast allows InstCombine to
make further simplifications.

In the motivating cases from PR35454:
https://bugs.llvm.org/show_bug.cgi?id=35454
...this is enough to let instcombine and the backend eliminate the
redundant shuffles, but we probably want to extend VectorCombine to
handle the inverse pattern (shuffle-of-bitcast) to get that
simplification directly in IR.

Differential Revision: https://reviews.llvm.org/D76727
2020-04-02 13:30:22 -04:00
Stanislav Mekhanoshin f2334a7ef2 [AMDGPU] Fix crash in SILoadStoreOptimizer
SILoadStoreOptimizer::checkAndPrepareMerge() expects base and
paired instruction to come in order and scans MBB from base to
the paired instruction. An original order can be changed if
there were a dependent instruction in between and base instruction
was moved.

Fixed by bailing the optimization. In theory it might be possible
still to perform a merge by swapping instructions, but on practice
it bails anyway because it finds dependency on that same instruction
which has resulted in the base move.

Differential Revision: https://reviews.llvm.org/D77245
2020-04-02 10:26:47 -07:00
Sanjay Patel 12fcbcecff [InstCombine] add tests for cmyk benchmark; NFC
These are versions of a function that regressed with:
rGf2fbdf76d8d0

That particular problem occurs with an instcombine-simplifycfg-instcombine
sequence, but we can show that it exists within instcombine only with
other variations of the pattern.
2020-04-02 13:00:46 -04:00
Sanjay Patel 1008435f3d Revert "[InstCombine] do not exclude min/max from icmp with casted operand fold"
This reverts commit f2fbdf76d8.
As noted in the post-commit thread:
https://reviews.llvm.org/rGf2fbdf76d8d0
...this can obscure a min/max pattern where the components
have extra uses. We can show that the problem is independent
of this change with a slightly modified source example, so
this revert just delays/reduces the need to fix the real
problem.

We need to improve our analysis of negation or -- more
generally -- subtraction using patches like D77230 or D68408.
2020-04-02 09:15:23 -04:00
Tyker c00cb76274 [NFC] Split Knowledge retention and place it more appropriatly
Summary:
Splitting Knowledge retention into Queries in Analysis and Builder into Transform/Utils
allows Queries and Transform/Utils to use Analysis.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77171
2020-04-02 15:01:41 +02:00
Jonas Paulsson 36d4421f50 [LoopDataPrefetch + SystemZ] Let target decide on prefetching for each loop.
This patch adds

- New arguments to getMinPrefetchStride() to let the target decide on a
  per-loop basis if software prefetching should be done even with a stride
  within the limit of the hw prefetcher.

- New TTI hook enableWritePrefetching() to let a target do write prefetching
  by default (defaults to false).

- In LoopDataPrefetch:

  - A search through the whole loop to gather information before emitting any
    prefetches. This way the target can get information via new arguments to
    getMinPrefetchStride() and emit prefetches more selectively. Collected
    information includes: Does the loop have a call, how many memory
    accesses, how many of them are strided, how many prefetches will cover
    them. This is NFC to before as long as the target does not change its
    definition of getMinPrefetchStride().

  - If a previous access to the same exact address was 'read', and the
    current one is 'write', make it a 'write' prefetch.

  - If two accesses that are covered by the same prefetch do not dominate
    each other, put the prefetch in a block that dominates both of them.

  - If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.

- A SystemZ implementation of getMinPrefetchStride().

Review: Ulrich Weigand, Michael Kruse

Differential Revision: https://reviews.llvm.org/D70228
2020-04-02 14:57:46 +02:00
Kang Zhang 9dcac87297 [NFC][PowerPC] Using update_llc_test_checks.py to update atomics-regression.ll 2020-04-02 12:47:35 +00:00
Sanjay Patel a19b27b90e [PhaseOrdering] add test for vector trunc; NFC
See discussion in D76983.
2020-04-02 08:13:19 -04:00
Sanjay Patel ecb048c7ac [InstCombine] add tests for disguised vector trunc; NFC 2020-04-02 08:13:19 -04:00
Djordje Todorovic 5e508b9bac [llvm-dwarfdump] Add the --show-sections-sizes option
Add an option to llvm-dwarfdump to calculate the bytes within
the debug sections. Dump this numbers when using --statistics
option as well.

This is an initial patch (e.g. we should support other units,
since we only support 'bytes' now).

Differential Revision: https://reviews.llvm.org/D74205
2020-04-02 13:14:30 +02:00
Kang Zhang ce8b85c0b8 [NFC][PowerPC] Add a new test case loop-comment.ll 2020-04-02 10:16:02 +00:00
David Green fbd53ffc3a [ARM] MVE VMULL patterns
This adds MVE vmull patterns, which are conceptually the same as
mul(vmovl, vmovl), and so the tablegen patterns follow the same
structure.

For i8 and i16 this is simple enough, but in the i32 version the
multiply (in 64bits) is illegal, meaning we need to catch the pattern
earlier in a dag fold. Because bitcasts are involved in the zext
versions and the patterns are a little different in little and big
endian. I have only added little endian support in this patch.

Differential Revision: https://reviews.llvm.org/D76740
2020-04-02 10:57:40 +01:00
David Green c697dd9ffd [ARM] Make remaining MVE instruction predictable
The unpredictable/hasSideEffects flag is usually inferred by tablegen
from whether the instruction has a tablegen pattern (and that pattern
only has a single output instruction). Now that the MVE intrinsics are
all committed and producing code, the remaining instructions still
marked as unpredictable need to be specially handled. This adds the flag
directly to instructions that need it, notably the V*MLAL instructions
and some of the MOV's.

Differential Revision: https://reviews.llvm.org/D76910
2020-04-02 10:57:40 +01:00
Clement Courbet fb4aa30f27 [ExpandMemCmp] Allow overlaping loads in the zero-relational case.
Summary:
This allows doing `memcmp(p, q, 7)` with 2 loads instead of a call to
memcmp.
This fixes part of PR45147.

Reviewers: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76133
2020-04-02 11:20:47 +02:00
Florian Hahn a63b5c9e53 [CallSiteSplitting] Simplify isPredicateOnPHI & continue checking PHIs.
As pointed out by @thakis, currently CallSiteSplitting bails out after
checking the first PHI node. We should check all PHI nodes, until we
find one where call site splitting is beneficial.

This patch also slightly simplifies the code using BasicBlock::phis().

Reviewers: davidxl, junbuml, thakis

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D77089
2020-04-02 10:11:27 +01:00
Kristof Beyls deb902252a Fix RUN line in AArch64/speculation-hardening.ll 2020-04-02 09:42:15 +01:00
WangTianQing d08fadd662 [X86] Add SERIALIZE instruction.
Summary: For more details about this instruction, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference

Reviewers: craig.topper, RKSimon, LuoYuanke

Reviewed By: craig.topper

Subscribers: mgorny, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77193
2020-04-02 16:19:23 +08:00
Fangrui Song 85adce3d73 [PPCInstPrinter] Change B to print the target address in hexadecimal form
Follow-up of D76591 and D76907
2020-04-01 22:38:24 -07:00
Johannes Doerfert bcd8009369 [Attributor] Use the proper context instruction in genericValueTraversal
There was a TODO in genericValueTraversal to provide the context
instruction and due to the lack of it users that wanted one just used
something available. Unfortunately, using a fixed instruction is wrong
in the presence of PHIs so we need to update the context instruction
properly.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D76870
2020-04-01 22:20:47 -05:00
Johannes Doerfert a8b2fed0ae [Utils][FIX] Properly deal with occasionally deleted functions
While D68850 allowed functions to be deleted I accidentally saved some
version of the function to be used once a suitable prefix was found.
This turned out to be problematic when the occasionally deleted function
is also occasionally modified. The test case is adjusted to resemble the
case in which the problem was found.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D76586
2020-04-01 21:56:18 -05:00
Johannes Doerfert 9e19693994 [Attributor] Derive better alignment for accessed pointers
Use DL & ABI information for better alignment deduction, e.g., if a type
is accessed and the ABI specifies an alignment requirement for such an
access we can use it. This is based on a patch by @lebedev.ri and
inspired by getBaseAlign in Loads.cpp.

Depends on D76673.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D76674
2020-04-01 21:49:57 -05:00
Nico Weber 5bac8d427d Revert "[ORC] Export __cxa_atexit from the main JITDylib in LLJIT."
This reverts commit 0071eaaf08.
Inputs/noop-main.ll wasn't checked in, so this breaks check-llvm
everywhere.
2020-04-01 22:49:38 -04:00
Johannes Doerfert b1c788d051 [Attributor][FIX] Prevent alignment breakage wrt. must-tail calls
If we have a must-tail call the callee and caller need to have matching
ABIs. Part of that is alignment which we might modify when we deduce
alignment of arguments of either. Since we would need to keep them in
sync, which is not as simple, we simply avoid deducing alignment for
arguments of the must-tail caller or callee.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D76673
2020-04-01 21:40:07 -05:00
Johannes Doerfert f7f9322843 [Attributor][NFC] Cleanup leftover check lines 2020-04-01 21:37:33 -05:00
Lang Hames 0071eaaf08 [ORC] Export __cxa_atexit from the main JITDylib in LLJIT.
Failure to export __cxa_atexit can lead to an attempt to import a definition
from the process itself (if __cxa_atexit is referenced from another JITDylib),
but the process definition will clash with the existing non-exported definition
to produce an unexpected DuplicateDefinitionError.

This patch fixes the immediate issue by exporting __cxa_atexit. It also fixes a
bug where atexit functions in other JITDylibs were not being run by adding a
copy of run_atexits_helper to every JITDylib.

A follow up patch will deal with the bug where definition generators are called
despite a non-exported definition being present.
2020-04-01 19:12:08 -07:00
Sam Clegg 296ccef703 [WebAssembly] EmscriptenEHSjLj: Mark __invoke_ functions as imported
This means the linker will be expect them be undefined at link time an
will generate imports from the `env` module rather than reporting
undefined externals.

Differential Revision: https://reviews.llvm.org/D77192
2020-04-01 16:33:33 -07:00
Lang Hames 8e5a8f620c [ORC] Don't require a null-terminator on MemoryBuffers for objects in archives.
The MemoryBuffer::getMemBuffer method's RequiresNullTerminator parameter
defaults to true, but object files are not null terminated so we need to
explicitly pass false here.
2020-04-01 12:16:38 -07:00
Sanjay Patel 3d90048791 [InstCombine] enhance freelyNegateValue() by handling xor
Negation is equivalent to bitwise-not + 1, so try to convert more
subtracts into adds using this relationship:
0 - (A ^ C) => ((A ^ C) ^ -1) + 1 => A ^ ~C + 1

I doubt this will recover the regression noted in rGf2fbdf76d8d0,
but seems like we're going to need to improve here and/or revive D68408?

Alive2 proofs:
http://volta.cs.utah.edu:8080/z/Re5tMU
http://volta.cs.utah.edu:8080/z/An-uns

Differential Revision: https://reviews.llvm.org/D77230
2020-04-01 15:05:13 -04:00
Sanjay Patel 8431dbacd4 [InstCombine] add tests for negate with xor operand; NFC 2020-04-01 15:05:13 -04:00
Jonathan Roelofs 1148f004fa Fix PR45371: SeparateConstOffsetFromGEP clean up bookkeeping
find() was altering the UserChain, even in cases where it subsequently
discovered that the resulting constant was a 0. This confuses
rebuildWithoutConstOffset() when it attempts to walk the chain later, since it
is expected that the chain itself be a path down the use-def edges of an
expression.
2020-04-01 12:38:15 -06:00
Uday Bondhugula 6ee11c3b0f [NewGVN] Make NewGVN aware of aligned_alloc
Make the New GVN pass aware of aligned_alloc.

Depends on D76975.

Differential Revision: https://reviews.llvm.org/D76976
2020-04-01 23:26:51 +05:30
Uday Bondhugula 4cf70af94f [GVN] Make GVN aware of aligned_alloc
Make the GVN pass aware of aligned_alloc.

Depends on D76974.

Differential Revision: https://reviews.llvm.org/D76975
2020-04-01 23:26:50 +05:30
Uday Bondhugula c4499e3333 [Attributor] Make attributor aware of aligned_alloc for heap to stack conversion
Make the attributor pass aware of aligned_alloc for converting heap
allocations to stack ones.

Depends on D76971.

Differential Revision: https://reviews.llvm.org/D76974
2020-04-01 23:26:50 +05:30
Matt Arsenault 3f465d0d36 AMDGPU: Fix broken check lines 2020-04-01 10:52:22 -07:00
Matt Arsenault 68e283940a AMDGPU/GlobalISel: Switch test to checking final ISA
The naming convention is for unprefixed .ll tests to check the final
ISA instructions.
2020-04-01 13:03:02 -04:00
Matt Arsenault 5e4e8d0388 AMDGPU/GlobalISel: Change intrinsic ID for _L to _LZ opt
Still should handle the other case changes the opcode this way.
2020-04-01 13:03:02 -04:00
Heejin Ahn c87b5e7e22 [WebAssembly] Fix subregion relationship in CFGSort
Summary:
The previous code for determining the innermost region in CFGSort was
not correct. We determine subregion relationship by domination of their
headers, i.e., if region A's header dominates region B's header, B is a
subregion of A. Previously we assumed that if a BB belongs to both a
loop and an exception, the region with fewer number of BBs is the
innermost one. This may not be true, because while WebAssemblyException
contains BBs in all its subregions (loops or exceptions), MachineLoop
may not, because MachineLoop does not contain BBs that don't have a path
to its header even if they are dominated by its header.

                Loop header  <---|
                    |            |
              Exception header   |
                    | \          |
                    A  B         |
                    |   \        |
                    |    C       |
                    |            |
                Loop latch       |
                    |            |
                    -------------|

For example, in this CFG, the loop does not contain B and C, because
they don't have a path back to the loops header. But for CFGSort we
consider the exception here belongs to the loop and the exception should
be a subregion of the loop and scheduled together.

So here we should use `WE->contains(ML->getHeader())` (but not
`ML->contains(WE->getHeader())`, for the stated region above).

This also fixes some comments and deletes `Regions` vector in
`RegionInfo` class, which was not used anywere.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77181
2020-04-01 08:12:41 -07:00
Georgii Rymar f527e6f2e1 [llvm-readobj] - Do not crash when SHT_HASH table is broken.
We have scenarios when the logic of --elf-hash-histogram/--hash-symbols/--hash-table
options might crash when given a broken hash table.

This patch adds pre-checks for tables for these 3 options
and provides test cases.

Differential revision: https://reviews.llvm.org/D77147
2020-04-01 18:03:02 +03:00
Jessica Clarke 616289ed29 [LegalizeTypes][RISCV] Correctly sign-extend comparison for ATOMIC_CMP_XCHG
Summary:
Currently, the comparison argument used for ATOMIC_CMP_XCHG is legalised
with GetPromotedInteger, which leaves the upper bits of the value
undefind. Since this is used for comparing in an LR/SC loop with a
full-width comparison, we must sign extend it. We introduce a new
getExtendForAtomicCmpSwapArg to complement getExtendForAtomicOps, since
many targets have compare-and-swap instructions (or pseudos) that
correctly handle an any-extend input, and the existing function
determines the extension of the result, whereas we are concerned with
the input.

This is related to https://reviews.llvm.org/D58829, which solved the
issue for ATOMIC_CMP_SWAP_WITH_SUCCESS, but not the simpler
ATOMIC_CMP_SWAP.

Reviewers: asb, lenary, efriedma

Reviewed By: asb

Subscribers: arichardson, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, evandro, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74453
2020-04-01 15:51:26 +01:00
Puyan Lotfi e3033c0ce5 [llvm][clang][IFS] Enhancing the llvm-ifs yaml format for symbol lists.
Prior to this change the clang interface stubs format resembled
something ending with a symbol list like this:

 Symbols:
   a: { Type: Func }

This was problematic because we didn't actually want a map format and
also because we didn't like that an empty symbol list required
"Symbols: {}". That is to say without the empty {} llvm-ifs would crash
on an empty list.

With this new format it is much more clear which field is the symbol
name, and instead the [] that is used to express an empty symbol vector
is optional, ie:

Symbols:
 - { Name: a, Type: Func }

or

Symbols: []

or

Symbols:

This further diverges the format from existing llvm-elftapi. This is a
good thing because although the format originally came from the same
place, they are not the same in any way.

Differential Revision: https://reviews.llvm.org/D76979
2020-04-01 10:49:06 -04:00
Simon Pilgrim eb8880562e [X86][SSE] combinePTESTCC - fold TESTZ(X,~Y) -> TESTC(Y,X) 2020-04-01 15:10:53 +01:00
Kai Wang 501522b5b2 [RISCV] Support RISC-V ELF attributes sections in llvm-readobj.
Enable llvm-readobj to handle RISC-V ELF attribute sections.

Differential Revision: https://reviews.llvm.org/D75833
2020-04-01 21:50:11 +08:00
David Green a0c537834a [ARM] Extra vmull loop tests. NFC 2020-04-01 14:07:45 +01:00
shchenz e344f8b9db Revert "[LSR] re-add testcase for wrongly phi node elimination - NFC"
This reverts commit f25a1b4f58.
ARM and hexagon fail at the new added case.
2020-04-01 12:58:06 +00:00
Pierre-vh 2effe8f5e7 [Target][ARM] Improvements to the VPT Block Insertion Pass
This allows the MVE VPT Block insertion pass to remove VPNOTs in
order to create more complex VPT blocks such as TE, TEET, TETE, etc.

Differential Revision: https://reviews.llvm.org/D75993
2020-04-01 12:34:20 +01:00
shchenz f25a1b4f58 [LSR] re-add testcase for wrongly phi node elimination - NFC
Retest the case on X86/SystemZ/AArch64/PowerPC
2020-04-01 11:11:17 +00:00
Cullen Rhodes 84aa6cf1a9 [Transforms][SROA] Promote allocas with mem2reg for scalable types
Summary:
Aggregate types containing scalable vectors aren't supported and as far
as I can tell this pass is mostly concerned with optimisations on
aggregate types, so the majority of this pass isn't very useful for
scalable vectors.

This patch modifies SROA such that mem2reg is run on allocas with
scalable types that are promotable, but nothing else such as slicing is
done.

The use of TypeSize in this pass has also been updated to be explicitly
fixed size. When invoking the following methods in DataLayout:

    * getTypeSizeInBits
    * getTypeStoreSize
    * getTypeStoreSizeInBits
    * getTypeAllocSize

we now called getFixedSize on the resultant TypeSize. This is quite an
extensive change with around 50 calls to these functions, and also the
first change of this kind (being explicit about fixed vs scalable
size) as far as I'm aware, so feedback welcome.

A test is included containing IR with scalable vectors that this pass is
able to optimise.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76720
2020-04-01 10:34:11 +00:00
Simon Pilgrim 918ccb64b0 [X86][SSE] Handle basic inversion of PTEST/TESTP operands (PR38522)
PTEST/TESTP sets EFLAGS as:
TESTZ: ZF = (Op0 & Op1) == 0
TESTC: CF = (~Op0 & Op1) == 0
TESTNZC: ZF == 0 && CF == 0

If we are inverting the 0'th operand of a PTEST/TESTP instruction we can adjust the comparisons to correct handle the inversion implicitly.

Additionally, for "TESTZ" (ZF) cases, the allones case, PTEST(X,-1) can be simplified to PTEST(X,X).

We can expand this for the TESTZ(X,~Y) pattern and also handle KTEST/KORTEST in the future.

Differential Revision: https://reviews.llvm.org/D76984
2020-04-01 11:33:28 +01:00
shchenz 8b8cd150a4 Revert "[LSR] add testcase for wrongly phi node elimination - NFC"
This reverts commit dbf5e4f6c7.
The testcase has different behaviour on PowerPC and X86.
2020-04-01 10:28:43 +00:00
shchenz dbf5e4f6c7 [LSR] add testcase for wrongly phi node elimination - NFC 2020-04-01 09:58:58 +00:00
Qiu Chaofan d8b51789fd [NFC] [PowerPC] Add test for frsp elimination 2020-04-01 17:54:24 +08:00
Bjorn Pettersson ef49895da8 [X86] Do not assume types are legal in getFauxShuffleMask
Summary:
Make sure we do not assert on value types not being
simple in getFauxShuffleMask when analysing operations
such as "v8i16 = truncate v8i24".

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77136
2020-04-01 11:40:18 +02:00
Georgii Rymar 93fc0ba145 [yaml2obj] - Add NBucket and NChain fields for the SHT_HASH section.
These fields allows to override nchain and nbucket fields of a SHT_HASH section.

Differential revision: https://reviews.llvm.org/D76834
2020-04-01 12:28:16 +03:00
Florian Hahn d307174e1d [ConstantRange] Use APInt::or/APInt::and for single elements.
Currently ConstantRange::binaryAnd/binaryOr results are too pessimistic
for single element constant ranges.

If both operands are single element ranges, we can use APInt's AND and
OR implementations directly.

Note that some other binary operations on constant ranges can cover the
single element cases naturally, but for OR and AND this unfortunately is
not the case.

Reviewers: nikic, spatel, lebedev.ri

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D76446
2020-04-01 09:50:24 +01:00
Florian Hahn e20cac3650 [Matrix] Add new test case with getelementptr constant exprs.
The new test mostly ensures we keep doing the right thing for constant
expressions while lowering matrix instructions.
2020-04-01 09:32:13 +01:00
Qiu Chaofan 95bcab8272 [DAGCombiner] Require ninf for sqrt recip estimation
Currently, DAG combiner uses (fmul (rsqrt x) x) to estimate square
root of x. However, this method would return NaN if x is +Inf, which
is incorrect.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D76853
2020-04-01 16:23:43 +08:00
Florian Hahn 862766e01e [Verifier] Verify matrix dimensions operands match vector size.
This patch adds checks to the verifier to ensure the dimension arguments
passed to the matrix intrinsics match the vector types for their
arugments/return values.

Reviewers: anemet, Gerolf, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D77129
2020-04-01 09:21:39 +01:00
Simon Pilgrim f9f401dba1 [X86][AVX] Add additional 256/512-bit test cases for PACKSS/PACKUS shuffle patterns
Also add lowerShuffleWithPACK call to lowerV32I16Shuffle - shuffle combining was catching it but we avoid a lot of temporary shuffle creations if we catch it at lowering first.
2020-04-01 08:19:03 +01:00
Simon Pilgrim 3c9064ed96 [X86] Run XOP vector rotation tests with/without AVX2
I noticed this while reviewing D77152 - by only testing bdver4 we weren't checking an XOP target that only had AVX1
2020-04-01 08:19:03 +01:00
Kai Luo 8eb40e41f6 [PowerPC] Don't generate ST_VSR_SCAL_INT if power8-vector is disabled
Summary:
In https://bugs.llvm.org/show_bug.cgi?id=45297, it fails selecting
instructions for `PPCISD::ST_VSR_SCAL_INT`. The reason it generate the
`PPCISD::ST_VSR_SCAL_INT` with `-power8-vector` in IR is PPC's
combiner checks `hasP8Altivec` rather than `hasP8Vector`. This patch
should resolve PR45297.

Differential Revision: https://reviews.llvm.org/D76773
2020-04-01 02:15:25 +00:00
Shengchen Kan d0efd7bfcf [X86][MC] Disable Prefix padding after hardcode/prefix
Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight, eli.friedman

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76475
2020-04-01 09:49:52 +08:00
Matt Arsenault 43e576593e AMDGPU/GlobalISel: Fix insert point when lowering G_FMAD 2020-03-31 19:57:06 -04:00
Evgenii Stepanov f9471b0010 Fix MSan false positive due to select folding.
Summary:
Select folding in JumpThreading can create a conditional branch on a
code patch that did not have one in the original program. This is not a
valid transformation in sanitize_memory functions.

Note that JumpThreading does select folding in 3 different places. Two
of them seem safe - they apply to a select instruction in a BB that ends
with an unconditional branch to another BB, which (in turn) ends with a
conditional branch or a switch with the same condition.

Fixes PR45220.

Reviewers: glider, dvyukov, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76332
2020-03-31 15:25:42 -07:00
Fangrui Song 4af7560b37 [PPCInstPrinter] Print conditional branches as `bt 2, $target` instead of `bt 2, .+$imm`
Follow-up of D76591.

Reviewed By: #powerpc, sfertile

Differential Revision: https://reviews.llvm.org/D76907
2020-03-31 15:05:38 -07:00
Joel E. Denny 8f8c4950fe [FileCheck] Add missing %ProtectFileCheckOutput to FileCheck tests
I'm committing this fixup without review because it's an obvious
continuation of D65121 (committed at f471eb8e99).
2020-03-31 17:29:11 -04:00
Hubert Tong 478af4479a [Object] Update ObjectFile::makeTriple for XCOFF
Summary:
When we encounter an XCOFF file, reflect that in the triple information.
In addition to knowing the object file format, we know that the
associated OS is AIX.

This means that we can expect that there is no output difference in the
processing of an XCOFF32 input file between cases where the triple is
left unspecified by the user and cases where the user specifies
`--triple powerpc-ibm-aix` explicitly.

Reviewers: jhenderson, sfertile, jasonliu, daltenty

Reviewed By: jasonliu

Subscribers: wuzish, nemanjai, hiraditya, MaskRay, rupprecht, steven.zhang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77025
2020-03-31 17:26:30 -04:00
Daniel Frampton 494abe139a [AArch64] Change AArch64 Windows EH UnwindHelp object to be a fixed object
The UnwindHelp object is used during exception handling by runtime
code. It must be findable from a fixed offset from FP.

This change allocates the UnwindHelp object as a fixed object (as is
done for x86_64) to ensure that both the generated code and runtime
agree on the location of the object.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45346

Differential Revision: https://reviews.llvm.org/D77016
2020-03-31 14:21:21 -07:00
Daniel Frampton 522b4c4b88 [AArch64] Fix mismatch in prologue and epilogue for funclets on Windows
The generated code for a funclet can have an add to sp in the epilogue
for which there is no corresponding sub in the prologue.

This patch removes the early return from emitPrologue that was
preventing the sub to sp, and instead conditionalizes the appropriate
parts of the rest of the function.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45345

Differential Revision: https://reviews.llvm.org/D77015
2020-03-31 14:21:18 -07:00
Anna Thomas 58a05675da Revert "[InlineFunction] Handle return attributes on call within inlined body"
This reverts commit 28518d9ae3.
There is a failure in MsgPackReader.cpp when built with clang. It
complains about "signext and zeroext" are incompatible. Investigating
offline if it is infact a UB in the MsgPackReader code.
2020-03-31 16:16:34 -04:00
Eli Friedman dacf8d3562 [AArch64][SVE] Add support for fcmp.
This also requires support for boolean "not", so I added boolean logic
while I was there.

Differential Revision: https://reviews.llvm.org/D76901
2020-03-31 12:04:39 -07:00
Guozhi Wei 6d20937c29 [CodeGenPrepare] Delete intrinsic call to llvm.assume to enable more tailcall
The attached test case is simplified from tcmalloc. Both function calls should be optimized as tailcall. But llvm can only optimize the first call. The second call can't be optimized because function dupRetToEnableTailCallOpts failed to duplicate ret into block case2.

There 2 problems blocked the duplication:

  1 Intrinsic call llvm.assume is not handled by dupRetToEnableTailCallOpts.
  2 The control flow is more complex than expected, dupRetToEnableTailCallOpts can only duplicate ret into its predecessor, but here we have an intermediate block between call and ret.

The solutions:

  1 Since CodeGenPrepare is already at the end of LLVM IR phase, we can simply delete the intrinsic call to llvm.assume.
  2 A general solution to the complex control flow is hard, but for this case, after exit2 is duplicated into case1, exit2 is the only successor of exit1 and exit1 is the only predecessor of exit2, so they can be combined through eliminateFallThrough. But this function is called too late, there is no more dupRetToEnableTailCallOpts after it. We can add an earlier call to eliminateFallThrough to solve it.

Differential Revision: https://reviews.llvm.org/D76539
2020-03-31 11:55:51 -07:00
Stanislav Mekhanoshin 08682dcc86 [AMDGPU] Define 16 bit VGPR subregs
We have loads preserving low and high 16 bits of their
destinations. However, we always use a whole 32 bit register
for these. The same happens with 16 bit stores, we have to
use full 32 bit register so if high bits are clobbered the
register needs to be copied. One example of such code is
added to the load-hi16.ll.

The proper solution to the problem is to define 16 bit subregs
and use them in the operations which do not read another half
of a VGPR or preserve it if the VGPR is written.

This patch simply defines subregisters and register classes.
At the moment there should be no difference in code generation.
A lot more work is needed to actually use these new register
classes. Therefore, there are no new tests at this time.

Register weight calculation has changed with new subregs so
appropriate changes were made to keep all calculations just
as they are now, especially calculations of register pressure.

Differential Revision: https://reviews.llvm.org/D74873
2020-03-31 11:49:06 -07:00
Anna Thomas 28518d9ae3 [InlineFunction] Handle return attributes on call within inlined body
Consider a callee function that has a call (C) within it which feeds
into the return.  When we inline that callee into a callsite that has
return attributes, we can backward propagate those attributes to the
call (C) within that inlined callee body.

This is safe to do so only if we can guarantee transfer of execution to
successor in the window of instructions between return value (i.e. the
call C) and the return instruction.

See added test cases.

Reviewed-By: reames, jdoerfert

Differential Revision: https://reviews.llvm.org/D76140
2020-03-31 14:35:40 -04:00
Ulrich Weigand c726c920e0 [SystemZ] Allow %r0 in address context for AsmParser
Registers used in any address (as well as in a few other contexts)
have special semantics when a "zero" register is used, which is
why the back-end defines extra register classes ADDR32, ADDR64 etc
to be used to prevent the register allocator from using %r0 there.

However, when writing assembler code "by hand", you sometimes need
to trigger that special semantics.  However, currently the AsmParser
will reject %r0 in those places.  In some cases it may be possible
to write that instruction differently - but in others it is currently
not possible at all.

This check in AsmParser simply seems overly strict, so this patch
just removes the check completely.  This brings the behaviour of
AsmParser in line with the GNU assembler as well.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45092
2020-03-31 19:48:50 +02:00
Uday Bondhugula dc817b2dea [InstCombine] Deduce attributes for aligned_alloc in InstCombine
Make InstCombine aware of the aligned_alloc library function.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Depends on D76970.

Differential Revision: https://reviews.llvm.org/D76971
2020-03-31 23:17:28 +05:30
zhizhouy 94d912296d [NFC] Do not run CGProfilePass when not using integrated assembler
Summary:
CGProfilePass is run by default in certain new pass manager optimization pipeline. Assemblers other than llvm as (such as gnu as) cannot recognize the .cgprofile entries generated and emitted from this pass, causing build time error.

This patch adds new options in clang CodeGenOpts and PassBuilder options so that we can turn cgprofile off when not using integrated assembler.

Reviewers: Bigcheese, xur, george.burgess.iv, chandlerc, manojgupta

Reviewed By: manojgupta

Subscribers: manojgupta, void, hiraditya, dexonsmith, llvm-commits, tcwang, llozano

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D62627
2020-03-31 10:31:31 -07:00
Simon Pilgrim 30436a1ce7 [X86][SSE] Add additional PTEST/TESTP inversion tests 2020-03-31 18:02:27 +01:00
Simon Pilgrim 8b925440d1 [X86][SSE] Simplify PTEST/TESTP tests for D76984
We don't need to use an allones for the second operand - test the general case.
2020-03-31 18:02:27 +01:00
Sterling Augustine 21d9d0855b New symbolizer option to print files relative to the compilation directory.
Summary: New "--relative" option to allow printing files relative to the compilation directory.

Reviewers: jhenderson

Subscribers: MaskRay, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76733
2020-03-31 09:29:24 -07:00
Florian Hahn b0cd7b2799 [SCCP] Limit use of range info for binops to integers for now.
This fixes a crash when building the test suite.
2020-03-31 17:08:09 +01:00
Tyker 4aeb7e1ef4 [AssumeBundles] Preserve information in EarlyCSE
Summary: this patch preserve information from various places in EarlyCSE into assume bundles.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76769
2020-03-31 17:47:04 +02:00
Tyker 7093b92a13 [AssumeBundles] Preserve Information from Load/Store
Summary: This patch preserve dereferenceable, nonnull and alignment from loads and stores.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76759
2020-03-31 17:47:04 +02:00
Jonas Paulsson f481d48893 [SystemZ] Improve foldMemoryOperandImpl().
Fold MS(G)RKC -> MS(G)C.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D76771
2020-03-31 17:17:51 +02:00
Georgii Rymar b3f13bc165 [obj2yaml] - Teach tool to dump program headers.
Currently obj2yaml does not dump program headers,
this patch teaches it to do that.

Differential revision: https://reviews.llvm.org/D75342
2020-03-31 18:10:19 +03:00
Simon Pilgrim 7e0e5fa499 Revert rGefe59d6717dcdf7777acb9b7a734e1a520bdf22a "[X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations"
This might be causing an issue on the fuchsia-x86_64-linux buildbot - reverting to see what happens.
2020-03-31 15:47:30 +01:00
Simon Pilgrim efe59d6717 [X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations
If canLowerByDroppingEvenElements indicates that the shuffle is a N:1 compaction pattern and the inputs are suitably sign/zero extended then we can use a chain of PACKSS/PACKUS to compact.

This helps avoid PSHUFB (and its mask load) for short shuffle chains, shuffle combining will still replace with a PSHUFB if we have enough shuffles as getFauxShuffleMask can recognise PACKSS/PACKUS chains.
2020-03-31 14:48:48 +01:00
Sanjay Patel fa61b5059a [InstCombine] remove stray auto-generated test comment; NFC
The script now includes extra info about command-line options used
when generating its advertisement heading, but we don't want that
here. This is a special-case because we have enhanced the check
lines (as noted in the 2nd comment line).
2020-03-31 09:19:12 -04:00
Florian Hahn b37543750c [ValueLattice] Distinguish between constant ranges with/without undef.
This patch updates ValueLattice to distinguish between ranges that are
guaranteed to not include undef and ranges that may include undef.

A constant range guaranteed to not contain undef can be used to simplify
instructions to arbitrary values. A constant range that may contain
undef can only be used to simplify to a constant. If the value can be
undef, it might take a value outside the range. For example, consider
the snipped below

define i32 @f(i32 %a, i1 %c) {
  br i1 %c, label %true, label %false
true:
  %a.255 = and i32 %a, 255
  br label %exit
false:
  br label %exit
exit:
  %p = phi i32 [ %a.255, %true ], [ undef, %false ]
  %f.1 = icmp eq i32 %p, 300
  call void @use(i1 %f.1)
  %res = and i32 %p, 255
  ret i32 %res
}

In the exit block, %p would be a constant range [0, 256) including undef as
%p could be undef. We can use the range information to replace %f.1 with
false because we remove the compare, effectively forcing the use of the
constant to be != 300. We cannot replace %res with %p however, because
if %a would be undef %cond may be true but the  second use might not be
< 256.

Currently LazyValueInfo uses the new behavior just when simplifying AND
instructions and does not distinguish between constant ranges with and
without undef otherwise. I think we should address the remaining issues
in LVI incrementally.

Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76931
2020-03-31 12:50:20 +01:00
Denis Antrushin 06c58f11a9 [SCEV] Use backedge SCEV of PHI only if its input is loop invariant
For the PHI node

      %1 = phi [%A, %entry], [%X, %latch]

it is incorrect to use SCEV of backedge val %X as an exit value
of PHI unless %X is loop invariant.
This is because exit value of %1 is value of %X at one-before-last
iteration of the loop.

Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D73181
2020-03-31 18:39:24 +07:00
Simon Pilgrim 98357dee1c [X86] Combine concat(palignr,palignr) -> palignr(concat,concat)
combineX86ShufflesRecursively should handle this someday
2020-03-31 11:06:35 +01:00
Daan Sprenkels 464b9aeafe [InstCombine] Transform extelt-trunc -> bitcast-extelt
Canonicalize the case when a scalar extracted from a vector is
truncated.  Transform such cases to bitcast-then-extractelement.
This will enable erasing the truncate operation.

This commit fixes PR45314.

reviewers: spatel

Differential revision: https://reviews.llvm.org/D76983
2020-03-31 11:53:41 +02:00
David Green 2c5f43f9dd [ARM] Fix qdadd operand order
qdadd is defined as sat(Rm + sat(2*Rn)). We had the Rm and Rn switched
the wrong way around.

Differential Revision: https://reviews.llvm.org/D77049
2020-03-31 10:11:36 +01:00
Guillaume Chatelet c9d5c19597 [Alignment][NFC] Transitionning more getMachineMemOperand call sites
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77121
2020-03-31 08:36:18 +00:00
Sebastian Neubauer 5d3a69feca [AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D65088
2020-03-31 10:35:39 +02:00
Florian Hahn 0c9c58ada0 [SCCP] Use constant ranges for casts.
For casts with constant range operands, we can use
ConstantRange::castOp.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71938
2020-03-31 09:22:04 +01:00
Kai Wang 581ba35291 [RISCV] ELF attribute section for RISC-V.
Leverage ARM ELF build attribute section to create ELF attribute section
for RISC-V. Extract the common part of parsing logic for this section
into ELFAttributeParser.[cpp|h] and ELFAttributes.[cpp|h].

Differential Revision: https://reviews.llvm.org/D74023
2020-03-31 16:16:19 +08:00
Djordje Todorovic bcbd60aeb5 [Mips] Make MipsBranchExpansion aware of BBIT family of branch
Octeon branches (bbit0/bbit032/bbit1/bbit132) have an immediate operand,
so it is legal to have such replacement within
MipsBranchExpansion::replaceBranch().

According to the specification, a branch (e.g. bbit0 ) looks like:

bbit0  rs p offset  // p is an immediate operand
  if !rs<p> then branch

Without this patch, an assertion triggers in the method,
and the problem has been found in the real example.

Differential Revision: https://reviews.llvm.org/D76842
2020-03-31 09:20:51 +02:00
Dylan McKay 7b808b105f [AVR] Generalize the previous interrupt bugfix to signal handlers too 2020-03-31 19:33:34 +13:00
Dylan McKay 339b34266c [AVR] Respect the 'interrupt' function attribute
In the past, AVR functions were only lowered with interrupt-specific
machine code if the function was defined with the "avr-interrupt" or
"avr-signal" calling conventions.

This patch modifies the backend so that if the function does not have a
special calling convention, but does have an "interrupt" attribute,
that function is interpreted as a function with interrupts.

This also extracts the "is this function an interrupt" logic from
several disparate places in the backend into one AVRMachineFunctionInfo
attribute.

Bug found by Wilhelm Meier.
2020-03-31 19:00:18 +13:00
Wei Mi ebad678857 [SampleFDO] Port MD5 name table support to extbinary format.
Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it.

Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support.

Differential Revision: https://reviews.llvm.org/D76255
2020-03-30 22:07:08 -07:00
QingShan Zhang 4eeb56d088 [PowerPC] Don't do the folding if the operand is R0/X0
We have this transformation in PowerPC peephole:

Replace instruction:
  renamable $x28 = ADDI8 renamable $x7, -8
  renamable $x28 = ADD8 killed renamable $x28, renamable $x0
  STFD killed renamable $f0, -8, killed renamable $x28 :: (store 8 into %ir._ind_cast99.epil)
with:
  renamable $x28 = ADDI8 renamable $x7, -16
  STFDX killed renamable $f0, $x0, killed $x28 :: (store 8 into %ir._ind_cast99.epil)

It is invalid as the '$x0' in STFDX is constant 0, not register r0.

Reviewed By: Nemanjai

Differential Revision: https://reviews.llvm.org/D77034
2020-03-31 02:50:19 +00:00
Jessica Paquette d5ee72065b [GlobalISel] Implement identity transforms for x op x -> x
When we have

```
a = G_OR x, x
```

or

```
b = G_AND y, y
```

We can drop the G_OR/G_AND and just use x/y respectively.

Also update arm64-fallback.ll because there was an or in there which hits this
transformation.

Differential Revision: https://reviews.llvm.org/D77105
2020-03-30 18:22:37 -07:00
Juneyoung Lee 519f5c3796 [LegalizeTypes] Add SoftenFloatRes_FREEZE
Summary: This adds SoftenFloatRes_FREEZE.

Reviewers: bkramer, JamesNagurne, craig.topper, efriedma

Reviewed By: craig.topper

Subscribers: AbigailLinden, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76980
2020-03-31 10:16:38 +09:00
Jessica Paquette 63d70ea6a0 [GlobalISel] Combine (x op 0) -> x for operations with a right identity of 0
Implement identity combines for operations like the following:

```
%a = G_SUB %b, 0
```

This can just be replaced with %b.

Over CTMark, this gives some minor size improvements at -O3.

Differential Revision: https://reviews.llvm.org/D76640
2020-03-30 16:49:52 -07:00
Matt Arsenault b8fc192d42 Revert "[GISel]: Fix incorrect IRTranslation while translating null pointer types"
This reverts commit b3297ef051.

This change is incorrect. The current semantic of null in the IR is a
pointer with the bitvalue 0. It is not a cast from an integer 0, so
this should preserve the pointer type.
2020-03-30 19:30:42 -04:00
Daan Sprenkels 5227fa0c72 Recommit "[InstCombine] Update assertions in InstCombine test; NFC" 2020-03-31 00:00:41 +02:00
Matt Arsenault db9f0d1ce5 AMDGPU: Form v_cvt_ubyte* with f16 results
We get 2 conversion instructions anyway. Previously we would get a
conversion with SDWA reading from a byte source, which has a larger
encoding.
2020-03-30 17:59:49 -04:00
Matt Arsenault b27d255e1e AMDGPU/GlobalISel: Form CVT_F32_UBYTE0 2020-03-30 17:45:55 -04:00
Matt Arsenault bcb643c8af AMDGPU/GlobalISel: Handle image atomics 2020-03-30 17:41:04 -04:00
Matt Arsenault 48eda37282 AMDGPU/GlobalISel: Start selecting image intrinsics
Does not handled atomics yet.
2020-03-30 17:33:04 -04:00
Matt Arsenault 570a578e46 AMDGPU: Account for dmask when computing image mem size
Only the number of elements in the dmask will really be accessed.
2020-03-30 17:30:58 -04:00
Jay Foad cee65d51fe AMDGPU: Implement getMemcpyLoopLoweringType
Summary: Based on a patch by Matt Arsenault.

Reviewers: rampitec, kerbowa, nhaehnle, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77057
2020-03-30 22:21:01 +01:00
Matt Arsenault 2641ba52a9 AMDGPU/GlobalISel: Round up image operations with 5, 6 or 7 addresses
The instruction definitions are missing for these register types, so
round up to 8 like the DAG.
2020-03-30 17:02:47 -04:00
Matt Arsenault 42d5609809 AMDGPU/GlobalISel: Start handling _L to _LZ optimization
We currently don't have a way to map to the equivalent intrinsic
opcode, so track immediate 0s in place of the address for the
selection to know to change the final opcode.
2020-03-30 17:02:30 -04:00
Daan Sprenkels 273b0d7766 Revert "[InstCombine] Update assertions in InstCombine test; NFC"
This reverts commit 4243bd494d.
2020-03-30 22:41:33 +02:00
Daan Sprenkels 4243bd494d [InstCombine] Update assertions in InstCombine test; NFC 2020-03-30 22:15:50 +02:00
Sanjay Patel f2fbdf76d8 [InstCombine] do not exclude min/max from icmp with casted operand fold
InstCombine has a mess of logic that tries to preserve min/max patterns,
but AFAICT, this one is not necessary because we can always narrow the
corresponding select in this sequence to match the narrow compare.

The biggest danger for this patch is inducing infinite looping or
assert from exceeding max iterations. If any bots hit that in the
vicinity of this commit, this is the likely patch to blame.
2020-03-30 16:10:51 -04:00
Eli Friedman 9eb1b41811 [llvm-cov] Improve error message for missing profdata
I got a report recently that a user was having trouble interpreting the
meaning of the error message.  Hopefully this is more readable; produces
something like the following:

error: No such file or directory: Could not read profile data!

Differential Revision: https://reviews.llvm.org/D76796
2020-03-30 12:54:07 -07:00
Matt Arsenault 4919f2e1c5 AMDGPU/GlobalISel: Basic legalize rules for G_FSHR
Only handles easy 32-bit cases.
2020-03-30 11:53:01 -07:00