Commit Graph

151193 Commits

Author SHA1 Message Date
Philip Reames c5e491e6ee [SCEV] Modernize code style of isSCEVExprNeverPoison [NFC]
Use for-range and all_of to make code easier to read in advance of other changes.
2021-09-30 15:13:43 -07:00
Amara Emerson ca8316b704 [GlobalISel] Extend CombinerHelper::matchConstantOp() to match constant splat vectors.
This allows the "x op 0 -> x" fold to optimize vector constant RHSs.

Differential Revision: https://reviews.llvm.org/D110802
2021-09-30 14:31:25 -07:00
Craig Topper a21c557955 [RISCV] Remove Zbproposedc extension
This consists of 3 compressed instructions, c.not, c.neg, and c.zext.w.
I believe these have been picked up by the Zce effort using different
encodings. I don't think it makes sense to keep them in bitmanip. It
will eventually cause a conflict if/when Zce is implemented in llvm.

Differential Revision: https://reviews.llvm.org/D110871
2021-09-30 14:23:05 -07:00
Jon Chesterfield 3247329107 [openmp] Add addrspacecast to getOrCreateIdent
Fixes 51982. Adds a missing CreatePointerCast and allocates a global in
the correct address space.

Test case derived from https://github.com/ROCm-Developer-Tools/aomp/\
blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting
parts while checking the assertion failure still occurred.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110556
2021-09-30 21:36:31 +01:00
Arnold Schwaighofer 2df2b27d94 [cora async] Cleanup undefined llvm.coro.async.resume
In situations where the coroutine function is not split we can just
replace the async.resume by null.

rdar://82591919

Differential Revision: https://reviews.llvm.org/D110191
2021-09-30 13:26:53 -07:00
Florian Hahn 1fbdbb5595
Revert "Recommit "[SCEV] Look through single value PHIs." (take 2)"
This reverts commit 764d9aa979.

This patch exposed a few additional cases where SCEV expressions are not
properly invalidated.

See PR52024, PR52023.
2021-09-30 20:53:51 +01:00
Maksim Panchenko 050edef853 [MC] Make MCDwarfLineStr class public
Add MCDwarfLineStr class to the public API.

Note that MCDwarfLineTableHeader::Emit(), takes MCDwarfLineStr as
an Optional<> parameter making it impossible to use the API if the class
is not publicly defined.

Reviewed By: alexander-shaposhnikov

Differential Revision: https://reviews.llvm.org/D109412
2021-09-30 12:31:59 -07:00
Albion Fung 4195ed9959 [PowerPC] Improved codegen related to xscvdpsxws/xscvdpuxws
This patch removes the uneccessary mf/mtvsr generated in conjunction
with xscvdpsxws/xscvdpuxws.

Differential revision: https://reviews.llvm.org/D109902
2021-09-30 14:31:00 -05:00
Amara Emerson 80f4bb5c61 [GlobalISel] Extend G_SELECT of known condition combine to vectors.
Adds a new utility function: isConstantOrConstantSplatVector().

Differential Revision: https://reviews.llvm.org/D110786
2021-09-30 12:16:44 -07:00
Sanjay Patel 3fcb00df5d [InstCombine] restrict shift-trunc-shift fold to opposite direction shifts
This is NFCI because the pattern with 2 left-shifts should get
folded independently by smaller folds.

The motivation is to refine this block to avoid infinite loops
seen with D110170.
2021-09-30 15:06:13 -04:00
Nikita Popov b989211d7d [BasicAA] Move more extension logic into ExtendedValue (NFC)
Add methods to appropriately extend KnownBits/ConstantRange there,
same as with APInt. Also clean up the known bits handling by
actually doing that extension rather than checking ZExtBits. This
doesn't matter now, but becomes relevant once truncation is
involved.
2021-09-30 20:45:12 +02:00
Stanislav Mekhanoshin 244aa7f735 [AMDGPU] move hasAGPRs/hasVGPRs into header
It is now very simple and can go right into the header
allowing optimizer to combine callers, such as isVGPRClass
and similar.

It does not need anything from the TRI itself anymore, so
make it static class member along with the callers.

Differential Revision: https://reviews.llvm.org/D110762
2021-09-30 10:02:02 -07:00
Nikita Popov ea02f9caff [BasicAA] Use ExtendedValue in VariableGEPIndex (NFC)
Use the ExtendedValue structure which is used for LinearExpression
in VariableGEPIndex as well.
2021-09-30 18:48:51 +02:00
Adrian Prantl 9232ca4712 Improve the effectiveness of BDCE's debug info salvaging
This patch improves the effectiveness of BDCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!

Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.

This reapplies the previous patch with a fix for a use-after-free.

Differential Revision: https://reviews.llvm.org/D110568
2021-09-30 09:28:49 -07:00
Kazu Hirata f631173d80 [llvm] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-09-30 08:51:21 -07:00
Anna Thomas 6f2d01376d [LoopPredication] Remove unused variable
After rG452714f8f8037ff37f9358317651d1652e231db2, the Function `F` retrieved in LoopPredication is not used.
Remove this unused variable to stop some buildbots (ASAN, clang-ppc) from failing.
2021-09-30 10:40:47 -04:00
Anna Thomas 452714f8f8 [BPI] Keep BPI available in loop passes through LoopStandardAnalysisResults
This is analogous to D86156 (which preserves "lossy" BFI in loop
passes). Lossy means that the analysis preserved may not be up to date
with regards to new blocks that are added in loop passes, but BPI will
not contain stale pointers to basic blocks that are deleted by the loop
passes.

This is achieved through BasicBlockCallbackVH in BPI, which calls
eraseBlock that updates the data structures in BPI whenever a basic
block is deleted.

This patch does not have any changes in the upstream pipeline, since
none of the loop passes in the pipeline use BPI currently.
However, since BPI wasn't previously preserved in loop passes, the loop
predication pass was invoking BPI *on the entire
function* every time it ran in an LPM.  This caused massive compile time
in our downstream LPM invocation which contained loop predication.

See updated test with an invocation of a loop-pipeline containing loop
predication and -debug-pass turned ON.

Reviewed-By: asbirlea, modimo
Differential Revision: https://reviews.llvm.org/D110438
2021-09-30 10:27:05 -04:00
David Green f9aa8623fe [ARM] Add more MVE intrinsics to sink splats to
This adds a few more unpredicated intrinsics to sink splats to, in order
to create more qr instruction variants. Notably this includes
saddsat/uaddsat but also some of the unpredicated mve intrinsics.

Differential Revision: https://reviews.llvm.org/D110333
2021-09-30 14:41:23 +01:00
Brock Wyma bafd8b1add [CodeView] Recognize Fortran95 as Fortran instead of MASM
Map Fortran95 sources to Fortran so the CodeView language is not emitted as
MASM.

Differential Revision: https://reviews.llvm.org/D110330
2021-09-30 09:27:05 -04:00
Jingu Kang 13f3c39f36 Second Recommit "[AArch64] Split bitmask immediate of bitwise AND operation"
This reverts the revert commit c07f709969 with
bug fixes.

Differential Revision: https://reviews.llvm.org/D109963
2021-09-30 09:27:08 +01:00
Jay Foad 156d7d2df7 [LiveIntervals] Remove unused subreg ranges in repairIntervalsInRange
If the old instructions mentioned a subreg that the new instructions do
not, remove the subrange for that subreg.

For example, in TwoAddressInstructionPass::eliminateRegSequence, if a
use operand in the REG_SEQUENCE has the undef flag then we don't
generate a copy for it so after the elimination there should be no live
interval at all for the corresponding subreg of the def.

This is a small step towards switching TwoAddressInstructionPass over
from LiveVariables to LiveIntervals. Currently this path is only tested
if you explicitly enable -early-live-intervals.

Differential Revision: https://reviews.llvm.org/D110542
2021-09-30 09:15:10 +01:00
Clement Courbet 455b60ccfb [AA] Teach BasicAA to recognize basic GEP range information.
The information can be implicit (from `ValueTracking`) or explicit.

This implements the backend part of the following RFC
https://groups.google.com/g/llvm-dev/c/T9o51zB1JY.

We still need to settle on how to best represent the information in the
IR, but this is a separate discussion.

Differential Revision: https://reviews.llvm.org/D109746
2021-09-30 08:29:32 +02:00
Ruiling Song 52785989e9 AMDGPU: Broadcast scalar boolean to vector boolean explicitly
This is used to fix wrong code generation of s_add_co_select_user in
test/CodeGen/AMDGPU/expand-scalar-carry-out-select-user.ll

  s_addc_u32 s4, s6, 0
  s_cselect_b64 vcc, 1, 0    <-- vcc set as 0x1 if SCC==1
  v_mov_b32_e32 v1, s4
  s_cmp_gt_u32 s6, 31
  v_cndmask_b32_e32 v1, 0, v1, vcc

If the s_addc_u32 set SCC, then we will get value 0x1 in VCC.
The v_cndmask will do per thread selection with VCC as condition
register. As VCC only gets the first bit being set, only the first
thread/lane in destination register can get correct result if the
very first lane is active. In fact, we should broadcast the value to all
active lanes of the final register.

The idea here is doing this broadcast to vector boolean explicitly
instead of lowering it into a COPY from SCC which would be interpreted as
selecting between 0/1.

This is used to replace D109754.

Reviewed-by: foad, alex-t

Differential Revision: https://reviews.llvm.org/D109889
2021-09-30 10:15:01 +08:00
Fangrui Song 8971b99c83 [llvm-objdump/llvm-readobj/obj2yaml/yaml2obj] Support STO_RISCV_VARIANT_CC and DT_RISCV_VARIANT_CC
STO_RISCV_VARIANT_CC marks that a symbol uses a non-standard calling
convention or the vector calling convention.

See https://github.com/riscv/riscv-elf-psabi-doc/pull/190

Differential Revision: https://reviews.llvm.org/D107949
2021-09-29 16:56:52 -07:00
Amara Emerson 1c0e8a98e4 [AArch64][GlobalISel] Widen G_BUILD_VECTOR source & dest element types to s8. 2021-09-29 15:11:30 -07:00
Nikita Popov 2898101552 [BasicAA] Move DecomposedGEP out of header (NFC)
It's sufficient to have a forward declaration in the header, we
can move the definition of the struct (and VariableGEPIndex)
in the source file.
2021-09-29 23:45:15 +02:00
Nikita Popov 45288edb65 [BasicAA] Pass whole DecomposedGEP to subtraction API (NFC)
Rather than separately handling subtraction of offset and variable
indices, make this one operation. Also rewrite the implementation
to use range-based for loops.
2021-09-29 23:32:15 +02:00
Sam McCall 22555bafe9 [VFS] InMemoryFilesystem's UniqueIDs are a function of path and content.
This ensures that re-creating "the same" FS results in the same UIDs for files.
In turn, this means that creating a clang module (preamble) using one in-memory
filesystem and consuming it using another doesn't create duplicate FileEntrys
for files that are the same in both FSes.

It's tempting to give the creator control over the UIDs instead. However that
requires fiddly API changes, e.g. what should the UIDs of intermediate
directories be?
This change is more "magic" but seems safe given:
 - InMemoryFilesystem is used in testing more than production
 - comparing UIDs across filesystems is unusual
 - files with the same path and content are usually logically equivalent

(The usual reason for re-creating virtual filesystems rather than reusing them
is that typical use involves mutating their CWD and so is not threadsafe).

Differential Revision: https://reviews.llvm.org/D110711
2021-09-29 23:24:18 +02:00
Ricky Taylor e1e3b6ee72 [M68k] Avoid UB in disassembler
When reading 32 bits a 32-bit shift would be executed.

This is undefined behaviour, but in this case we can just replace the
entire scratch value to avoid it.

Differential Revision: https://reviews.llvm.org/D110769
2021-09-29 22:07:14 +01:00
Nikita Popov 49813f7fbf [BasicAA] Pass DecomposedGEP to constantOffsetHeuristic() (NFC)
Rather than separately passing VarIndices and BaseOffset, pass
the whole DecomposedGEP.
2021-09-29 22:23:27 +02:00
Joseph Huber c11ebfea6d [OpenMP][NFC] Fix linting messages in OpenMPOpt
Summary:
This patch addresses some linting messages I keep getting in my editor
when working on OpenMPOpt.
2021-09-29 16:07:33 -04:00
Joseph Huber 87ce7e65f2 [OpenMP] Add missing distribute definitions to AAKernelInfo
Summary:
The RTL functions added in https://reviews.llvm.org/D110429 were
mistakenly left out from the list of safe runtime calls in AAKernelInfo.
This patch adds them in.
2021-09-29 16:06:34 -04:00
Wael Yehia 8b8da01d88 Revert "[LTO][Legacy] Add -debug-pass-manager option to enable pass run/skip trace."
This reverts commit a60405cf03.
2021-09-29 19:43:35 +00:00
Stefan Pintilie fb4e44c4e7 [PowerPC] The builtins load8r and store8r are Power 7 plus.
This patch makes sure that the builtins __builtin_ppc_load8r and
__ builtin_ppc_store8r are only available for Power 7 and up.
Currently the builtins seem to produce incorrect code if used for
Power 6 or before.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D110653
2021-09-29 14:34:40 -05:00
Sjoerd Meijer 367df18050 [LoopFlatten] Bail if we can't perform flattening after IV widening
It can happen that after widening of the IV, flattening may not be possible,
e.g. when it is deemed unprofitable. We were not properly checking this, which
resulted in flattening being applied when it shouldn't, also leading to
incorrect results (miscompilation).

This should fix PR51980 (https://bugs.llvm.org/show_bug.cgi?id=51980)

Differential Revision: https://reviews.llvm.org/D110712
2021-09-29 19:53:34 +01:00
Roman Lebedev 2d42a192e0
[X86][Costmodel] Load/store i8 Stride=2 VF=32 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/xz6x7c35P - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.5`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/xz6x7c35P - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110709
2021-09-29 21:52:45 +03:00
Roman Lebedev bac60c55e0
[X86][Costmodel] Load/store i8 Stride=2 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/a9hv4z47v - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/6GfPn1b79 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `3`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110708
2021-09-29 21:52:45 +03:00
Roman Lebedev 1962185671
[X86][Costmodel] Load/store i8 Stride=2 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

Identical to VF=2.

For load we have:
https://godbolt.org/z/4TEbdzbMM - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/MYfzGPf3Y - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5`
So pick cost of `1`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110705
2021-09-29 21:52:45 +03:00
Roman Lebedev 08face1f9a
[X86][Costmodel] Load/store i8 Stride=2 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

Identical to VF=2.

For load we have:
https://godbolt.org/z/sGE41GYo7 - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/ba5r3s9xa - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5`
So pick cost of `1`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110704
2021-09-29 21:52:45 +03:00
Roman Lebedev 7d52628eb0
[X86][Costmodel] Load/store i8 Stride=2 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/caKqjr9hb - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/6TTn3eKj8 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5`
So pick cost of `1`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110702
2021-09-29 21:52:44 +03:00
Wesley Wiser 2dd883439c [Mangler] Calculate the argument list byte count suffix correctly when returning large values
`__stdcall`, `__fastcall` and `__vectorcall` return large values via a
hidden pointer argument. However, the size of that argument should not
be included in the argument list byte count suffix added to the
function's decorated name.

This patch fixes that issue so that LLVM generates the same decorated
name as MSVC does.

MSVC example: https://godbolt.org/z/nc35MKPhr

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D110719
2021-09-29 11:42:28 -07:00
Jay Foad f9b68304a2 [AMDGPU] Enable machine verification after AMDGPUISelDAGToDAG
This was introduced in D32628 but it does not seem to be required any
more. At least it does not show any problems in check-llvm in an
LLVM_ENABLE_EXPENSIVE_CHECKS build.

Differential Revision: https://reviews.llvm.org/D110692
2021-09-29 18:47:19 +01:00
Sanjay Patel 4414e2ad97 [InstSimplify] (-1 << x) s>> x --> -1
This was noticed in:
https://llvm.org/PR51351

https://alive2.llvm.org/ce/z/aLxunD
2021-09-29 13:03:12 -04:00
Kazu Hirata 9a640a1cb8 [AArch64] Remove redundant declaration createAArch64ObjectTargetStreamer (NFC)
Note that createAArch64ObjectTargetStreamer is declared in
AArch64TargetStreamer.h and defined in AArch64TargetStreamer.cpp.

Identified with readability-redundant-declaration.
2021-09-29 09:08:41 -07:00
David Green e9adcbde31 [AArch64] Model Cortex-A55 Q register NEON instructions
Cortex-A55 has 2 64bit NEON vector units, meaning a 128bit instruction
requires taking both units (and can only be issued as the first
instruction in a dual issue pair). This patch models that by splitting
the WriteV SchedWrite into two - the WriteVd that reads/writes only
64bit operands, and the WriteVq that read/writes 128bit registers. The
A55 schedule then uses this distinction to model the WriteVq as taking
both resource units, and starting a Schedule Group and WriteVd as taking
one as before.

I believe this is more correct, even if it does not lead to much better
performance.

Differential Revision: https://reviews.llvm.org/D108766
2021-09-29 16:55:31 +01:00
Sanjay Patel ea56dcb730 [InstCombine] fix miscompile from dropRedundantMaskingOfLeftShiftInput()
The test is from https://llvm.org/PR51351.

There are 2 related logic bugs from over-generalizing "lshr" to "any shr",
but I'm not sure how to expose the difference for "MaskC" because instsimplify
already folds ashr of -1.

I'll extend instsimplify to catch the MaskD pattern as a follow-up, but this
patch should be enough to avoid the miscompile.
2021-09-29 11:43:18 -04:00
Jay Foad 9886f21bc1 [MSP430] Recognize Bi as an indirect branch in analyzeBranch. NFC.
Recognize Bi as an unconditional branch, just like JMP. This allows
machine verification to run after MSP430BranchSelector without failing
this assertion:

virtual bool llvm::MSP430InstrInfo::analyzeBranch(llvm::MachineBasicBlock &, llvm::MachineBasicBlock *&, llvm::MachineBasicBlock *&, SmallVectorImpl<llvm::MachineOperand> &, bool) const: Assertion `I->getOpcode() == MSP430::JCC && "Invalid conditional branch"' failed.

Note that machine verification is currently disabled after
addPreEmitPass passes because of problems on other targets, so this is
currently NFC.

Differential Revision: https://reviews.llvm.org/D110691
2021-09-29 16:43:11 +01:00
Simon Pilgrim 676f2809b5 [CostModel][AArch64] Don't dereference CostTblEntry before null check.
Fix static analysis warning that we check for null Entry after dereferencing it.

I don't think this can actually happen as i8/i16 should legalize to use the i32 path which should return a cost - but I'd rather play it safe that rely on an implicit type legalization.
2021-09-29 16:35:29 +01:00
Sam Clegg 210cbcf476 [WebAssemlby][Object] Fix dead code in WasmObjectFile.cpp
I introduced this by mistake in https://reviews.llvm.org/D109595.

Differential Revision: https://reviews.llvm.org/D110717
2021-09-29 08:09:57 -07:00
David Green 8a645fc44b [AArch64] Enable type promotion for AArch64
This enables the type promotion pass for AArch64, which acts as a
CodeGenPrepare pass to promote illegal integers to legal ones,
especially useful for removing extends that would otherwise require
cross-basic-block analysis.

I have enabled this generally, for both ISel and GlobalISel. In some
quick experiments it appeared to help GlobalISel remove extra extends in
places too, but that might just be missing optimizations that are better
left for later. We can disable it again if required.

In my experiments, this can improvement performance in some cases, and
codesize was a small improvement. SPEC was a very small improvement,
within the noise. Some of the test cases show extends being moved out of
loops, often when the extend would be part of a cmp operand, but that
should reduce the latency of the instruction in the loop on many cpus.
The signed-truncation-check tests are increasing as they are no longer
matching specific DAG combines.

We also hope to add some additional improvements to the pass in the near
future, to capture more cases of promoting extends through phis that
have come up in a few places lately.

Differential Revision: https://reviews.llvm.org/D110239
2021-09-29 15:13:12 +01:00