Commit Graph

149978 Commits

Author SHA1 Message Date
Nikita Popov e918ba6958 [LICM] Drop -licm-n2-threshold option
This was a diagnostic option used to demonstrate a weakness in
the AST-based LICM implementation. This problem does not exist
in the MSSA-based LICM implementation, which has been enabled
for a long time now. As such, this option is no longer relevant.
2021-08-17 22:41:31 +02:00
Nikita Popov f58a642da1 [PassBuilder] Use loop-mssa for licm
Currently specifying -licm or -passes=licm will implicitly create
-passes=loop(licm). This does not match the intended default (used
by the legacy PM and by the default pipeline) of using the
MemorySSA-based LICM implementation. As I plan to drop the non-MSSA
implementation, this will stop working entirely...

This special-cases licm to create a loop-mssa manager instead. At
this point it's still possible to use -passes='loop(licm)' to opt
into the AST-based implementation.

Differential Revision: https://reviews.llvm.org/D108155
2021-08-17 21:23:11 +02:00
Sanjay Patel 50c1138796 [InstCombine] add TODO about another min/max fold; NFC
Suggested in post-commit for d0975b7cb0
2021-08-17 14:14:25 -04:00
Craig Topper 8f6cea43e7 [RISCV] Use RISCV::RVVBitsPerBlock for RGK_ScalableVector in getRegisterBitWidth.
I might be wrong, but I think this should be the width of the known
min size we use for scalable vectors. It shouldn't scale with
minimum vlen.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107945
2021-08-17 11:13:15 -07:00
Simon Pilgrim d7f288502f SelectionDAGBuilder::visitInlineAsm - don't dereference dyn_cast<> results.
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct.

Fixes static analyser warning.
2021-08-17 18:40:59 +01:00
Simon Pilgrim caff2acae1 [AArch64] AArch64DAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results.
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct.

Fixes static analyser warnings.
2021-08-17 18:40:59 +01:00
Simon Pilgrim 1e770f0388 [ARM] ARMDAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results.
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct.

Fixes static analyser warnings.
2021-08-17 18:40:59 +01:00
Simon Pilgrim fb81271e8b [AMDGPU] Fix lowering of AMDGPU::G_CTTZ_ZERO_UNDEF to AMDGPU::G_AMDGPU_FFBL_B32
As mentioned on D107474, there was a copy+paste typo repeating G_CTLZ_ZERO_UNDEF that coverity reported as dead code.

Differential Revision: https://reviews.llvm.org/D108210
2021-08-17 18:09:57 +01:00
Fraser Cormack f3e9047249 [VP] Add vector-predicated reduction intrinsics
This patch adds vector-predicated ("VP") reduction intrinsics corresponding to
each of the existing unpredicated `llvm.vector.reduce.*` versions. Unlike the
unpredicated reductions, all VP reductions have a start value. This start value
is returned when no vector element is active.

Support for expansion on targets without native vector-predication support is
included.

This patch is based on the ["reduction
slice"](https://reviews.llvm.org/D57504#1732277) of the LLVM-VP reference patch
(https://reviews.llvm.org/D57504).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104308
2021-08-17 17:56:35 +01:00
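The start-value behaviour described in f3e9047249 above can be sketched as a small scalar model. This is only an illustration, assuming an integer add reduction with arbitrary-precision arithmetic (no wrapping); the function name `vp_reduce_add` and its Python signature are hypothetical and are not LLVM's API.

```python
from typing import Sequence


def vp_reduce_add(start: int, values: Sequence[int],
                  mask: Sequence[bool], evl: int) -> int:
    """Scalar model of a vector-predicated add reduction (hypothetical helper).

    Lane i is active when i < evl and mask[i] is true. Inactive lanes are
    ignored, so with no active lane the result is just `start`.
    """
    acc = start
    for i, (value, enabled) in enumerate(zip(values, mask)):
        if i < evl and enabled:
            acc += value
    return acc
```

With every lane masked off, or with an explicit vector length of 0, the model returns the start value unchanged, which is the case the commit message calls out.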
Joseph Huber 339aa76526 [OpenMP][NFC] Add option to print module after OpenMPOpt for debugging
This patch adds an extra option to print the module after running one of
the OpenMPOpt passes if debugging is enabled. This makes it much easier
to inspect the effects of this pass when doing debugging.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108146
2021-08-17 12:46:10 -04:00
Philip Reames 982da7a20c [SCEVExpander] Stop hoisting IR when reusing phis
This is a fix for PR43678, and is an alternate patch to D105723.

The basic issue we're running into is that LSR + SCEVExpander are moving the very instruction whose operand we're in the process of expanding. This breaks the subtle and ill-documented invariant which let LSR work. (Full story can be found here: https://reviews.llvm.org/D105723#2878473)

Rather than attempting a fix, this change just removes the optimization entirely. The code is entirely untested, and removing it appears to have no impact I can find.  This code was added back in 2014 by 1e12f8563d with a single test which does not seem to actually test the hoisting logic.

From a philosophical standpoint, it also seems very strange to have the expander implementing optimizations which should live in a dedicated transform pass.

Differential Revision: https://reviews.llvm.org/D106178
2021-08-17 09:38:32 -07:00
Fangrui Song 78cb1adc5c [Object] Move llvm-nm's symbol version utility to ELFObjectFile::readDynsymVersions
The utility can be reused by llvm-objdump -T.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D108096
2021-08-17 09:06:39 -07:00
Roman Lebedev 2078c4ecfd [X86] Lower insertions into upper half of a 256-bit vector as broadcast+blend (PR50971)
Broadcast is not worse than extract+insert of subvector.
https://godbolt.org/z/aPq98G6Yh

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D105390
2021-08-17 18:45:10 +03:00
Tozer 5c6f748cbc [MCParser] Correctly handle CRLF line ends when consuming line comments
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=47983

The AsmLexer currently has an issue with lexing line comments in files
with CRLF line endings, in which it reads the carriage return as being
part of the line comment. This causes an error for certain valid comment
layouts; this patch fixes this by excluding the carriage return from the
line comment.

Differential Revision: https://reviews.llvm.org/D90234
2021-08-17 15:52:51 +01:00
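The CRLF handling described in 5c6f748cbc above can be illustrated with a toy lexer routine. This is a sketch of the idea only, not the AsmLexer code; the function name `lex_line_comment` and its return shape are assumptions made for the example.

```python
from typing import Tuple


def lex_line_comment(src: str, pos: int) -> Tuple[str, int]:
    """Consume a line comment that starts at src[pos] (hypothetical helper).

    Returns (comment_text, next_pos). The comment stops before a line
    feed, and also before a carriage return that is immediately followed
    by a line feed, so the '\\r' of a CRLF pair is not part of the comment.
    """
    end = pos
    while end < len(src):
        if src[end] == "\n":
            break
        if src[end] == "\r" and src[end + 1:end + 2] == "\n":
            break
        end += 1
    return src[pos:end], end
```

For the input `"# hi\r\nmov"` the returned comment text is `"# hi"` rather than `"# hi\r"`, which is the behaviour the commit message describes.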
Kazu Hirata a14920c002 [Bitcode] Remove unused declaration writeBitcodeHeader (NFC)
The corresponding definition was removed on Nov 29, 2016 in commit
5a0a2e648c.
2021-08-17 07:10:51 -07:00
Dylan Fleming ef198cd99e [SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute
Removed AArch64 usage of the getMaxVScale interface, replacing it with
the vscale_range(min, max) IR Attribute.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D106277
2021-08-17 14:42:47 +01:00
David Green 52e0cf9d61 [ARM] Enable subreg liveness
This enables subreg liveness in the ARM backend when MVE is present,
which allows the register allocator to detect when subregisters are
alive/dead, compared to only acting on full registers. This can help
produce better code on MVE with the way MQPR registers are made up of
SPR registers, but is especially helpful for MQQPR and MQQQQPR
registers, where there are very few "registers" available and being able
to split them up into subregs can help produce much better code.

Differential Revision: https://reviews.llvm.org/D107642
2021-08-17 14:10:33 +01:00
David Green 62e892fa2d [ARM] Add MQQPR and MQQQQPR spill and reload pseudo instructions
As a part of D107642, this adds pseudo instructions for MQQPR and
MQQQQPR register classes, that can spill and reload entire registers
whilst keeping them combined, not splitting them into multiple D subregs
that a VLDMIA/VSTMIA would use. This can help certain analyses, and
helps to prevent verifier issues with subreg liveness.
2021-08-17 13:51:34 +01:00
Sanjay Patel e73f4e1123 [InstCombine] remove unused function argument; NFC 2021-08-17 08:10:42 -04:00
Sanjay Patel d0975b7cb0 [InstCombine] fold signed min/max intrinsics with negated operands
If both operands are negated, we can invert the min/max and do
the negation after:
smax (neg nsw X), (neg nsw Y) --> neg nsw (smin X, Y)
smin (neg nsw X), (neg nsw Y) --> neg nsw (smax X, Y)

This is visible as a remaining regression in D98152. I don't see
a way to generalize this for 'unsigned' or adapt Negator to
handle it. This only appears to be safe with 'nsw':
https://alive2.llvm.org/ce/z/GUy1zJ

Differential Revision: https://reviews.llvm.org/D108165
2021-08-17 08:10:42 -04:00
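The `smax` fold from d0975b7cb0 above, and the remark that it only appears to be safe with 'nsw', can be checked exhaustively on 8-bit integers. The sketch below is an independent brute-force model, not the Alive2 proof linked in the commit; the helper names are hypothetical.

```python
def wrap_i8(value: int) -> int:
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return (value + 128) % 256 - 128


def neg_overflows(value: int) -> bool:
    """Negating -128 is the only signed overflow for 8-bit negation."""
    return value == -128


def fold_holds(x: int, y: int) -> bool:
    """Compare smax(-x, -y) with -(smin(x, y)) using wrapping negation."""
    lhs = max(wrap_i8(-x), wrap_i8(-y))
    rhs = wrap_i8(-min(x, y))
    return lhs == rhs


values = range(-128, 128)

# With 'nsw', inputs whose negation overflows yield poison, so skip them.
nsw_ok = all(fold_holds(x, y) for x in values for y in values
             if not neg_overflows(x) and not neg_overflows(y))

# Without 'nsw', wrapping negation breaks the identity for some inputs.
counterexamples = [(x, y) for x in values for y in values
                   if not fold_holds(x, y)]
```

Every failing pair involves -128 on exactly one side, for example `(-128, 0)`. The `smin` variant is symmetric and can be checked the same way.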
Sebastian Neubauer fbae34635d [GlobalISel] Add combine for PTR_ADD with regbanks
Combine two G_PTR_ADDs, but keep the register bank of the constant.
That way, the combine can be used in post-regbank-select combines.

Introduce two helper methods in CombinerHelper, getRegBank and
setRegBank that get and set an optional register bank to a register.
That way, they can be used before and after register bank selection.

Differential Revision: https://reviews.llvm.org/D103326
2021-08-17 13:58:16 +02:00
Tiehu Zhang 9cfa9b44a5 [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block
In the current implementation, the instruction to be sunk is inserted before the target instruction without considering the def-use tree,
which may cause an "Instruction does not dominate all uses" error. We need to choose a suitable insertion point according to the use chain.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107262
2021-08-17 18:58:15 +08:00
Jeremy Morse 708cbda577 [DebugInfo][InstrRef] Honour too-much-debug-info cutouts
This reapplies 54a61c94f9 and its follow-up in 547b712500, which were
reverted in 95fe61e639. Original commit message:

VarLoc based LiveDebugValues will abandon variable location propagation if
there are too many blocks and variable assignments in the function. If it
didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end
up with 1 million DBG_VALUEs just at the start of blocks.

Instruction-referencing LiveDebugValues should honour this limitation too
(because the same limitation applies to it). Hoist the relevant command
line options into LiveDebugValues.cpp and pass it down into the
implementation classes as an argument to ExtendRanges. I've duplicated all
the run-lines in live-debug-values-cutoffs.mir to have an
instruction-referencing flavour.

Differential Revision: https://reviews.llvm.org/D107823
2021-08-17 11:34:49 +01:00
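The cutoff reasoning in 708cbda577 above comes down to simple arithmetic plus a guard. The sketch below is an assumption-laden illustration: the threshold constants and function names are hypothetical and are not LLVM's option names or default values.

```python
# Hypothetical thresholds for illustration; not LLVM's actual defaults.
MAX_BLOCKS = 10_000
MAX_VARIABLE_ASSIGNMENTS = 50_000


def worst_case_block_entry_locations(num_blocks: int, num_variables: int) -> int:
    """Upper bound on locations tracked at block entry: one per block per variable."""
    return num_blocks * num_variables


def should_abandon_propagation(num_blocks: int, num_assignments: int) -> bool:
    """Give up when both the block count and the assignment count are too large."""
    return num_blocks > MAX_BLOCKS and num_assignments > MAX_VARIABLE_ASSIGNMENTS
```

For the example in the message, 1000 blocks and 1000 variables give 1,000,000 block-entry locations in the worst case.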
Simon Pilgrim 895ed64009 [AArch64] LowerCONCAT_VECTORS - merge getNumOperands() calls. NFCI.
Improves on the unused variable fix from rG4357562067003e25ab343a2d67a60bd89cd66dbf
2021-08-17 11:23:03 +01:00
Anton Afanasyev 1f3e35b6d1 [AggressiveInstCombine] Add shift left instruction to `TruncInstCombine` DAG
Add `shl` instruction to the DAG post-dominated by `trunc`, allowing
TruncInstCombine to reduce bitwidth of expressions containing left shifts.

The only thing we need to check is that the target bitwidth must be wider
than the maximal shift amount: https://alive2.llvm.org/ce/z/AwArqu

Part of https://reviews.llvm.org/D107766

Differential Revision: https://reviews.llvm.org/D108091
2021-08-17 12:44:37 +03:00
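The bitwidth condition in 1f3e35b6d1 above can be checked by brute force on small types. This sketch models `trunc` and `shl` on unsigned bit patterns and is separate from the Alive2 proof linked in the commit; all function names are hypothetical.

```python
def trunc(value: int, bits: int) -> int:
    """Keep only the low `bits` bits of value."""
    return value & ((1 << bits) - 1)


def shl(value: int, amount: int, bits: int) -> int:
    """Left shift in a `bits`-wide type, discarding bits shifted out."""
    return (value << amount) & ((1 << bits) - 1)


def narrowing_is_sound(src_bits: int, dst_bits: int, amount: int) -> bool:
    """Exhaustively compare trunc(shl(x, amount)) with shl(trunc(x), amount)."""
    if amount >= dst_bits:
        # The narrow shift amount would be out of range (poison in LLVM IR).
        return False
    return all(
        trunc(shl(x, amount, src_bits), dst_bits)
        == shl(trunc(x, dst_bits), amount, dst_bits)
        for x in range(1 << src_bits)
    )
```

Narrowing a 16-bit shift to 8 bits is sound for every shift amount from 0 to 7 in this model, and it is rejected from 8 upwards, matching the requirement that the target bitwidth exceed the maximal shift amount.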
Bing1 Yu bcec4ccd04 [X86] [AMX] Replace bitcast with specific AMX intrinsics with X86 specific cast.
There is some discussion of the bitcast between vector and x86_amx at https://reviews.llvm.org/D99152. This patch introduces an x86-specific cast between vector and x86_amx, so that unnecessary middle-end optimizations on it can be avoided. On the other hand, we have to optimize the x86-specific cast ourselves. This patch also optimizes the cast operation to eliminate redundant code.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D107544
2021-08-17 17:04:26 +08:00
David Stuttard ebdb0d09a4 AMDGPU: During img instruction ret value construction cater for non int values
Make sure return type is int type.

Differential Revision: https://reviews.llvm.org/D108131

Change-Id: Ic02f07d1234cd51b6ed78c3fecd2cb1d6acd5644
2021-08-17 09:08:24 +01:00
Kazu Hirata 8f5e9d65d6 [AsmParser] Remove MDConstant (NFC)
The last use was removed on Sep 22, 2016 in commit
fcee2d8001.
2021-08-16 21:21:11 -07:00
Whitney Tsang a41c95c0e3 [LNICM] Fix infinite loop
A bug introduced by https://reviews.llvm.org/D107219 causes an infinite loop when a PHINode chain has more than 2 levels.

Reviewed By: uint256_t

Differential Revision: https://reviews.llvm.org/D108166
2021-08-17 12:55:22 +09:00
Christudasan Devadasan 686607676f [AMDGPU] Skip pseudo MIs in hazard recognizer
Instructions like WAVE_BARRIER and SI_MASKED_UNREACHABLE
are only placeholders to prevent certain unwanted
transformations and will get discarded during assembly
emission. They should not be counted during nop insertion.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108022
2021-08-16 23:11:14 -04:00
Carl Ritson 99c790dc21 [AMDGPU] Make BVH isel consistent with other MIMG opcodes
Suffix opcodes with _gfx10.
Remove direct references to architecture specific opcodes.
Add a BVH flag and apply this to disassembly.
Fix a number of disassembly errors on gfx90a target caused by
previous incorrect BVH detection code.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108117
2021-08-17 10:42:22 +09:00
Hongtao Yu f27fee623d [SamplePGO][NFC] Dump function profiles in order
Sample profiles are stored in a string map which is basically an unordered map. Printing out profiles by simply walking the string map doesn't enforce an order. I'm sorting the map in the decreasing order of total samples to enable a more stable dump, which is good for comparing two dumps.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D108147
2021-08-16 17:22:30 -07:00
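The ordering described in f27fee623d above can be sketched as a sort over an unordered map. In this illustration the profile map is simplified to function name mapped to total sample count, and the tie-break on name is an assumption added to make the output deterministic; neither is taken from the patch.

```python
from typing import Dict, List, Tuple


def ordered_profiles(profiles: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (name, total_samples) pairs, hottest first.

    Ties are broken by function name (an assumption for this sketch) so
    that two dumps of the same data always come out in the same order.
    """
    return sorted(profiles.items(), key=lambda item: (-item[1], item[0]))
```

Sorting on a total order like this makes two dumps diffable regardless of the map's iteration order.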
Arthur Eubanks 0d822da2bd [NFC] Remove/replace some confusing attribute getters on Function 2021-08-16 16:12:37 -07:00
Min-Yih Hsu eec3495a9d [M68k] Do not pass llvm::Function& to M68kCCState
Previously we were passing `llvm::Function&` into `M68kCCState` to lower
arguments in fastcc. However, that reference might not be available if
it's a library call and we only need its argument types. Therefore,
now we're simply passing a list of argument llvm::Type-s.

This fixes PR-50752.

Differential Revision: https://reviews.llvm.org/D108101
2021-08-16 15:33:08 -07:00
Afanasyev Ivan 913b5d2f7a [AsmPrinter] fix nullptr dereference for MBBs with hasAddressTaken property without BB
Basic block pointer is dereferenced unconditionally for MBBs with
hasAddressTaken property.

MBBs might have hasAddressTaken property without reference to BB.
Backend developers currently have to assign a fake BB to the MBB to work
around this issue, which should be fixed.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D108092
2021-08-16 15:32:09 -07:00
David Green 9236dea255 [ARM] Create MQQPR and MQQQQPR register classes
Similar to the MQPR register class as the MVE equivalent to QPR, this
adds MQQPR and MQQQQPR register classes for the MVE equivalents of QQPR
and QQQQPR registers. The MVE MQPR seems to have worked out quite well,
and adding MQQPR and MQQQQPR allows us to a little more accurately
specify the number of registers, calculating register pressure limits a
little better.

Differential Revision: https://reviews.llvm.org/D107463
2021-08-16 22:58:12 +01:00
Anshil Gandhi f22ba51873 [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpand pass to report atomics generating a
compare and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-16 14:56:01 -06:00
Stanislav Mekhanoshin 877572cc19 Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that it is not a trivial rematerialization and that we do not want to
extend liveranges.

It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().

The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and are not rematerializable.

The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.

The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.

Differential Revision: https://reviews.llvm.org/D106408
2021-08-16 12:42:42 -07:00
Rong Xu 9b8425e42c Reapply commit b7425e956
The commit b7425e956: [NFC] fix typos
is harmless but was reverted by accident. Reapply.
2021-08-16 12:18:40 -07:00
Stanislav Mekhanoshin b9e433b02a Prevent machine licm if remattable with a vreg use
Check if a rematerializable instruction does not have any virtual
register uses. Even though it is rematerializable, RA might not actually
rematerialize it in this scenario. In that case we do not want to
hoist such an instruction out of the loop in the belief that RA will sink
it back if needed.

This already has impact on AMDGPU target which does not check for
this condition in its isTriviallyReMaterializable implementation
and has instructions with virtual register uses enabled. The
other targets are not impacted at this point although will be when
D106408 lands.

Differential Revision: https://reviews.llvm.org/D107677
2021-08-16 12:09:00 -07:00
Nikita Popov 735a590471 [MemorySSA] Remove -enable-mssa-loop-dependency option
This option has been enabled by default for quite a while now.
The practical impact of removing the option is that MSSA use
cannot be disabled in default pipelines (both LPM and NPM) and
in manual LPM invocations. NPM can still choose to enable/disable
MSSA using loop vs loop-mssa.

The next step will be to require MSSA for LICM and drop the
AST-based implementation entirely.

Differential Revision: https://reviews.llvm.org/D108075
2021-08-16 20:59:37 +02:00
Nikita Popov 570c9beb8e [MemorySSA] Remove unnecessary MSSA dependencies
LoopLoadElimination, LoopVersioning and LoopVectorize currently
fetch MemorySSA when constructing LoopAccessAnalysis. However,
LoopAccessAnalysis does not actually use MemorySSA and we can pass
nullptr instead.

This saves one MemorySSA calculation in the default pipeline, and
thus improves compile-time.

Differential Revision: https://reviews.llvm.org/D108074
2021-08-16 20:40:55 +02:00
Nikita Popov 0a031449b2 [PassBuilder] Don't use MemorySSA for standalone LoopRotate passes
Two standalone LoopRotate passes scheduled using
createFunctionToLoopPassAdaptor() currently enable MemorySSA.
However, while LoopRotate can preserve MemorySSA, it does not use
it, so requiring MemorySSA is unnecessary.

This change doesn't have a practical compile-time impact by itself,
because subsequent passes still request MemorySSA.

Differential Revision: https://reviews.llvm.org/D108073
2021-08-16 20:34:18 +02:00
Kostya Kortchinsky 80ed75e7fb Revert "[NFC] Fix typos"
This reverts commit b7425e956b.
2021-08-16 11:13:05 -07:00
Rong Xu b7425e956b [NFC] Fix typos
s/senstive/sensitive/g
2021-08-16 10:15:30 -07:00
Jordan Rupprecht 4357562067 [NFC][AArch64] Fix unused var in release build 2021-08-16 10:04:32 -07:00
Paul Robinson 94b4598d77 [PS4] stp[n]cpy not available on PS4 2021-08-16 09:06:52 -07:00
Craig Topper 92abb1cf90 [TypePromotion] Don't mutate the result type of SwitchInst.
SwitchInst should have a void result type.

Add a check to the verifier to catch this error.

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D108084
2021-08-16 08:54:34 -07:00
Simon Pilgrim d6fe8d37c6 [DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b)
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal.

This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands.

Differential Revision: https://reviews.llvm.org/D107597
2021-08-16 16:06:54 +01:00
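The nested `concat_vectors` fold in d6fe8d37c6 above can be illustrated on a toy expression tree. This sketch ignores the type-legality checks and the undef handling mentioned in the message; the tuple encoding and helper names are assumptions made for the example.

```python
def is_concat(node) -> bool:
    """A concat node is encoded as ("concat", [operands...])."""
    return isinstance(node, tuple) and len(node) == 2 and node[0] == "concat"


def fold_nested_concat(node):
    """Flatten concat(concat(x, y), concat(a, b)) into concat(x, y, a, b).

    The fold only fires when every operand is itself a concat node;
    otherwise the node is returned unchanged.
    """
    if not is_concat(node):
        return node
    operands = node[1]
    if not operands or not all(is_concat(op) for op in operands):
        return node
    flattened = [inner for op in operands for inner in op[1]]
    return ("concat", flattened)
```

The result has four operands rather than two, which is why code assuming that `concat_vectors` takes exactly 2 subvector operands was exposed by the change.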
Jeremy Morse 95fe61e639 Revert 54a61c94f9 and its follow up in 547b712500
These were part of D107823, however asan has found something excitingly
wrong happening:

https://lab.llvm.org/buildbot/#/builders/5/builds/10543/steps/13/logs/stdio
2021-08-16 15:48:56 +01:00
Sanjay Patel de285eacb0 [InstCombine] allow for constant-folding in GEP transform
This would crash the reduced test or as described in
https://llvm.org/PR51485
...because we can't mark a constant (-expression) with 'inbounds'.
2021-08-16 10:36:56 -04:00
Jeremy Morse 547b712500 Suppress signedness-comparison warning
This is a follow-up to 54a61c94f9.
2021-08-16 15:29:43 +01:00
Jeremy Morse 54a61c94f9 [DebugInfo][InstrRef] Honour too-much-debug-info cutouts
VarLoc based LiveDebugValues will abandon variable location propagation if
there are too many blocks and variable assignments in the function. If it
didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end
up with 1 million DBG_VALUEs just at the start of blocks.

Instruction-referencing LiveDebugValues should honour this limitation too
(because the same limitation applies to it). Hoist the relevant command
line options into LiveDebugValues.cpp and pass it down into the
implementation classes as an argument to ExtendRanges. I've duplicated all
the run-lines in live-debug-values-cutoffs.mir to have an
instruction-referencing flavour.

Differential Revision: https://reviews.llvm.org/D107823
2021-08-16 15:06:40 +01:00
Roman Lebedev febcedf18c Revert "[NFCI][IndVars] rewriteLoopExitValues(): nowadays SCEV should not change `GEP` base pointer"
https://bugs.llvm.org/show_bug.cgi?id=51490 was filed.

This reverts commit 35a8bdc775.
2021-08-16 14:30:29 +03:00
David Sherwood 9b19b77883 [NFC] Remove unused code in llvm::createSimpleTargetReduction 2021-08-16 09:50:45 +01:00
Roman Lebedev 2eb554a9fe Revert "Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)"
This is still wrong, as failing bots suggest.

This reverts commit 3d9beefc7d.
2021-08-16 11:07:42 +03:00
Cullen Rhodes 09507b5325 [AArch64][SME] Disable NEON in streaming mode
In streaming mode most of the NEON instruction set is illegal, disable
NEON when compiling with `+streaming-sve`, unless NEON is explicitly
requested.

Subsequent patches will add support for the small subset of NEON
instructions that are legal in streaming mode.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D107902
2021-08-16 07:56:48 +00:00
Christian Sigg 93c55d5ea2 Reset all options in cl::ResetCommandLineParser()
Reset cl::Positional, cl::Sink and cl::ConsumeAfter options as well in cl::ResetCommandLineParser().

Reviewed By: rriddle, sammccall

Differential Revision: https://reviews.llvm.org/D103356
2021-08-16 09:56:22 +02:00
Craig Topper b82ce77b2b [X86] Support avx512fp16 compare instructions in the IntelInstPrinter.
This enables printing of the mnemonics that contain the predicate
in the Intel printer. This requires accounting for the memory size
that is explicitly printed in Intel syntax. Those changes have been
synced to the ATT printer as well.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108093
2021-08-16 12:31:36 +08:00
Sanjay Patel ca637014f1 [Analysis][SimplifyLibCalls] improve function signature check for memcmp
This would assert/crash as shown in:
https://llvm.org/PR50850

The matching for bcmp/bcopy should probably also be updated,
but that's another patch.
2021-08-15 16:11:26 -04:00
Craig Topper ff95d2524a [X86] Prevent accidentally accepting cmpeqsh as a valid mnemonic.
We should only accept it as vcmpeqsh.

Same for all the other 31 comparison values.
2021-08-15 12:00:56 -07:00
Craig Topper 819818f7d5 [X86] Modify the commuted load isel pattern for VCMPSHZrm to match VCMPSSZrm/VCMPSDZrm.
This allows commuting any immediate value. The previous code only
commuted equality immediates. This was inherited from an earlier
version of VCMPSSZrm/VCMPSDZrm.
2021-08-15 11:43:56 -07:00
David Blaikie 62a4c2c10e DWARFVerifier: Check section-relative references at the end of the section
This ensures that debug_types references aren't looked for in
debug_info section.

Behavior is still going to be questionable in an unlinked object file -
since cross-cu references could refer to symbols in another .debug_info
(or, in theory, .debug_types) chunk - but if a producer only uses
ref_addr to refer to things within the same .debug_info chunk in an
object file (eg: whole program optimization/LTO - producing two CUs into
a single .debug_info section in an object file - the ref_addrs there
could be resolved relative to that .debug_info chunk, not needing to
consider comdat (DWARFv5 type units or other creatures) chunks of
.debug_info, etc)
2021-08-15 11:40:24 -07:00
Craig Topper 786b8fcc9b [X86] Add vcmpsh/vcmpph to X86InstrInfo::commuteInstructionImpl.
They were already added to findCommuteOpIndices, but they also
need to be in X86InstrInfo::commuteInstructionImpl in order
to adjust the immediate control.
2021-08-15 11:36:13 -07:00
Paul Walker cd0e196413 [DAGCombiner] Stop visitEXTRACT_SUBVECTOR creating illegal BITCASTs post legalisation.
visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when
removing "redundant" INSERT_SUBVECTOR operations.  This patch adds
an extra check to ensure such combines only occur after operation
legalisation if any resulting BITCAST is itself legal.

Differential Revision: https://reviews.llvm.org/D108086
2021-08-15 18:25:49 +01:00
Kazu Hirata e6e687f2d9 [AsmParser] Remove MDSignedOrUnsignedField (NFC)
The last use was removed on Apr 18, 2020 in commit
aad3d578da.
2021-08-15 09:31:39 -07:00
David Green c6b7db015f [InstCombine] Add call to matchSAddSubSat from min/max
This adds a call to matchSAddSubSat from smin/smax intrinsics, allowing
the same patterns to match if the canonical form of a min/max is an
intrinsic, not an icmp/select.

Differential Revision: https://reviews.llvm.org/D108077
2021-08-15 17:25:16 +01:00
Roman Lebedev 3d9beefc7d Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)
... with test change this time.

LLVM IR SSA form is "implicit" in `@pr51125`. While it is valid LLVM IR,
and does not require any PHI nodes, that completely breaks the further logic
in `CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses()`
that updates the live-out uses of the bonus instructions.

What I believe we need to do is to first make the SSA form explicit,
by inserting tautological PHI nodes, and rewriting the offending uses.

```
$ /builddirs/llvm-project/build-Clang12/bin/opt -load /repositories/alive2/build-Clang-release/tv/tv.so -load-pass-plugin /repositories/alive2/build-Clang-release/tv/tv.so -tv -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=10 -tv -o /dev/null /tmp/test.ll

----------------------------------------
@global_pr51125 = global 4 bytes, align 4

define i32 @pr51125() {
%entry:
  br label %L

%L:
  %ld = load i32, * @global_pr51125, align 4
  %iszero = icmp eq i32 %ld, 0
  br i1 %iszero, label %exit, label %L2

%L2:
  store i32 4294967295, * @global_pr51125, align 4
  %cmp = icmp eq i32 %ld, 4294967295
  br i1 %cmp, label %L, label %exit

%exit:
  %r = phi i32 [ %ld, %L2 ], [ %ld, %L ]
  ret i32 %r
}
=>
@global_pr51125 = global 4 bytes, align 4

define i32 @pr51125() {
%entry:
  %ld.old = load i32, * @global_pr51125, align 4
  %iszero.old = icmp eq i32 %ld.old, 0
  br i1 %iszero.old, label %exit, label %L2

%L2:
  %ld2 = phi i32 [ %ld.old, %entry ], [ %ld, %L2 ]
  store i32 4294967295, * @global_pr51125, align 4
  %cmp = icmp ne i32 %ld2, 4294967295
  %ld = load i32, * @global_pr51125, align 4
  %iszero = icmp eq i32 %ld, 0
  %or.cond = select i1 %cmp, i1 1, i1 %iszero
  br i1 %or.cond, label %exit, label %L2

%exit:
  %ld1 = phi i32 [ poison, %L2 ], [ %ld.old, %entry ]
  %r = phi i32 [ %ld2, %L2 ], [ %ld.old, %entry ]
  ret i32 %r
}
Transformation seems to be correct!

```

Fixes https://bugs.llvm.org/show_bug.cgi?id=51125

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D106317
2021-08-15 19:16:04 +03:00
Roman Lebedev 60dd0121c9 Revert "[SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)"
Forgot to stage the test change.

This reverts commit 78af5cb213.
2021-08-15 19:15:09 +03:00
Roman Lebedev 78af5cb213 [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)
LLVM IR SSA form is "implicit" in `@pr51125`. While it is valid LLVM IR,
and does not require any PHI nodes, that completely breaks the further logic
in `CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses()`
that updates the live-out uses of the bonus instructions.

What I believe we need to do is to first make the SSA form explicit,
by inserting tautological PHI nodes, and rewriting the offending uses.

```
$ /builddirs/llvm-project/build-Clang12/bin/opt -load /repositories/alive2/build-Clang-release/tv/tv.so -load-pass-plugin /repositories/alive2/build-Clang-release/tv/tv.so -tv -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=10 -tv -o /dev/null /tmp/test.ll

----------------------------------------
@global_pr51125 = global 4 bytes, align 4

define i32 @pr51125() {
%entry:
  br label %L

%L:
  %ld = load i32, * @global_pr51125, align 4
  %iszero = icmp eq i32 %ld, 0
  br i1 %iszero, label %exit, label %L2

%L2:
  store i32 4294967295, * @global_pr51125, align 4
  %cmp = icmp eq i32 %ld, 4294967295
  br i1 %cmp, label %L, label %exit

%exit:
  %r = phi i32 [ %ld, %L2 ], [ %ld, %L ]
  ret i32 %r
}
=>
@global_pr51125 = global 4 bytes, align 4

define i32 @pr51125() {
%entry:
  %ld.old = load i32, * @global_pr51125, align 4
  %iszero.old = icmp eq i32 %ld.old, 0
  br i1 %iszero.old, label %exit, label %L2

%L2:
  %ld2 = phi i32 [ %ld.old, %entry ], [ %ld, %L2 ]
  store i32 4294967295, * @global_pr51125, align 4
  %cmp = icmp ne i32 %ld2, 4294967295
  %ld = load i32, * @global_pr51125, align 4
  %iszero = icmp eq i32 %ld, 0
  %or.cond = select i1 %cmp, i1 1, i1 %iszero
  br i1 %or.cond, label %exit, label %L2

%exit:
  %ld1 = phi i32 [ poison, %L2 ], [ %ld.old, %entry ]
  %r = phi i32 [ %ld2, %L2 ], [ %ld.old, %entry ]
  ret i32 %r
}
Transformation seems to be correct!

```

Fixes https://bugs.llvm.org/show_bug.cgi?id=51125

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D106317
2021-08-15 19:02:34 +03:00
Roman Lebedev 35a8bdc775 [NFCI][IndVars] rewriteLoopExitValues(): nowadays SCEV should not change `GEP` base pointer
Currently/previously, while SCEV guaranteed that it produces the same value,
the way it was produced may be illegal IR, so we have an ugly check that
the replacement is valid.

But now that the SCEV strictness wrt the pointer/integer types has been improved,
I believe this invariant is already upheld by the SCEV itself, natively.

I think we should add an assertion, wait for a week, and then, if all is good,
rip out all this checking.
Or we could just do the latter directly, I guess.

This reverts commit rL127839.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D108043
2021-08-15 18:59:32 +03:00
Nikita Popov 944dfa4975 [IndVars] Don't check for pointer exit count (NFC)
After recent changes, exit counts and BE taken counts are always
integers, so convert these to assertions.

While here, also convert the loop invariance checks to asserts.
Exit counts are always loop invariant.
2021-08-15 16:49:30 +02:00
Qiu Chaofan a240b29f21 [NFC] Simply update a FIXME comment
X86's overridden LowerOperationWrapper was moved to the common implementation
in a7eae62.
2021-08-15 22:43:46 +08:00
Nikita Popov 3c503ba06a [FunctionImport] Fix build with old mingw (NFC)
std::errc::operation_not_supported is not universally supported.
Make use of LLVM's errc interoperability header, which lists
known-good errc values.
2021-08-15 15:47:59 +02:00
Harald van Dijk 957334382c
[ExecutionEngine] Check for libunwind before calling __register_frame
libgcc and libunwind have different flavours of __register_frame. Both
flavours are already correctly handled, except that the code to handle
the libunwind flavour is guarded by __APPLE__. This change uses the
presence of __unw_add_dynamic_fde in libunwind instead to detect whether
libunwind is used, rather than hardcoding it as Apple vs. non-Apple.

Fixes PR44074.

Thanks to Albert Jin <albert.jin@gmail.com> and Chris Schafmeister
<chris.schaf@verizon.net> for identifying the problem.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D106129
2021-08-15 13:35:53 +01:00
Paul Walker f7a831daa6 [LoopVectorize] Don't emit remarks about lack of scalable vectors unless they're specifically requested.
Previously we emitted a "does not support scalable vectors"
remark for all targets whenever vectorisation is attempted. This
pollutes the output for architectures that don't support scalable
vectors and is likely confusing to the user.

Instead this patch introduces a debug message that reports when
scalable vectorisation is allowed by the target and only issues
the previous remark when scalable vectorisation is specifically
requested, for example:

  #pragma clang loop vectorize_width(2, scalable)

Differential Revision: https://reviews.llvm.org/D108028
2021-08-15 12:15:52 +01:00
Nikita Popov 81b106584f [AArch64] Fix comparison peephole opt with non-0/1 immediate (PR51476)
This is a non-intrusive fix for
https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport
to the 13.x release branch. It expands on the current hack by
distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have
the obvious meaning and 2 means "anything else". The new optimization
from D98564 should only be performed for CmpValue of 0 or 1.
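
The three-way encoding described above can be sketched roughly as follows. This is only an illustration of the idea, not the AArch64 backend code; the helper names `normalize_cmp_value` and `peephole_allowed` are hypothetical.

```python
def normalize_cmp_value(imm: int) -> int:
    """Collapse a compare immediate to 0, 1, or 2 (2 means "anything else").

    Hypothetical helper for illustration only.
    """
    if imm == 0:
        return 0
    if imm == 1:
        return 1
    return 2


def peephole_allowed(imm: int) -> bool:
    """Per the message, the new optimization should only run for 0 or 1."""
    return normalize_cmp_value(imm) in (0, 1)
```

The point of the third bucket is that any immediate other than 0 or 1 is kept distinguishable from both of them, so it cannot be mistaken for one of the two values the optimization handles.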

For main, I think we should switch the analyzeCompare() and
optimizeCompare() APIs to use int64_t instead of int, which is in
line with MachineOperand's notion of an immediate, and avoids this
problem altogether.

Differential Revision: https://reviews.llvm.org/D108076
2021-08-15 12:35:52 +02:00
Dávid Bolvanský 49de6070a2 Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"
This reverts commit 435785214f. The same compile-time issues remain for -O0 -g, e.g. +1.3% for sqlite3.
2021-08-15 11:44:13 +02:00
Anshil Gandhi 435785214f [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpand pass to report atomics generating
a compare and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-14 23:37:23 -06:00
Itay Bookstein 530aa7e4da [Linker] Import GlobalIFunc when importing symbols from another module
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107988
2021-08-14 22:01:11 -07:00
Wang, Pengfei f1de9d6dae [X86] AVX512FP16 instructions enabling 2/6
Enable FP16 binary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105264
2021-08-15 08:56:33 +08:00
luxufan 4ec32375bc [JITLink] Unify x86-64 MachO and ELF's optimize GOT/Stub function
This patch unifies optimizeELF_x86_64_GOTAndStubs and optimizeMachO_x86_64_GOTAndStubs into a single optimize_x86_64_GOTAndStubs

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D108025
2021-08-15 00:33:09 +08:00
Kazu Hirata 915cc69259 [Aarch64] Remove redundant c_str (NFC)
Identified with readability-redundant-string-cstr.
2021-08-14 08:49:40 -07:00
eopXD 012173680f [LoopIdiom] let the pass deal with runtime memset size
The current LIR does not deal with a runtime-determined memset size. This patch
uses SCEV to check whether the PointerStrideSCEV and the MemsetSizeSCEV are equal.
Before the comparison, the pass tries to fold the expressions using facts that are
already guaranteed by the loop guard.

The test files `memset-runtime.ll` and `memset-runtime-debug.ll` are added.

This patch deals with the proper loop idiom. A follow-up patch will deal with SCEVs
that are unequal after folding with the loop guards.
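
The stride-equals-size condition can be illustrated with a small model. This is a sketch under the assumption that the loop writes `m` bytes per iteration and advances the pointer by `m` bytes; the function names are hypothetical and this is not the LoopIdiomRecognize implementation.

```python
def memset_loop(buf: bytearray, n: int, m: int, value: int) -> None:
    """Model of: for (i = 0; i < n; ++i) memset(buf + i * m, value, m);"""
    for i in range(n):
        start = i * m                               # pointer stride: m bytes
        buf[start:start + m] = bytes([value]) * m   # memset size: also m bytes


def single_memset(buf: bytearray, n: int, m: int, value: int) -> None:
    """Model of the collapsed form: memset(buf, value, n * m);"""
    buf[0:n * m] = bytes([value]) * (n * m)
```

Because the stride and the size are the same runtime value, the per-iteration writes tile the region with no gaps and no overlap, which is why one larger memset covers exactly the same bytes.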

Reviewed By: lebedev.ri, Whitney

Differential Revision: https://reviews.llvm.org/D107353
2021-08-14 19:22:06 +08:00
Dawid Jurczak 107401002e [NFC][DSE] Clean up KnownNoReads and MemorySSAScanLimit in DSE
Another set of simple cleanups in DSE. CheckCache was removed in 1f1145006b, and as a consequence KnownNoReads is useless.
Also update the description of MemorySSAScanLimit, whose default value is 150 instead of 100.

Differential Revision: https://reviews.llvm.org/D107812
2021-08-14 11:26:57 +02:00
Lang Hames 27ea3f1607 [JITLink][x86-64] Rename *Relaxable edges to *REXRelaxable.
The existing relaxable edges all assume a REX prefix. ELF includes non-REX
relaxations, so rename these edges to make room for the new kinds.
2021-08-14 18:28:49 +10:00
Lang Hames 632135acae [JITLink][x86-64] Rename BranchPCRel32ToPtrJumpStub(Relaxable -> Bypassable).
ELF allows for branch optimizations other than bypass, so rename this edge kind
to avoid any confusion.
2021-08-14 17:49:31 +10:00
Anshil Gandhi 29e11a1aa3 Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"
This reverts commit c4e5425aa5.
2021-08-13 23:58:04 -06:00
Anshil Gandhi c4e5425aa5 [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpandPass to report atomics generating a compare
and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-13 22:44:08 -06:00
Jessica Paquette 50efbf9cbe [GlobalISel] Narrow binops feeding into G_AND with a mask
This is a fairly common pattern:

```
%mask = G_CONSTANT iN <mask val>
%add = G_ADD %lhs, %rhs
%and = G_AND %add, %mask
```

We have combines to eliminate G_AND with a mask that does nothing.

If we combined the above to this:

```
%mask = G_CONSTANT iN <mask val>
%narrow_lhs = G_TRUNC %lhs
%narrow_rhs = G_TRUNC %rhs
%narrow_add = G_ADD %narrow_lhs, %narrow_rhs
%ext = G_ZEXT %narrow_add
%and = G_AND %ext, %mask
```

We'd be able to take advantage of those combines using the trunc + zext.

For this to work (or be beneficial in the best case)

- The operation we want to narrow then widen must only be used by the G_AND
- The G_TRUNC + G_ZEXT must be free
- Performing the operation at a narrower width must not produce a different
  value than performing it at the original width *after masking.*

Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj

At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign.
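
The third condition in the list above (narrowing must not change the value after masking) can be checked with a small model for G_ADD. This is a sketch that assumes a 64-bit original width, a 32-bit narrow width, and a mask that fits in the narrow width; the function names are hypothetical.

```python
def wide_then_mask(lhs: int, rhs: int, mask: int, wide_bits: int = 64) -> int:
    """%add = G_ADD %lhs, %rhs ; %and = G_AND %add, %mask"""
    wide_add = (lhs + rhs) & ((1 << wide_bits) - 1)
    return wide_add & mask


def narrow_then_mask(lhs: int, rhs: int, mask: int, narrow_bits: int = 32) -> int:
    """G_TRUNC both operands, G_ADD narrow, G_ZEXT, then G_AND with the mask."""
    narrow = (1 << narrow_bits) - 1
    narrow_lhs = lhs & narrow                          # G_TRUNC
    narrow_rhs = rhs & narrow                          # G_TRUNC
    narrow_add = (narrow_lhs + narrow_rhs) & narrow    # G_ADD at narrow width
    ext = narrow_add                                   # G_ZEXT: high bits are zero
    return ext & mask
```

Addition works here because the low bits of a sum depend only on the low bits of the operands, so both forms agree whenever the mask only keeps bits below the narrow width.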

Differential Revision: https://reviews.llvm.org/D107929
2021-08-13 18:31:13 -07:00
Matt Arsenault cc56152f83 GlobalISel: Add helper function for getting EVT from LLT
This can only give an imperfect approximation, but is enough to avoid
crashing in places where we call into EVT functions starting from LLTs.
2021-08-13 21:10:13 -04:00
Craig Topper d63f117210 [RISCV] Support RISCVISD::SELECT_CC in ComputeNumSignBitsForTargetNode. 2021-08-13 18:00:09 -07:00
Matt Arsenault a77ae4aa6a AMDGPU: Stop attributor adding attributes to intrinsic declarations 2021-08-13 20:51:48 -04:00
Matt Arsenault 5beb9a0e6a AMDGPU: Respect compute ABI attributes with unknown OS
Unfortunately Mesa is still using amdgcn-- as the triple for OpenGL,
so we still have the awkward unknown OS case to deal with. Previously
if the HSA ABI intrinsics appeared, we would not add the ABI
registers to the function. We would emit an error later, but we still
need to produce some compile result. Start adding the registers to any
compute function, regardless of the OS. This keeps the internal state
more consistent, and will help avoid numerous test crashes in a future
patch which starts assuming the ABI inputs are present on functions by
default.
2021-08-13 20:44:46 -04:00
Arthur Eubanks 16e8134e7c [NFC] One more AttributeList::getAttribute(FunctionIndex) -> getFnAttr() 2021-08-13 16:56:42 -07:00
Arthur Eubanks c19d7f8af0 [CallPromotion] Check for inalloca/byval mismatch
Previously we would allow promotion even if the byval/inalloca
attributes on the call and the callee didn't match.

It's ok if the byval/inalloca types aren't the same. For example, LTO
importing may rename types.
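
A rough model of the check described above: the presence of `byval`/`inalloca` must agree between call site and callee, while the attribute's type is allowed to differ. This is a sketch only; parameter attributes are modelled as plain dicts, and `abi_attrs_match` is a hypothetical name, not the CallPromotionUtils API.

```python
def abi_attrs_match(call_param: dict, callee_param: dict) -> bool:
    """Compare only whether byval/inalloca are present, not their types."""
    for kind in ("byval", "inalloca"):
        if (kind in call_param) != (kind in callee_param):
            return False   # one side has the attribute, the other does not
    return True            # types may differ, e.g. after LTO renames them
```

For example, `{"byval": "%struct.S"}` against `{"byval": "%struct.S.0"}` is accepted, while `{"byval": "%struct.S"}` against `{}` is rejected.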

Fixes PR51397.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D107998
2021-08-13 16:52:04 -07:00
Arthur Eubanks d5ff5ef65e [NFC] One more AttributeList::getAttribute(FunctionIndex) -> getFnAttr() 2021-08-13 16:49:05 -07:00
Arthur Eubanks dc41c558dd [NFC] Make AttributeList::hasAttribute(AttributeList::ReturnIndex) its own method
AttributeList::hasAttribute() is confusing. In an attempt to change the
name to something that suggests using other methods, fix up some
existing uses.
2021-08-13 16:27:11 -07:00
Arthur Eubanks f80ae58068 [NFC] Cleanup calls to AttributeList::getAttribute(FunctionIndex)
getAttribute() is confusing, use a clearer method.
2021-08-13 16:27:11 -07:00
Arthur Eubanks 8e9ffa1dc6 [NFC] Cleanup callers of AttributeList::hasAttributes()
AttributeList::hasAttributes() is confusing, use clearer methods like
hasFnAttrs().
2021-08-13 12:16:52 -07:00
Arthur Eubanks d7593ebaee [NFC] Clean up users of AttributeList::hasAttribute()
AttributeList::hasAttribute() is confusing, use clearer methods like
hasParamAttr()/hasRetAttr().

Add hasRetAttr() since it was missing from AttributeList.
2021-08-13 11:59:18 -07:00
Arthur Eubanks a9831cce1e [NFC] Remove public uses of AttributeList::getAttributes()
Use methods that better convey the intent.
2021-08-13 11:38:12 -07:00
Arthur Eubanks 80ea2bb574 [NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes()
This is more consistent with similar methods.
2021-08-13 11:16:52 -07:00
Arthur Eubanks 92ce6db9ee [NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr()
This is more consistent with similar methods.
2021-08-13 11:09:18 -07:00
Arthur Eubanks a0c42ca56c [NFC] Remove AttributeList::hasParamAttribute()
It's the same as AttributeList::hasParamAttr().
2021-08-13 10:58:21 -07:00
Amy Kwan 581a80304c [PowerPC] Disable CTR Loop generate for fma with the PPC double double type.
It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, and there is no actual
FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this
intrinsic needs to remain as a call as it cannot be lowered to any instruction, which
also means we need to disable CTR loop generation for fma involving the ppcf128 type.
This patch accomplishes this behaviour.

Differential Revision: https://reviews.llvm.org/D107914
2021-08-13 12:27:24 -05:00
Haowei Wu 571b0d84d2 [IFS] Fix the copy constructor warning in IFSStub.cpp
This change fixes the gcc warning on copy constructor in IFSStub.cpp
file.

Differential Revision: https://reviews.llvm.org/D108000
2021-08-13 10:17:53 -07:00
Alfonso Gregory 17bc82dd3b [AsmWriter][NFC] Simplify writeDIGenericSubrange
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107469
2021-08-13 09:31:13 -07:00
Jessica Paquette ccfc079047 [AArch64][GlobalISel] Legalize scalar G_SSUBSAT + G_SADDSAT
These are lowered, matching SDAG behaviour. (See
llvm/test/CodeGen/AArch64/ssub_sat.ll and llvm/test/CodeGen/AArch64/sadd_sat.ll)

These fall back ~159 times on a build of clang with GISel enabled.

Differential Revision: https://reviews.llvm.org/D107777
2021-08-13 09:02:25 -07:00
Jamie Schmeiser 64f29e2dd1 Fix bad assert in print-changed code
Summary:
The assertion that both functions were not missing was incorrect and would
fail when one of the functions was missing. Fixed it and moved the
assertion earlier to check the input parameters to better capture
first-failure.  Added lit test.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D107989
2021-08-13 10:54:30 -04:00
Roman Lebedev 0dc6b597db
Revert "[SCEV] Remove premature assert. PR46786"
Since then, the SCEV pointer handling has been improved,
so the assertion should now hold.

This reverts commit b96114c1e1,
relanding the assertion from commit 141e845da5.
2021-08-13 17:50:22 +03:00
Roman Lebedev c46546bd52
Reland "[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological"
The commit originally unearthed a problem, reported as
https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff#1019890
Now that the problem has been fixed, and the assertion no longer fires,
let's see if there are other cases it fires on.

This reverts commit 5c8c24d2de,
relanding commit f30a7dff8a.
2021-08-13 15:45:03 +03:00
Roman Lebedev 2702fb1148
[SimplifyCFG] Restart if `removeUndefIntroducingPredecessor()` made changes
It might have changed the condition of a branch into a constant,
so we should restart and constant-fold terminator,
instead of continuing with the tautological "conditional" branch.
This fixes the issue reported at https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff
2021-08-13 15:45:03 +03:00
Roman Lebedev 5c8c24d2de
Revert "[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological"
The assertion does not hold on a provided reproducer.
Reverting until after fixing the problem.

This reverts commit f30a7dff8a.
2021-08-13 13:16:22 +03:00
Dylan Fleming 4be7fb9762 [SVE] Add folds for truncation of vscale
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D107453
2021-08-13 10:18:00 +01:00
Rosie Sumpter 46abd1fbe8 [LoopFlatten] Fix assertion failure in checkOverflow
There is an assertion failure in computeOverflowForUnsignedMul
(used in checkOverflow) due to the inner and outer trip counts
having different types. This occurs when the IV has been widened,
but the loop components are not successfully rediscovered.
This is fixed by some refactoring of the code in findLoopComponents
which identifies the trip count of the loop.
2021-08-13 10:07:49 +01:00
luxufan ee65938357 [JITLink] Update ELF_x86_64 's edge kind to generic edge kind
This patch uses a switch statement to map the ELF_x86_64 edge kinds to generic edge kinds, and merges the ELF_x86_64 applyFixup function into the x86_64 applyFixup function. Some edge kinds did not have corresponding generic edge kinds, so I added three generic edge kinds as follows:
1. RequestGOTAndTransformToDelta64, which is similar to RequestGOTAndTransformToDelta32.

2. GOTDelta64. This generic kind is similar to Delta64, except that GOTDelta64 computes the delta relative to GOTSymbol.

3. RequestGOTAndTransformToGOTDelta64. This edge kind is used to handle ELF_x86_64's GOT64 edge kind: it requests that the fixGOTEdge function change the target to the GOT entry and set the edge kind to the generic edge kind GOTDelta64.

The names of these added generic edge kinds may be haphazard, or may not express their meaning well.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D107967
2021-08-13 12:53:54 +08:00
Shivam Gupta 835ea22b37 [AVR] Enable machine verifier
Reviewed By: mhjacobson, benshi001

Differential Revision: https://reviews.llvm.org/D107853
2021-08-13 12:11:22 +08:00
Michael Kruse b1de32d6dd [OMPIRBuilder] Clarify CanonicalLoopInfo. NFC.
Add in-source documentation on how CanonicalLoopInfo is intended to be used. In particular, clarify which parts of a CanonicalLoopInfo are considered part of the loop, that those parts must be side-effect free, and that InsertPoints to instructions outside those parts can be expected to be preserved after method calls implementing loop-associated directives.

A CanonicalLoopInfo is now invalidated once it no longer describes a canonical loop, and it asserts when something tries to use it afterwards.

In addition, rename `createXYZWorkshareLoop` to `applyXYZWorkshareLoop` and remove the update location to avoid the impression that they insert something from scratch at that location, where in reality its InsertPoint is ignored. createStaticWorkshareLoop does not return a CanonicalLoopInfo anymore. First, it was not a canonical loop in the clarified sense (containing side-effects in form of calls to the OpenMP runtime). Second, it is ambiguous which of the two possible canonical loops it should actually return. It will not be needed before a feature expected to be introduced in OpenMP 6.0.

Also see discussion in D105706.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D107540
2021-08-12 21:02:19 -05:00
Giorgis Georgakoudis 60e643fe05 [OpenMP][Fix] Fix disable spmdization option
Besides SPMDization, other analysis and optimization for original, frontend-generated SPMD regions uses information from the AAKernelInfoFunction attribute. This fix makes sure disabling SPMDization through the corresponding option applies only to generic mode regions, which should not be SPMDized, while it leaves unaffected the attribute state of original SPMD regions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108001
2021-08-12 17:59:14 -07:00
Ruiling Song e1beebbac5 SplitKit: Don't further split subrange mask in buildCopy
We may use several COPY instructions to copy the needed sub-registers
during split. But the way we split the lanes during the COPYs may be
different from the subranges of the old register. This would fail when we
extend the subranges of the new register because the LaneMasks do not
match exactly between subranges of new register and old register.
Since we are bundling the COPYs, I think there is no need to further refine the
subranges of the new register based on the set of LaneMasks of the inserted COPYs.

I am not sure if there will be further breaking cases. But as the subranges of
new register are created based on the LaneMasks of the subranges of old register,
it is highly likely that we will always find an exact LaneMask match.
We can think about how to make the extendPHIKillRanges() work for
subrange mask mismatch case if we meet more such cases in the future.

The test case was from D105065 by @arsenm.

Differential Revision: https://reviews.llvm.org/D107829
2021-08-13 07:36:38 +08:00
Heejin Ahn adb96d2e76 [WebAssembly] Fix leak in Emscripten SjLj
For SjLj, we allocate a table to record setjmp buffer info in the entry
of each setjmp-calling function by inserting a `malloc` call, and insert
a `free` call to free the buffer before each `ret` instruction.

But this is not sufficient; we have to free the buffer before we throw.
In SjLj handling, normal functions that can possibly throw or longjmp
are wrapped with an invoke and caught within the function so they don't
end up escaping the function. But three functions throw and escape the
function:
- `__resumeException` (Emscripten library function used for Emscripten
  EH)
- `emscripten_longjmp` (Emscripten library function used for Emscripten
  SjLj)
- `__cxa_throw` (libc++abi function called for the C++ `throw` keyword)

The first two functions are used to rethrow the current
exception/longjmp when the caught exception/longjmp is not for the
current function. `__cxa_throw` is used for exception, and because we
consider that a function that cannot longjmp, it escapes the function
right away, before which we should free the buffer.

Currently `lsan.test_longjmp3` and `lsan.test_exceptions_longjmp3` fail
in Emscripten; this CL fixes these.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D107852
2021-08-12 16:32:46 -07:00
Heejin Ahn aca198cf74 [WebAssembly] Error out when Emscripten SjLj setjmp is used with Wasm EH
Currently, when Wasm EH is used with Emscripten SjLj, Emscripten SjLj
cannot handle `invoke` instructions - it assumes all `invoke`s have been
lowered away with Emscripten EH. But in Wasm EH they are lowered in
instruction selection, so they are still present in the IR stage. This
happens when
1. Wasm EH and Emscripten SjLj are used together
2. A function that calls `setjmp` uses exceptions, i.e., has `invoke`s

We were already erroring out with an assertion failure in this case, but
this CL makes it error out more properly with a valid error message.

Wasm EH + Wasm SjLj will not have this restriction. (it will have
another restriction though, e.g., `setjmp` cannot be called within
`catch`. But why would anyone do that..)

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D107687
2021-08-12 16:19:04 -07:00
Lei Huang 8930af45c3 [PowerPC] Implement XL compatibility builtin __addex
Add builtin and intrinsic for `__addex`.

This patch is part of a series of patches to provide builtins for
compatibility with the XL compiler.

Reviewed By: stefanp, nemanjai, NeHuang

Differential Revision: https://reviews.llvm.org/D107002
2021-08-12 16:38:21 -05:00
Heejin Ahn 78e87970af [WebAssembly] Disable offset folding for function addresses
Wasm does not support function addresses with offsets, but isel can
generate folded SDValues in the form of (@func + offset) without this
patch.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43133.

Reviewed By: dschuff, sbc100

Differential Revision: https://reviews.llvm.org/D107940
2021-08-12 13:40:41 -07:00
Sanjay Patel 14eefa57f2 [InstCombine] factorize min/max intrinsic ops with common operand (2nd try)
This is a re-try of 6de1dbbd09 which was reverted because
it missed a null check. Extra test for that failure added.

Original commit message:
This is an adaptation of D41603 and another step on the way
to canonicalizing to the intrinsic forms of min/max.

See D98152 for status.
2021-08-12 16:32:07 -04:00
Amy Huang 427520a8fa Revert "[InstCombine] factorize min/max intrinsic ops with common operand"
This reverts commit 6de1dbbd09 because it causes a
compiler crash.
2021-08-12 12:36:25 -07:00
Florian Hahn f999312872
Recommit "[Matrix] Overload stride arg in matrix.columnwise.load/store."
This reverts the revert 28c04794df.

The failing MLIR test that caused the revert should be fixed  in this
version.

Also includes a PPC test fix previously in 1f87c7c478.
2021-08-12 18:31:57 +01:00
Craig Topper 79fbddbea0 [RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases.
For unit-stride and strided load/stores we set the SEW operand of
the pseudo instruction equal to the EEW in the opcode. The LMUL
of the pseudo instruction is the LMUL we want.

These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use
this to avoid changing vtype if the SEW/LMUL of the previous
vtype matches the EEW/EMUL ratio we need for the instruction.
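
The ratio test described above can be worked through with exact fractions. This is a sketch of the arithmetic only, not the RISCVInsertVSETVLI code; `emul` and `can_keep_vtype` are hypothetical names.

```python
from fractions import Fraction


def emul(eew: int, sew: int, lmul) -> Fraction:
    """EMUL = (EEW / SEW) * LMUL, as stated in the commit message."""
    return Fraction(eew, sew) * lmul


def can_keep_vtype(prev_sew: int, prev_lmul, eew: int, wanted_emul) -> bool:
    """True when the previous SEW/LMUL ratio equals the needed EEW/EMUL ratio."""
    return Fraction(prev_sew) / prev_lmul == Fraction(eew) / wanted_emul
```

For instance, with a previous vtype of SEW=32, LMUL=1 and an access with EEW=16, the formula gives EMUL = 1/2. Both ratios are 32, so an access that wants EMUL = 1/2 would not need a new vsetvli under this rule, while one that wants EMUL = 1 would.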

Due to how the global analysis works, we can only do this
optimization when the previous vsetvli was produced in the block
containing the store. We need to know in the first phase if the
vsetvli will be inserted so we can propagate information to
the successors in the second phase correctly. This means we can't
depend on predecessors.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D106601
2021-08-12 10:05:27 -07:00
Roman Lebedev f30a7dff8a
[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological
We really shouldn't deal with a conditional branch that can be trivially
constant-folded into an unconditional branch.

Indeed, barring failure to trigger BB reprocessing, that should be true,
so let's assert as much, and hope the assertion never fires.
If it does, we have a bug to fix.
2021-08-12 20:03:09 +03:00
Roman Lebedev 628f63d3d5
[SimplifyCFG] If FoldTwoEntryPHINode() changed things, restart
Mainly, I want to add an assertion that `SimplifyCFGOpt::simplifyCondBranch()`
doesn't get asked to deal with non-unconditional branches,
and if I do that, then said assertion fires on existing tests,
and this is what prevents it from firing.
2021-08-12 20:03:09 +03:00
Sanjay Patel 790c29ab86 [InstCombine] fold umax/umin intrinsics based on demanded bits
This is a direct translation of the select folds added with
D53033 / D53036 and another step towards canonicalization
using the intrinsics (see D98152).
2021-08-12 12:37:45 -04:00
maekawatoshiki dd3eea6566 [LICM] Support sinking in LNICM
Currently, the LNICM pass does not support sinking instructions out of a loop nest.
This patch enables LNICM to sink as many instructions as possible down to the exit block of the outermost loop.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D107219
2021-08-13 00:56:26 +09:00
Sanjay Patel cd44cc86e3 [InstCombine] remove unused function argument; NFC
This was just added with 6de1dbbd09 , and I missed
pulling the extra arg from the final revision.
2021-08-12 11:47:25 -04:00
Johannes Doerfert 4e7d7cae67 [Attributor][FIX] Do not try to rewrite functions with casted call sites
If we cast a function at the call site it is hard(er) to get the rewrite
correct, let's not attempt it for now.

Fixes PR51448.
2021-08-12 10:39:53 -05:00
Johannes Doerfert 5f543919b2 [Attributor][FIX] Guard constant casts with type size checks 2021-08-12 10:39:53 -05:00
Johannes Doerfert a420f80bf1 [Attributor] Do not delete volatile stores to null/undef
See D106309.

Differential Revision: https://reviews.llvm.org/D107906
2021-08-12 10:39:52 -05:00
David Green ae9a346ef8 [ARM] Fix DAG combine loop in reduction distribution
Given a constant operand, the MVE and DAGCombine combines could fight,
each redistributing in the opposite order. Add a guard to the MVE
vecreduce distribution to prevent that.
2021-08-12 16:37:39 +01:00
Sanjay Patel be0698559b [InstCombine] remove shl(neg x), y transform
This diff was accidentally committed with:
1b5a195845
2021-08-12 11:27:22 -04:00
Sanjay Patel 6de1dbbd09 [InstCombine] factorize min/max intrinsic ops with common operand
This is an adaptation of D41603 and another step on the way
to canonicalizing to the intrinsic forms of min/max.

See D98152 for status.
2021-08-12 11:19:09 -04:00
Sanjay Patel 1b5a195845 [InstCombine] add tests for factorization of min/max intrinsics; NFC 2021-08-12 11:19:09 -04:00
Victor Huang 99e00663d4 [PowerPC] Fix return address computation for "__builtin_return_address"
When depth > 0, the callee's frame address is used to compute the callee's return
address, producing an improper return address. This patch fixes this by using the
caller's frame address to compute the return address of the callee.

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D107646
2021-08-12 09:44:49 -05:00
Liqiang Tao 422fc5603a [llvm][Inline] Refactor out InlineOrder
Move InlineOrder to a separate file.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D107831
2021-08-12 22:19:53 +08:00
Mehdi Amini 28c04794df Revert "[Matrix] Overload stride arg in matrix.columnwise.load/store."
This reverts commit a1ef81de35.

Broke the MLIR buildbot.
2021-08-12 11:57:19 +00:00
Florian Hahn a1ef81de35
[Matrix] Overload stride arg in matrix.columnwise.load/store.
This patch adjusts the intrinsics definition of
llvm.matrix.column.major.load and llvm.matrix.column.major.store to
allow overloading the type of the stride. The bitwidth of the stride is
used to perform the offset computation.

This fixes a crash when using __builtin_matrix_column_major_load or
__builtin_matrix_column_major_store on 32 bit platforms. The stride argument
of the builtins is defined as `size_t`, which is 32 bits wide on 32 bit
platforms.

Note that we still perform offset computations with 64 bit width on 32
bit platforms for accesses that do not take a user-specified stride.
This can be fixed separately.
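
To make the role of the stride concrete, here is a small model of a column-major load in which the offset arithmetic is done at the stride's bit width. It is a sketch under stated assumptions (flat element-indexed memory, offset = column * stride + row); the function names are hypothetical and this is not the LowerMatrixIntrinsics code.

```python
def column_major_offset(row: int, col: int, stride: int, stride_bits: int) -> int:
    """Element offset of (row, col), computed at the stride's bit width."""
    mask = (1 << stride_bits) - 1
    return (col * stride + row) & mask


def load_column_major(memory, rows: int, cols: int, stride: int,
                      stride_bits: int = 32):
    """Return the matrix as a list of columns read from flat memory."""
    return [
        [memory[column_major_offset(r, c, stride, stride_bits)]
         for r in range(rows)]
        for c in range(cols)
    ]
```

With `memory = list(range(12))`, 2 rows, 3 columns and a stride of 4, the columns start at offsets 0, 4 and 8, so the stride may be larger than the number of rows.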

Fixes PR51304.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D107349
2021-08-12 10:45:25 +01:00
David Truby 9c47d6b48d [llvm][sve] Lowering for VLS extending loads
This patch enables extending loads for fixed length SVE code generation.

There is a slight regression here in the mulh tests; since these tests
load the parameter and then extend it these are treated as extending
loads which are merged, preventing the mulh instruction from being
generated. As this affects scalable SVE codegen as well this should be
addressed in a separate patch.

Reviewed By: bsmith

Differential Revision: https://reviews.llvm.org/D107057
2021-08-12 09:43:39 +00:00
Cullen Rhodes 419deccfd1 [AArch64] NFC: Remove register decoder tables in disassembler
The register classes are generated by TableGen, use them instead of
handwritten tables.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D107763
2021-08-12 07:28:56 +00:00
Fangrui Song 67d4d7cf68 [Object] Add missing PPC_DYNAMIC_TAG macros 2021-08-12 00:05:04 -07:00
Christudasan Devadasan 5d940b71ae Reapply "SROA: Enhance speculateSelectInstLoads"
Originally committed as ffc3fb665d
Reverted in fcf2d5f402 due to an
assertion failure.

Original commit message:

Allow the folding even if there is an
intervening bitcast.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106667
2021-08-11 22:58:54 -04:00
Amara Emerson 73056f239e [AArch64][GlobalISel] Simplify/nuke the merge/unmerge legalizer rules.
These rules were originally written when the new predicate based legalizer
was introduced in an attempt to preserve existing behaviour. It wasn't
properly kept up to date as things like vector support were split out into
G_CONCAT_VECTORS, and frankly, even if it was, it was too complex.

It's much easier to start from scratch with what we can actually support,
which is just a few type combinations. Anything illegal we should either
legalize, or should be eliminated as a side effect of artifact combination.

Differential Revision: https://reviews.llvm.org/D107937
2021-08-11 16:45:23 -07:00