Commit Graph

49278 Commits

Author SHA1 Message Date
Florian Hahn 275bee32ad
[LoopUnroll] Forget block and loop dispositions during unrolling.
After unrolling a loop, the block and loop dispositions need to be
cleared. As we don't know which SCEVs in the loop/blocks may be
impacted, completely clear the cache. This should also fix some cases
where deleted loops remained in the LoopDispositions cache.

This fixes a verification failure surfaced by D134531.

I am planning on reviewing/updating the existing uses of
forgetLoopDispositions to check if they should be replaced by
forgetBlockAndLoopDispositions.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D134612
2022-09-27 08:49:04 +01:00
Carlos Alberto Enciso 6584d1f930 [ADT] Add IntervalTree - light tree data structure to hold intervals.
It allows finding all intervals that overlap with any given point.
At this time, it does not support any deletion or rebalancing
operations.

The IntervalTree is designed to be set up once, and then queried
without any further additions.

Reviewed By: psamolysov, probinson

Differential Revision: https://reviews.llvm.org/D125776
2022-09-27 08:22:28 +01:00
Paulo Matos 1bd1a44070 [WebAssembly] Use intrinsics for table.get/set instructions
Initial table.get/set implementation would match and lower combinations
of GEP+load/store to table.get/set instructions. However, this is error
prone due to potential combinations of GEP+load/store we don't implement,
and load/store optimizations. By changing the code to using intrinsics, we
 avoid both issues and simplify the code.

New builtins implemented:
* @llvm.wasm.table.get.externref
* @llvm.wasm.table.get.funcref
* @llvm.wasm.table.set.externref
* @llvm.wasm.table.set.funcref

Reviewed By: asb, tlively

Differential Revision: https://reviews.llvm.org/D134436
2022-09-27 09:16:30 +02:00
Fraser Cormack 4b1833d7d9 [Support][NFC] Fix typo in comment 2022-09-27 07:31:21 +01:00
Yeting Kuo 04e1301f3d [VP][RISCV] Add vp.maxnum and vp.minnum intrinsics and RISC-V support.
Add vp.maxnum and vp.minnum which are vector predicted intrinsics of llvm.maxnum
and llvm.minnum.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134639
2022-09-27 13:36:45 +08:00
Paul Scoropan ce004fb4f2 [PowerPC] XCOFF exception section support on the direct assembler path
This feature implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (comprised of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review. https://reviews.llvm.org/D133030
needs to merge first for LIT tests

Reviewed By: shchenz, RKSimon

Differential Revision: https://reviews.llvm.org/D132146
2022-09-26 22:24:20 -04:00
Sebastian Peryt 46fc75ab28 [NFC][2/n] Remove PrunePH pass
Second patch in the series to remove legacy PM and
associated -enable-new-pm=0 flag targets pass that
has not been ported to new PM - PruneEH.
Discussion about this can be found in D44415.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134686
2022-09-26 18:38:04 -07:00
Matt Arsenault 9bf1aea224 LoopPeel: Pass through AssumptionCache (NFC) 2022-09-26 14:52:59 -04:00
Matt Arsenault 53fa00b3ae LoopUnroll: Pass through AssumptionCache (NFC)
Using these queries with a context instruction and without a cache
seems to be about 2x slower than with it so this theoretically
improves compile time.
2022-09-26 14:52:59 -04:00
Kazu Hirata e2398a4d7c [Analysis] Introduce getStaticBonusApplied (NFC)
InlineCostCallAnalyzer encourages inlining of the last call to the
static function by subtracting LastCallToStaticBonus from Cost.

This patch introduces getStaticBonusApplied to make available the
amount of LastCallToStaticBonus applied.

The intent is to allow the module inliner to determine whether
inlining a given call site is expected to reduce the caller size with
an expression like:

  IC.getCost() + IC.getStaticBonusApplied() < 0

This patch does not add a use of getStaticBonus yet.

Differential Revision: https://reviews.llvm.org/D134373
2022-09-25 23:21:40 -07:00
Kazu Hirata 06b1e5fdc3 [llvm] Use std::underlying_type_t (NFC) 2022-09-25 23:14:15 -07:00
Kazu Hirata 80c698d6b9 [ADT] Use std::conditional_t (NFC) 2022-09-25 23:14:13 -07:00
Lang Hames 75404e9ef8 [JITLink] Introduce new weakly-referenced concept separate from linkage.
Introduces two new methods on Symbol: isWeaklyReferenced and
setWeaklyReferenced. These are now used to track/set whether an external symbol
is weakly referenced, rather than having the Symbol's linkage set to weak.

This change is a first step towards proper handling of weak defs used across
JITDylib boundaries: It frees up the Linkage field on external symbols so that
it can be used to represent the linkage of the definition that the symbol resolves
to. It is expected that Platform plugins will use this information to track
locations that need to be updated if the selected weak definition changes (e.g.
because JITDylibs were dlclosed and then dlopened again in a different order).
2022-09-25 20:34:45 -07:00
Yeting Kuo 43c5fbdd3a [VP][RISCV] Add vp.sqrt intrinsic and RISC-V support.
The patch modeled vp.fabs patch D132793.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133690
2022-09-26 10:47:40 +08:00
James Y Knight 5351878ba1 [TableGen] Add useDeprecatedPositionallyEncodedOperands option.
Summary:
The existing undefined-bitfield-to-operand matching behavior is very
hard to understand, due to the combination of positional and named
matching. This can make it difficult to track down a bug in a target's
instruction definitions.

Over the last decade, folks have tried to work-around this in various
ways, but it's time to finally ditch the positional matching. With
https://reviews.llvm.org/D131003, there are no longer cases that
_require_ positional matching, and it's time to start removing usage
and support for it.

Therefore: add a (default-false) option, and set it to true only in
those targets that require positional matching today. Subsequent
changes will start cleaning up additional in-tree targets.

NOTE TO OUT OF TREE TARGET MAINTAINERS:

If this change breaks your build, you may restore the previous
behavior simply by adding:
  let useDeprecatedPositionallyEncodedOperands = 1;
to your target's InstrInfo tablegen definition. However, this is
temporary -- the option will be removed in the future.

If your target does not set 'decodePositionallyEncodedOperands', you
may thus start migrating to named operands. However, if you _do_
currently set that option, I recommend waiting until a subsequent
change lands, which adds decoder support for named sub-operands.

Differential Revision: https://reviews.llvm.org/D134073
2022-09-24 09:40:45 -04:00
Teresa Johnson b1926f308f Restore "[MemProf] Memprof profile matching and annotation"
This reverts commit 794b7ea960, and
thus restores commit a212d8da94, and
follow on fixes 0cd6763fa9,
e9ff53d42f, and
37c6a25e9a.

Use a hash function (BLAKE3) instead of hash_combine/hash_code which are
not guaranteed to be stable across executions.

Additionally, it adds a "REQUIRES: x86_64-linux" to the tests that have
raw profile inputs to avoid failures on big endian bots.

Reviewers: snehasish, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D128142
2022-09-23 11:38:47 -07:00
Leonard Chan 79565766be Reland "[llvm] Support forward-referenced globals with dso_local_equivalent"
This reverts commit eef5db2c74.

See https://github.com/llvm/llvm-project/issues/57815.

dso_local_equivalent would fail with an assertion on forward-referenced
globals. This is an issue that only comes up in textual IR, which is why
we've never seen this assertion with clang.

Differential Revision: https://reviews.llvm.org/D134234
2022-09-23 18:32:07 +00:00
Kazu Hirata 0a5d4355c9 Revert "[Analysis] Make members of InlineCost const (NFC)"
This reverts commit 9ac934d1bf.

I'm reverting this to accommodate Yevgeny Rouban's downstream branch
where a field of InlineCost changes.
2022-09-23 09:41:21 -07:00
Josh Stone 4dcfb09e40 [NFC][CodeGen] Use const MF in TargetLowering stack probe functions
This makes them callable from places like canUseAsPrologue.

Differential Revision: https://reviews.llvm.org/D134492
2022-09-23 09:30:32 -07:00
Simon Pilgrim a6e9141505 [TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436)
The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both.

This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling.

Fixes #50778

Differential Revision: https://reviews.llvm.org/D111968
2022-09-23 14:03:18 +01:00
Caroline Concatto 5431bf27bd [AArch64]Remove svget/svset/svcreate from llvm
This patch removes the aarch64 instrinsic svget/svset/svcreate from llvm.
It also implements the InstCombine for vector.extract that used to be in svget.

Depends on: D131547

Differential Revision: https://reviews.llvm.org/D131548
2022-09-23 10:48:43 +01:00
Nikita Popov 14947cc4cd [IR] Handle assume intrinsics in hasClobberingOperandBundle()
Operand bundles on assumes do not read or write -- we correctly
modelled the read side of this, but not the write side. In practice
this did not matter because of how the method is used, but this
will become relevant for a future patch.
2022-09-23 10:26:58 +02:00
Daniel Kiss 7e1a873872 [Arm][AArch64] Make getArchFeatures to use TargetParser.def
Prefixing the the SubArch with plus sign makes the ArchFeature name.

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D134349
2022-09-23 10:25:37 +02:00
Yashwant Singh 3fe71809a5 Introduce predicate for a atomic operations in GMIR
Reviewed By: arsenm, sameerds

Differential Revision: https://reviews.llvm.org/D134266
2022-09-23 11:34:36 +05:30
Shraiysh Vaishay 95eb5109af [OpenMP][IRBuilder] Added if clause to task
This patch adds support for if clause to task construct in OpenMP
IRBuilder.

Reviewed By: raghavendhra

Differential Revision: https://reviews.llvm.org/D130615
2022-09-23 01:39:41 +00:00
gonglingqin ac295597a8 [LoongArch] Add codegen support for atomicrmw add/sub/nand/and/or/xor operation
Differential Revision: https://reviews.llvm.org/D133755
2022-09-23 09:32:11 +08:00
Teresa Johnson 794b7ea960 Revert "[MemProf] Memprof profile matching and annotation"
This reverts commit a212d8da94, and follow
on fixes 0cd6763fa9,
e9ff53d42f, and
37c6a25e9a.

After re-reading the documentation for hash_combine, I don't think this
is the appropriate hash function to use for computing the hash to use as
a stack id in the metadata, since it is not guaranteed to produce stable
values across executions. I have not hit this problem, but plan to
switch to using an MD5 hash. I am hitting an issue with one of the bots
(https://lab.llvm.org/buildbot/#/builders/171/builds/20732)
where the values produced are only the lower 32 bits of the expected
hash values, however, which I assume is related to the implementation of
hash_combine and hash_code.

I believe I fixed all of the other bot failures with the follow on fixes,
which I'll merge into the new version before reapplying.
2022-09-22 16:08:03 -07:00
Leonard Chan f7d674910d [llvm] Assert two ValIDs are the same kind before comparing
I suspect the reason for why D134234 was failing sometimes is because
"operator<" for a ValID could compare ValIDs of different kinds but have
the same non-active values and return an incorrect result. This is an
issue if I attempt to store ValIDs of different kinds in an std::map but
we compare different "active" values. For example, if I create an
std::map and store some ValIDs of kind t_GlobalName, then I insert a
ValID of kind t_GlobalID, the current "operator<" will see that one of
the operands is a t_GlobalID and compare it against the UIntVal of other
items in the map, but the other items in the map don't set UIntVal
because they're not t_GlobalIDs, so I compare against a
dummy/uninitialized value.

It seems pretty easy to add mixed ValID kinds into an std::map in
LLParser, so this just asserts that when doing the comparison that both
ValIDs are the same kind.

Differential Revision: https://reviews.llvm.org/D134488
2022-09-22 22:38:02 +00:00
Philip Reames 60c91fd364 [RISCV] Disallow scale for scatter/gather
RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely.

This has two effects:
* First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to.
* Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.)

The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics.

Differential Revision: https://reviews.llvm.org/D134382
2022-09-22 15:31:26 -07:00
Arthur Eubanks a8f1da128d [LazyCallGraph] Handle spurious ref edges when deleting a dead function
Spurious ref edges are ref edges that still exist in the call graph even
though the corresponding IR reference no longer exists. This can cause
issues when deleting a dead function which has a spurious ref edge
pointed at it because currently we expect the dead function's RefSCC to
be trivial.

In the case that the dead function's RefSCC is not trivial, remove all
ref edges from other nodes in the RefSCC to it.

Removing a ref edge can result in splitting RefSCCs. There's actually no
reason to revisit those RefSCCs because currently we only run passes on
SCCs, and we've already added all SCCs in the RefSCC to the worklist.
(as opposed to removing the ref edge in
updateCGAndAnalysisManagerForPass() which can modify the call graph of
SCCs we have not visited yet). We also don't expect that RefSCC
refinement will allow us to glean any more information for optimization
use. Also, doing so would drastically increase the complexity of
LazyCallGraph::removeDeadFunction(), requiring us to return a list of
invalidated RefSCCs and new RefSCCs to add to the worklist.

Fixes #56503

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D133907
2022-09-22 15:01:15 -07:00
Dan Palermo 44c734af9a [OpenMP][NFC] Fix wavesize build warning in OMPGridValues
Differential Revision: https://reviews.llvm.org/D134459
2022-09-22 21:53:59 +00:00
Craig Topper 52708be182 [RISCV] Remove support for the unratified Zbe, Zbf, and Zbm extensions.
These extensions do not appear to be on their way to ratification.
2022-09-22 13:04:41 -07:00
Teresa Johnson a212d8da94 [MemProf] Memprof profile matching and annotation
Profile matching and IR annotation for memprof profiles.

See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]*

* Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceeding patch adding the basic
metadata and verification support.

The matching is performed during the normal PGO annotation phase, to
ensure that the inlines applied in the IR at that point are a subset
of the inlines in the profiled binary and thus reflected in the
profile's call stacks. This is important because the call frames are
associated with functions in the profile based on the inlining in the
symbolized call stacks, and this simplifies locating the subset of
profile data relevant for matching onto each function's IR.

The PGOInstrumentationUse pass is enhanced to perform matching for
whatever combination of memprof and regular PGO profile data exists in
the profile.

Using the utilities introduced in D128854:
The memprof profile data for each context is converted to "cold" or
"notcold" based on parameterized thresholds for size, access count, and
lifetime. The memprof allocation contexts are trimmed to the minimal
amount of context required to uniquely identify whether the context is
cold or not cold. For allocations where all profiled contexts have the
same allocation type, no memprof metadata is attached and instead the
allocation call is directly annotated with an attribute specifying the
alloction type. This is the same attributed that will be applied to
allocation calls once cloned for different contexts, and later used
during LibCall simplification to emit allocation hints [4].

Depends on D128141 and D128854.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
[2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
[3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165
[4] ab87cf382d

Differential Revision: https://reviews.llvm.org/D128142
2022-09-22 12:48:31 -07:00
Jonathan Camilleri 4cd7529e4c [clang][DebugInfo] Emit access specifiers for typedefs
The accessibility level of a typedef or using declaration in a
struct or class was being lost when producing debug information.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D134339
2022-09-22 17:08:41 +00:00
Philip Reames 47afaf2eb0 [llvm-exegesis] Initialize all supported targets
Enable registration of multiple exegesis targets at once. Use idiomatic approach to defining target select macros, but leave code in the llvm-mca sub-directories for now.

This is a stepping stone towards allowing llvm-exegesis benchmarking via simulator or testing in non-target dependent tests.

Differential Revision: https://reviews.llvm.org/D133605
2022-09-22 09:02:22 -07:00
luxufan 2e9118f1e4 [MemorySSA] Reset location size if IsGuaranteedLoopInvariant after phi tranlation
We set the Location size to beforeOrAfter if the Location value is not
guaranteed loop invariant. But in some cases, we need to reset the
location size if the location size is precise after phi tranlation of
location value. This will improve MemorySSA analysis results.

Differential Revision: https://reviews.llvm.org/D134161
2022-09-22 05:08:09 +00:00
Tim Northover 677da09d02 AArch64: add support for newer Apple CPUs
They're roughly ARMv8.6. This works in the .td file, but in
AArch64TargetParser.def, marking them v8.6 brings in support for the SM4
cryptographic hash and we don't actually have that. So TargetParser side
they're marked as v8.5, with the extra features (BF16 and I8MM added manually).

Finally, A16 supports the HCX extension in addition to v8.6. This has no
TargetParser implications.
2022-09-22 11:58:51 +01:00
Christudasan Devadasan 32a8260ccc -dot-machine-cfg for printing MachineFunction to a dot file
This pass allows a user to dump a MIR function to a dot file
and view it as a graph. It is targeted to provide a similar
functionality as -dot-cfg pass on LLVM-IR. As of now the pass
also support below flags:
-dot-mcfg-only [optional][won't print instructions in the
graph just block name]
-mcfg-dot-filename-prefix [optional][prefix to add to output dot file]
-mcfg-func-name [optional] [specify function name or it's
substring, handy if mir file contains multiple functions and
you need to see graph of just one]

More flags and details can be introduced as per the requirements
in future. This pass is inspired from -dot-cfg IR pass and APIs
are written in almost identical format.

Patch by Yashwant Singh <Yashwant.Singh@amd.com> (yassingh)

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D133709
2022-09-22 12:48:33 +05:30
Craig Topper 182aa0cbe0 [RISCV] Remove support for the unratified Zbp extension.
This extension does not appear to be on its way to ratification.

Still need some follow up to simplify the RISCVISD nodes.
2022-09-21 21:22:42 -07:00
Congzhe Cao 6782d71680 [LoopPassManager] Ensure to construct loop nests with the outermost loop
This patch is to resolve the bug reported and discussed in
https://reviews.llvm.org/D124926#3718761 and https://reviews.llvm.org/D124926#3719876.

The problem is that loop interchange is a loopnest pass under the new pass manager,
but the loop nest may not be constructed correctly by the loop pass manager after
running loop interchange and before running the next pass, which might cause problems
when it continues running the next pass.

The reason that the loop nest is constructed incorrectly is that the outermost
loop might have changed after interchange, and what was the original outermost
loop is not the current outermost loop anymore. Constructing the loop nest based
on the original outermost loop would generate an invalid loop nest.

The fix in this patch is that, in the loop pass manager before running each loopnest
pass, we re-cosntruct the loop nest based on the current outermost loop, if LPMUpdater
notifies the loop pass manager that the previous loop nest has been invalidated by passes
like loop interchange.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D132199
2022-09-21 23:59:26 -04:00
Chris Bieneman e77c40ffbd [NFC] Make dxil namespace consistent
We have namespaces `DXIL` and `dxil`, which is just confusing. This
renames `DXIL` -> `dxil` making everything consistent.

While the LLVM coding standards don't have a clear direction here, I
chose lower case because by my current unscientific count there are
more places where we had the lowercase namespace than the uppercase.
2022-09-21 17:48:13 -05:00
Kazu Hirata 9ac934d1bf [Analysis] Make members of InlineCost const (NFC)
Once we create an instance of InlineCost, we don't change its
contents.

Differential Revision: https://reviews.llvm.org/D134388
2022-09-21 14:47:24 -07:00
Leonard Chan eef5db2c74 Revert "[llvm] Support forward-referenced globals with dso_local_equivalent"
This reverts commit 411020ad1c.

One of the tests here fails on some upstream builders:
https://lab.llvm.org/buildbot#builders/16/builds/35314
2022-09-21 20:14:30 +00:00
Leonard Chan 411020ad1c [llvm] Support forward-referenced globals with dso_local_equivalent
See https://github.com/llvm/llvm-project/issues/57815.

dso_local_equivalent would fail with an assertion on forward-referenced
globals. This is an issue that only comes up in textual IR, which is why
we've never seen this assertion with clang.

Differential Revision: https://reviews.llvm.org/D134234
2022-09-21 19:31:35 +00:00
Scott Linder 552539bdac Revert "[NFC][AMDGPU] Refactor AMDGPUDisassembler"
This reverts commit f583151461.
2022-09-21 18:48:42 +00:00
Guozhi Wei c39311eb40 [RegisterCoalescer] Use LiveRangeEdit to handle rematerialization
This patch uses the API provided by LiveRangeEdit to handle rematerialization.
It will make future maintenance and improvement more easier.

No functional change.

Differential Revision: https://reviews.llvm.org/D133610
2022-09-21 17:51:07 +00:00
zhijian 3238951b61 Refactor XCOFFObjectFile::getImportFileTable.
Summary:

  1. Refactor with XCOFFObjectFile::getImportFileTable with function getSectionFileOffsetToRawData instead of getLoaderSectionAddress

  2. Delete the function getLoaderSectionAddress.

Reviewers: James Henderson,Esme Yi
Differential Revision: https://reviews.llvm.org/D134280
2022-09-21 12:51:38 -04:00
Matt Arsenault 0d1f040749 LICM: Pass through AssumptionCache 2022-09-21 09:16:17 -04:00
David Green 4f78e022ee [AArch64] Lower scalar sqxtn intrinsics to use fp registers
The llvm.aarch64.neon.scalar.sqxtn.i32.i64 intrinsics take and return
integer types, but operate on fp registers. This can create some
inefficiencies in their lowering, where the registers are converted to
fp a little too late. This patch adds lowering for the intrinsics,
creating bitcasts to/from fp types to allow nicer folding later when the
instructions are selected, especially around insert/extracts.

Differential Revision: https://reviews.llvm.org/D134024
2022-09-21 10:46:43 +01:00
Nikita Popov 17994ed919 [MemorySSA] Remove PerformedPhiTranslation flag
I believe this is no longer necessary, as the underlying problem
has been fixed in a different way: Nowadays, we will adjust the
location size to beforeOrAfterPointer() if the pointer is not loop
invariant. This makes merging results translated across loop
backedges safe.

The two tests in phi-translation.ll show an improvement while still
being correct: The loads in the loop no longer alias with noalias
pointers, but still alias with the store in the entry block (which
they originally did not -- this is the bug that
PerformedPhiTranslation originally fixed).

Differential Revision: https://reviews.llvm.org/D133404
2022-09-21 10:32:09 +02:00
Chen Zheng c941d925b0 [MachineCycle][NFC] add a cache for block and its top level cycle
This solves https://github.com/llvm/llvm-project/issues/57664

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D134019
2022-09-21 01:31:04 -04:00
Craig Topper 70a64fe7b1 [RISCV] Remove support for the unratified Zbt extension.
This extension does not appear to be on its way to ratification.

Out of the unratified bitmanip extensions, this one had the
largest impact on the compiler.

Posting this patch to start a discussion about whether we should
remove these extensions. We'll talk more at the RISC-V sync meeting this
Thursday.

Reviewed By: asb, reames

Differential Revision: https://reviews.llvm.org/D133834
2022-09-20 20:26:48 -07:00
Shubham Sandeep Rastogi 636de2bf34 Change isLittleEndian to follow llvm style and add an accessor
Differential Revision: https://reviews.llvm.org/D134290
2022-09-20 17:00:47 -07:00
Scott Linder f583151461 [NFC][AMDGPU] Refactor AMDGPUDisassembler
Clean up ahead of a patch to fix bugs in the AMDGPUDisassembler.

Use lit.local.cfg substitutions and more idiomatic use of split-file to
simplify and extend existing kernel-descriptor disassembly tests.

Add a comment to AMDHSAKernelDescriptor.h, as at least one small set
towards keeping all kernel-descriptor sensitive code in sync.

Reviewed By: kzhuravl, arsenm

Differential Revision: https://reviews.llvm.org/D130105
2022-09-20 20:37:19 +00:00
Kazu Hirata 00874c48ea [IPO] Reorder parameters of InlineFunction (NFC)
With the recent addition of new parameter MergeAttributes (D134117),
callers need to specify several default parameters before getting to
specify the new parameter.

This patch reorders the parameters so that callers do not have to
specify as many default parameters.

Differential Revision: https://reviews.llvm.org/D134125
2022-09-20 09:09:38 -07:00
Eric Li 403d72cd43 [Support][NFC] Clarify function comment
Follow-up to 86118ec2 that addresses the comments in D134072, which
were accidentally left off of the commit.
2022-09-20 11:10:16 -04:00
Eric Li 86118ec2d0 [Support] Provide access to the full mapping in llvm::Annotations
Providing access to the mapping of annotations allows test helpers to
be expressive by using the annotations as expectations. For example, a
matcher could verify that all annotated points were matched by a
matcher, or that an refactoring surgically modifies specific ranges.

Differential Revision: https://reviews.llvm.org/D134072
2022-09-20 11:06:21 -04:00
Vy Nguyen 016c2f5e32 [lld-macho] Support -dyld_env
This arg is undocumented but from looking at the code + experiment, it's used to add additional DYLD_ENVIRONMENT load commands to the output.

Differential Revision: https://reviews.llvm.org/D134058
2022-09-20 10:16:45 -04:00
Caroline Concatto d32b8fdbdb [LLVM][AArch64] Replace aarch64.sve.ld by aarch64.sve.ldN.sret
This patch removes the intrinsic aarch64.sve.ldN from tablegen in favour of
using arch64.sve.ldN.sret.

Depends on: D133023

Differential Revision: https://reviews.llvm.org/D133025
2022-09-20 13:15:07 +01:00
luxufan bfd31c6f12 [MemorySSA][NFC] Use const whenever possible
Differential Revision: https://reviews.llvm.org/D134162
2022-09-20 02:21:02 +00:00
Matt Arsenault 2adae8e1b7 VectorCombine: Pass through AssumptionCache 2022-09-19 19:25:22 -04:00
Matt Arsenault ce44357216 Analysis: Add AssumptionCache to isSafeToSpeculativelyExecute
Does not update any of the uses.
2022-09-19 19:25:22 -04:00
Matt Arsenault 34fb7803f8 GlobalISel: Pass through AssumptionCache 2022-09-19 19:10:51 -04:00
Matt Arsenault bcb931c484 SelectionDAG: Add AssumptionCache analysis dependency
Fixes compile time regression after
bb70b5d406
2022-09-19 19:10:51 -04:00
Matt Arsenault 0d8ffcc532 Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer
This does not try to pass it through from the end users.
2022-09-19 18:57:33 -04:00
Mircea Trofin c625c17b88 [lld][thinlto] Include -mllvm options in the thinlto cache key
They may modify thinlto optimization.

This patch only extends support for `-mllvm`. There is another way to
pass llvm flags, `-plugin-opt`, but its processing is different and will
be provided in a subsequent patch.

Differential Revision: https://reviews.llvm.org/D134013
2022-09-19 12:04:17 -07:00
Fangrui Song 0b140d0910 [Object] Add zstd decompression support to Decompressor
llvm::object::Decompressor is used by many DWARF consumers like llvm-dwarfdump,
llvm-dwp, llvm-symbolizer. Add tests to them. The lldb test can be left to
D133530.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D134116
2022-09-19 11:41:16 -07:00
zhijian dcd5abd4c4 [AIX] llvm-readobj support a new option --exception-section for xcoff object file.
Summary:

llvm-readobj support a new option --exception-section for xcoff object file.

https://www.ibm.com/docs/en/aix/7.2?topic=formats-xcoff-object-file-format#XCOFF__iua3i23ajbau

Reviewers:  James Henderson,Paul Scoropan

Differential Revision: https://reviews.llvm.org/D133030
2022-09-19 10:55:48 -04:00
Nikita Popov dd61726d5b Revert "[SimplifyCFG] accumulate bonus insts cost"
This reverts commit e5581df60a.

This causes major compile-time regressions, about 2-3% end-to-end
on CTMark.
2022-09-19 14:46:43 +02:00
Max Kazantsev 818b1ab84e [SCEV][NFC] Remove unused parameter from forgetLoopDispositions
Let's be honest about it, we don't drop loop dispositions for
particular loops. Remove the parameter that misleadingly makes
it apparent that we do.
2022-09-19 14:06:42 +07:00
Kazu Hirata 6b49f30fca [llvm] Deprecate llvm::empty (NFC)
This patch deprecates llvm::empty as I've migrated all known uses of
llvm::empty(x) to x.empty().

Differential Revision: https://reviews.llvm.org/D134141
2022-09-18 22:01:32 -07:00
Kazu Hirata 2078350645 Use std::make_unsigned_t (NFC) 2022-09-18 18:41:02 -07:00
Lang Hames 0e43f3b04d [ORC][ORC-RT] Make WrapperFunctionCall::Create support void functions.
Serialized calls to void-wrapper-functions should have zero bytes of argument
data, but accessing ArgData[0] may (and will, in the case of SmallVector) fail
if the argument data buffer is empty.

This commit fixes the issue by adding a check for empty argument buffers.
2022-09-18 17:53:45 -07:00
Yaxun (Sam) Liu e5581df60a [SimplifyCFG] accumulate bonus insts cost
SimplifyCFG folds

bool foo() {
  if (cond1) return false;
  if (cond2) return false;
  return true;
}

as

bool foo() {
  if (cond1 | cond2) return false
  return true;
}

'cond2' is called 'bonus insts' in branch folding since they introduce overhead
since the original CFG could do early exit but the folded CFG always executes
them. SimplifyCFG calculates the costs of 'bonus insts' of a folding a BB into
its predecessor BB which shares the destination. If it is below bonus-inst-threshold,
SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed.

When SimplifyCFG calculates the cost of 'bonus insts', it only consider 'bonus' insts
in the current BB to be considered for folding. This causes issue for unrolled loops
which share destinations, e.g.

bool foo(int *a) {
  for (int i = 0; i < 32; i++)
    if (a[i] > 0) return false;
  return true;
}

After unrolling, it becomes

bool foo(int *a) {
  if(a[0]>0) return false
  if(a[1]>0) return false;
  //...
  if(a[31]>0) return false;
  return true;
}

SimplifyCFG will merge each BB with its predecessor BB,
and ends up with 32 'bonus insts' which are always executed, which
is much slower than the original CFG.

The root cause is that SimplifyCFG does not consider the
accumulated cost of 'bonus insts' which are folded from
different BB's.

This patch fixes that by introducing a ValueMap to track
costs of 'bonus insts' coming from different BB's into
the same BB, and cuts off if the accumulated cost
exceeds a threshold.

Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D132408
2022-09-18 20:21:14 -04:00
Kazu Hirata 82293ed486 [ModuleInliner] Remove unused using declarations (NFC) 2022-09-18 14:27:06 -07:00
Kazu Hirata 3e720fa9dc Use std::decay_t (NFC) 2022-09-18 10:25:08 -07:00
Kazu Hirata 5e5a6c5b07 Use std::conditional_t (NFC) 2022-09-18 10:25:06 -07:00
Kazu Hirata 919b638eff [ADT] Use std::common_type_t (NFC) 2022-09-18 10:25:04 -07:00
Kazu Hirata d3b95ecc98 [ModuleInliner] Remove InlineOrder::front (NFC)
InlineOrder::front is a remnant from the era when we had a nested
"while" loops in the module inliner, with the inner one grouping the
call sites with the same caller.

Now that we have a simple "while" loop draining the priority queue, we
can just use InlineOrder::pop.

Differential Revision: https://reviews.llvm.org/D134121
2022-09-18 08:49:44 -07:00
Kazu Hirata 284f0397e2 [Transforms] Merge function attributes within InlineFunction (NFC)
In the past, we've had a bug resulting in a compiler crash after
forgetting to merge function attributes (D105729).

This patch teaches InlineFunction to merge function attributes.  This
way, we minimize the "time" when the IR is valid, but the function
attributes are not.

Differential Revision: https://reviews.llvm.org/D134117
2022-09-17 23:10:23 -07:00
Kai Nacke ae35188f97 [GISel] Fix match tree emitter.
The following changes are necessasy to get the generated tree
matcher to compile:

- In CodeExpansions::declare(), the assert() prevents connecting
  two instructions. E.g. the match code
    (match (MUL $t, $s1, $s2),
           (SUB $d, $t, $s3)),
  results in two declarations of $t, one for the def and one for
  the use. Removing the assertion allows this construct.
  If $t is later used, it is one of the operands, which should be
  perfectly fine.
- The code emitted in GIMatchTreeVRegDefPartitioner::generatePartitionSelectorCode()
  is not compilable:
  - The value of NewInstrID should be emitted, not the name
  - Both calls involving getOperand() end with one parenthesis too many
- Swaps generated condition for the partition code in the latter function

It also changes the rules i2p_to_p2i, fabs_fabs_fold, and fneg_fneg_fold
to use the tree matcher for a linear match. These rules are tested by:

CodeGen/AArch64/GlobalISel/combine-fabs.mir
CodeGen/AArch64/GlobalISel/combine-fneg.mir
CodeGen/AArch64/GlobalISel/combine-ptrtoint.mir
CodeGen/AMDGPU/GlobalISel/combine-add-nullptr.mir

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D133257
2022-09-18 00:00:15 +00:00
Aiden Grossman e5e3dccd07 [mlgo] Add in-development instruction based features for regalloc advisor
This patch adds in instruction based features to the regalloc advisor
gated behind a flag so a user can decide at runtime whether or not they
want to enable the feature. The features are only enabled when LLVM is
compiled in MLGO develpment mode (LLVM_HAVE_TF_API) is set to true.

To extract the instruction features, I'm taking a list of segments from
each LiveInterval and noting the start and end SlotIndices. This list is then
sorted based on the start SlotIndex and I iterate through each SlotIndex
to grab instructions, making sure to check for overlaps. This results in
a vector of opcodes and binary mapping matrix that maps live ranges to the
opcodes of the instructions within that LR.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D131930
2022-09-17 19:54:45 +00:00
Kazu Hirata 20d764aff0 [llvm] Don't including SetVector.h (NFC)
llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without
including SetVector.h, so this patch adds an appropriate #include
there.
2022-09-17 12:36:43 -07:00
Fangrui Song 367997d0d6 [Support] Rename llvm::compression::{zlib,zstd}::uncompress to more appropriate decompress
This improves consistency with other places (e.g. llvm::compression::decompress,
llvm::object::Decompressor::decompress, llvm-objcopy).
Note: when zstd::uncompress was added, we noticed that the API `ZSTD_decompress`
is fine while the zlib API `uncompress` is a misnomer.
2022-09-17 12:35:17 -07:00
Kazu Hirata 6170437df5 [ModuleInliner] Remove unnecessary #includes (NFC)
While I am at it, this patch removes an unnecessary forward
declaration.
2022-09-17 12:05:35 -07:00
Kazu Hirata 5faf4bf195 [ModuleInliner] Move UseInlinePriority to InlineOrder.cpp (NFC)
UseInlinePriority specifies the priority function.  This patch
simplifies the code by moving UseInlinePriority closer to the actual
consumer -- the switch statement inside getInlineOrder.

Differential Revision: https://reviews.llvm.org/D134100
2022-09-17 11:41:28 -07:00
Sander de Smalen bed214cf0f [AArch64][SME] Add intrinsics for enabling/disabling ZA.
This adds the intrinsics:
* void @llvm.aarch64.sme.za.enable() -> smstart za
* void @llvm.aarch64.sme.za.disable()  -> smstop za

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D133894
2022-09-17 16:41:42 +00:00
Shivam Gupta 78533528cf [NFC] Fix a comment in InitializePasses.h 2022-09-17 19:09:26 +05:30
Kazu Hirata 29c841ce93 Revert "[llvm] Remove llvm::is_trivially_{copy/move}_constructible (NFC)"
This reverts commit 01ffe31cbb.

A build breakage with GCC 7.3 has been reported:

https://reviews.llvm.org/D132311#3797053

FWIW, GCC 7.5 is OK according to Pavel Chupin.  I also personally
tested GCC 8.4.0.
2022-09-16 18:26:20 -07:00
Kazu Hirata 6e30a9cc08 [Inliner] Retire DefaultInlineOrder (NFC)
DefaultInlineOrder was largely an exercise in generalizing the
traversal order of call sites within the inliner.

Now that the module inliner is starting to form its shape, there is no
point in sharing DefaultInlineOrder between the module inliner and the
CGSCC inliner.  DefaultInlineOrder and all the other inline orders are
mutually exclusive in the following sense:

- The use of DefaultInlineOrder doesn't make sense in the module
  inliner because there is no priority inherent in the order in which
  call sites are added to the list of call sites -- SmallVector.

- The use of any other inline order doesn't make sense in the CGSCC
  inliner because little prioritization can be done within one CGSCC.

This patch essentially reverts the addition of DefaultInlineOrder so
that the loop structure of Inliner.cpp looks like the state just
before we started working on the module inliner (circa June 2021).

At the same time, ww remove the choice of DefaultInlineOrder from
UseInlinePriority.

Differential Revision: https://reviews.llvm.org/D134080
2022-09-16 15:36:40 -07:00
Jessica Paquette 1076b31da8 [GlobalISel] Combine select + fcmp to fminnum/fmaxnum/fminimum/fmaximum
This is a partial port of the code used by the SelectionDAGBuilder to
translate selects.

In particular, see matchSelectPattern in ValueTracking.cpp. This is a
GISel-equivalent of the portion which handles fminnum/fmaxnum/fminimum/fmaximum.

I tried to set it up so it'd be easy to add the non-FP cases. Those are simpler.
On the AArch64-end, it seems like the FP cases are more important for perf
right now, so I bit the bullet and went at the more complicated problem. :)

I elected to do this as a post-legalize combine rather than in the
IRTranslator because

Deciding which fmax/fmin to use can depend on legalization rules
Philosophically-speaking (TM), putting it in a combine just feels cleaner

Being able to enable/disable the combine is handy
Another option would be to use the ValueTracking code in the IRTranslator and
match what SelectionDAGBuilder::visitSelect does. I think that may be somewhat
annoying since we'd need to write lowerings back into the selects in the
legalizer. I'm not strongly opposed to the approach.

We'd also want to be careful with vector selects once that's implemented,
which explicitly check if a vector select is legal on the target. That'd
probably need a hook.

From what I can tell, doing this as a combine is probably a cleaner option
long-term.

Differential Revision: https://reviews.llvm.org/D116702
2022-09-16 13:35:46 -07:00
David Majnemer 8a868d8859 Revert "Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView""
This reverts commit cd20a18286 and adds a
"let Heading" to NoStackProtectorDocs.
2022-09-16 19:39:48 +00:00
Kazu Hirata e0bc76eb23 [ModuleInliner] Move InlinePriority and its derived classes to InlineOrder.cpp (NFC)
These classes are referred to only from getInlineOrder in
InlineOrder.cpp.  This patch hides the entire class declarations and
definitions in InlineOrder.cpp.

Differential Revision: https://reviews.llvm.org/D134056
2022-09-16 12:32:16 -07:00
Justin Lebar 8cc3bfd13f
[NFC] Fix indentation in ValueTracking.h.
In a separate patch I want to modify ValueTracking.h.  When I touch the
header, arc wants to clang-format the lines I touch (reasonable!).  But
then these whitespace changes get mixed into my patch.
2022-09-16 10:46:23 -07:00
mbs 7061a3f3f8 [support] Prepare TimeProfiler for cross-thread support
This NFC prepares the TimeProfiler to support the construction
and completion of time profiling 'entries' across threads.

Add ClockType alias so we can change the clock in one place.
(trivial) Use c++ usings instead of typedefs
Rename Entry to TimeTraceProfilerEntry since this type will eventually become public.
Add an intro comment.
Add some smoke unit tests.

Reviewed By: russell.gallop, rriddle, lattner, jloser

Differential Revision: https://reviews.llvm.org/D133153
2022-09-16 10:20:18 -06:00
Yuta Mukai 116838b151 [MachinePipeliner] Fix the interpretation of the scheduling model
The method of counting resource consumption is modified to be based on
"Cycles" value when DFA is not used.

The calculation of ResMII is modified to total "Cycles" and divide it
by the number of units for each resource. Previously, ResMII was
excessive because it was assumed that resources were consumed for
the cycles of "Latency" value.

The method of resource reservation is modified similarly. When a
value of "Cycles" is larger than 1, the resource is considered to be
consumed by 1 for cycles of its length from the scheduled cycle.
To realize this, ResourceManager maintains a resource table for all
slots. Previously, resource consumption was always 1 for 1 cycle
regardless of the value of "Cycles" or "Latency".

In addition, the number of micro operations per cycle is modified to
be constrained by "IssueWidth". To disable the constraint,
--pipeliner-force-issue-width=100 can be used.

For the case of using DFA, the scheduling results are unchanged.

Reviewed By: dpenry

Differential Revision: https://reviews.llvm.org/D133572
2022-09-16 09:51:48 +09:00
Navid Emamdoost 3e52c0926c Add -fsanitizer-coverage=control-flow
Reviewed By: kcc, vitalybuka, MaskRay

Differential Revision: https://reviews.llvm.org/D133157
2022-09-15 15:56:04 -07:00
Alexander Timofeev fbdea5a2e9 [AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode
This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler.  It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133593
2022-09-15 22:03:56 +02:00
Florian Hahn 81a11da762
[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl.
This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops
using a wide shuffle  creating a v64i8 vector, selecting groups of 3
zero elements and an element from the input.

This is profitable on AArch64 where such shuffles can be lowered to tbl
instructions, but only in loops, because it requires materializing 4
masks, which can be done in the loop preheader.

This is the only reason the transform is part of CGP. If there's a
better alternative I missed, please let me know. The same goes for the
shouldReplaceZExtWithShuffle hook which guards this. I am not sure if
this transform will be beneficial on other targets, but it seems like
there is no way other convenient way.

This improves the generated code for loops like the one below in
combination with D96522.

    int foo(uint8_t *p, int N) {
      unsigned long long sum = 0;
      for (int i = 0; i < N ; i++, p++) {
	unsigned int v = *p;
	sum += (v < 127) ? v : 256 - v;
      }
      return sum;
    }

https://clang.godbolt.org/z/Wco866MjY

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D120571
2022-09-15 19:18:13 +01:00
Sergei Barannikov c6acb4eb0f [SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduced amount of boilerplate code a bit.  This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
2022-09-15 14:02:12 -04:00