Commit Graph

151193 Commits

Author SHA1 Message Date
Hans Wennborg 1e9afab875 Re-apply "[JumpThreading] Ignore free instructions"
It seems the crashes we saw wasn't caused by this (see comments on the review).

> This is basically D108837 but for jump threading. Free instructions
> should be ignored for the threading decision. JumpThreading already
> skips some free instructions (like pointer bitcasts), but does not
> skip various free intrinsics -- in fact, it currently gives them a
> fairly large cost of 2.
>
> Differential Revision: https://reviews.llvm.org/D110290

This reverts commit 4604695d7c.
2021-09-24 18:52:30 +02:00
Stanislav Mekhanoshin 082e22f3d7 [AMDGPU] Always reserve flat scratch SGPR for architected flat scratch
With architected flat scratch it becomes readonly. We must always
reserve SGPR pair for it even if we do not use scratch at all since
an attempt to write to SGPRs mapped to FLAT_SCRATCH results in
memory violation.

This is not needed since GFX10 with architected flat scratch though
since special SGPRs are not carving space from normal SGPRs.

Differential Revision: https://reviews.llvm.org/D110376
2021-09-24 09:46:31 -07:00
Florian Hahn 6f28fb7081
Recommit "[DSE] Track earliest escape, use for loads in isReadClobber."
This reverts the revert commit df56fc6ebb.

This version of the patch adjusts the location where the EarliestEscapes
cache is cleared when an instruction gets removed. The earliest escaping
instruction does not have to be a memory instruction.

It could be a ptrtoint instruction like in the added test
@earliest_escape_ptrtoint, which subsequently gets removed. We need to
invalidate the EarliestEscape entry referring to the ptrtoint when
deleting it.

This fixes the crash mentioned in
https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6
2021-09-24 17:13:27 +01:00
Simon Pilgrim d8fc9f8727 [X86][SSE] combineMulToPMADDWD - replace sext(v8i16) -> zext(v8i16)
As suggested on D108522, if we're sign extending a v4i16 source before multiplying as a v4i32, then we can replace that with a zero extension and rely on the implicit sign-extension of PMADDWD.
2021-09-24 16:42:01 +01:00
Sanjay Patel 09e71c367a [x86] convert logic-of-FP-compares to FP logic-of-vector-compares
This is motivated by the examples and discussion in:
https://llvm.org/PR51245
...and related bugs.

By using vector compares and vector logic, we can convert 2 'set'
instructions into 1 'movd' or 'movmsk' and generally improve
throughput/reduce instructions.

Unfortunately, we don't have a complete vector compare ISA before
AVX, so I left SSE-only out of this patch. Ie, we'd need extra logic
ops to simulate the missing predicates for SSE 'cmpp*', so it's not
as clearly a win.

Differential Revision: https://reviews.llvm.org/D110342
2021-09-24 11:38:19 -04:00
Paul Robinson 1376ae9094 [TargetLibraryInfo][AMDGPU] Minor cleanup, NFC 2021-09-24 07:52:44 -07:00
Sanjay Patel 3c5500907b Revert "[InstCombine] fold cast of right-shift if high bits are not demanded (2nd try)"
This reverts commit bb9333c350.

This exposes another existing bug that causes an infinite loop as shown in
D110170
...so reverting while I look at another fix.
2021-09-24 10:47:35 -04:00
Hans Wennborg 4604695d7c Revert "[JumpThreading] Ignore free instructions"
It caused compiler crashes, see comment on the code review for repro.

> This is basically D108837 but for jump threading. Free instructions
> should be ignored for the threading decision. JumpThreading already
> skips some free instructions (like pointer bitcasts), but does not
> skip various free intrinsics -- in fact, it currently gives them a
> fairly large cost of 2.
>
> Differential Revision: https://reviews.llvm.org/D110290

This reverts commit 1e3c6fc7cb.
2021-09-24 16:14:53 +02:00
Nico Weber df56fc6ebb Revert "[DSE] Track earliest escape, use for loads in isReadClobber."
This reverts commit 5ce89279c0.
Makes clang crash, see comments on https://reviews.llvm.org/D109844
2021-09-24 09:57:59 -04:00
David Sherwood 8e4f7b749c [Analysis] Fix another issue when querying vscale attributes on functions
There are several places in the code that are currently broken where
we assume an Instruction is always a member of a BasicBlock that
lives in a Function. This is a problem specifically when
attempting to get the vscale_range attribute. This patch adds checks
that an Instruction's parent also has a parent!

I've added a test for a function-less @llvm.vscale intrinsic call here:

  unittests/Analysis/ValueTrackingTest.cpp
2021-09-24 13:37:23 +01:00
Jay Foad e4e95f14f1 [LiveIntervals] Repair live intervals that gain subranges
In repairIntervalsInRange, if the new instructions refer to subregs but
the old instructions did not, make sure any existing live interval for
the superreg is updated to have subranges. Also skip repairing any range
that we have recalculated from scratch, partly for efficiency but also
to avoids some cases that repairOldRegInRange can't handle.

The existing test/CodeGen/AMDGPU/twoaddr-regsequence.mir provides some
test coverage for this change: when TwoAddressInstructionPass converts
REG_SEQUENCE into subreg copies, the live intervals will now get
subranges and MachineVerifier will verify that the subranges are
correct. Unfortunately MachineVerifier does not complain if the
subranges are not present, so the test also passed before this patch.

This patch also fixes ~800 of the ~1500 failures in the whole CodeGen
lit test suite when -early-live-intervals is forced on.

Differential Revision: https://reviews.llvm.org/D110328
2021-09-24 11:58:08 +01:00
Jay Foad 7863cc6c1c [LiveIntervals] Fix repairOldRegInRange for simple def cases
The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange"
was over-zealous. It would bail out when the end of the range to be
repaired was in the middle of the first segment of the live range of
Reg, which was always the case when the range contained a single def of
Reg.

This patch fixes it as suggested by Matthias Braun in post-commit review
on the original patch, and tests it by adding -early-live-intervals to
a selection of existing lit tests that now pass.

(Note that D23303 was originally applied to fix a crash in
SILoadStoreOptimizer, but that is now moot since D23814 updated
SILoadStoreOptimizer to run before scheduling so it no longer has to
update live intervals.)

Differential Revision: https://reviews.llvm.org/D110238

Unrevert with some changes to the tests:
- Add -verify-machineinstrs to check for remaining problems in live
  interval support in TwoAddressInstructionPass.
- Drop test/CodeGen/AMDGPU/extract-load-i1.ll since it suffers from
  some of those remaining problems.
2021-09-24 11:44:49 +01:00
Congzhe Cao 751be2a064 [CodeMoverUtils] Enhance isSafeToMoveBefore() when moving BBs
When moving an entire basic block BB before InsertPoint, currently
we check for all instructions whether the operands dominates
InsertPoint, however, this can be improved such that even an
operand does not dominate InsertPoint, as long as it appears as
a previous instruction in the same BB, it is safe to move.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D110378
2021-09-24 05:48:15 -04:00
Hsiangkai Wang 7d39a8a921 [RISCV] (1/2) Add the tail policy argument to builtins/intrinsics.
Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list.

For unmasked operations, we have no maskedoff and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they could use all one mask in the masked operations to do it. So, we only add the additional argument for masked operations for most cases. There are exceptions listed below.

In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them.

* Use dest argument to control tail policy
vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument)
vfmerge.vfm (add _t builtins with additional dest argument)
vmv.v.v (add _t builtins with additional dest argument)
vmv.v.x (add _t builtins with additional dest argument)
vmv.v.i (add _t builtins with additional dest argument)
vfmv.v.f (add _t builtins with additional dest argument)
vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument)
vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument)

* Always has tail argument for masked/unmasked intrinsics
Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Reduction Operations (add _t and _mt builtins)
Vector Slideup Instructions (add _t and _mt builtins)
Vector Slidedown Instructions (add _t and _mt builtins)

Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101

Differential Revision: https://reviews.llvm.org/D105092
2021-09-24 17:09:50 +08:00
Simon Pilgrim dade83c02a [X86][SLM] Fix ADDQ/SUBQ/CMPEQQ throughput to account for running on either port.
Testing on a SLM box suggests these can run on either port, but the throughput is 4cy on either (inc MMX versions). Confirmed with Intel AoM / Agner / InstLatX64.
2021-09-24 10:06:14 +01:00
David Sherwood c2634fc6ab [Analysis] Fix issues when querying vscale attributes on functions
There are several places in the code that are currently broken as
they assume an Instruction always has a parent Function when
attempting to get the vscale_range attribute. This patch adds checks
that an Instruction has a parent.

I've added a test for a parentless @llvm.vscale intrinsic call here:

  unittests/Analysis/ValueTrackingTest.cpp

Differential Revision: https://reviews.llvm.org/D110158
2021-09-24 09:58:10 +01:00
Jonas Paulsson ea92283449 [SystemZ] Implement ISD::BITCAST for fp128 -> i128.
The type legalizer has by default no method of doing this bitcast other than
storing and reloading the value from stack.

This patch implements a custom lowering of this operation using extractions
of subregs (z13 and earlier using FP128 register pairs), or of vector
elements (with 'vector enhancements 1' using VR128 FP registers).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D110346
2021-09-24 10:26:45 +02:00
Amara Emerson 9f773b17c2 [GlobalISel][IRTranslator] Fix crash during bit-test switch optimization with odd types.
Odd switch case types cause a crash in the conversion to MVT. Instead use a pointer sized
scalar type which is what SDAG does in these cases.
2021-09-24 00:19:27 -07:00
Amara Emerson 661ab70314 [AArch64][GlobalISel] Fix crash in the extend(extract_vector_elt) optimization.
It was assuming that GPR extends could only have destination sizes of 32 or 64
bits, but for AArch64 we allow < 32 bits even without matching size physregs.
2021-09-23 23:07:16 -07:00
Lang Hames ef391df2b6 [ORC] Rename ExecutorAddress to ExecutorAddr.
Removing the 'ess' suffix improves the ergonomics without sacrificing clarity.
Since this class is likely to be used more frequently in the future it's worth
some short term pain to fix this now.
2021-09-23 20:35:17 -07:00
Lang Hames a2c1cf09df [ORC] Introduce EPCGenericDylibManager / SimpleExecutorDylibManager.
EPCGenericDylibManager provides an interface for loading dylibs and looking up
symbols in the executor, implemented using EPC-calls to functions in the
executor.

SimpleExecutorDylibManager is an executor-side service that provides the
functions used by EPCGenericDylibManager.

SimpleRemoteEPC is updated to use an EPCGenericDylibManager instance to
implement the ExecutorProcessControl loadDylib and lookup methods. In a future
commit these methods will be removed, and clients updated to use
EPCGenericDylibManagers directly.
2021-09-23 19:59:35 -07:00
Christudasan Devadasan 7a62a5b56d [AMDGPU] Legalize initialized LDS variables
We don't allow an initializer for LDS variables
and there is an early abort during instruction
selection. This patch legalizes them by ignoring
the init values. During assembly emission, proper
error reporting already exists for such instances.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109901
2021-09-23 22:53:20 -04:00
Teresa Johnson 2c1defeee4 [ThinLTO] Don't emit original GUID for locals to distributed indexes
In ThinLTO for locals we normally compute the GUID from the name after
prepending the source path to get a unique global id. SamplePGO indirect
call profiles contain the target GUID without this uniquification,
however (unless compiling with -funique-internal-linkage-names).
Therefore, the index contains the original GUID of the local symbols
(without module path prepended to uniquify), in order to correctly
handle the call edges added for these indirect call profile targets
with SamplePGO.

We were emitting these to the combined index when writing it out as
bitcode, which is unnecessary and causes overhead when writing out the
indexes for distributed backends. The only use of the original GUID name
is in the thin link. Suppress it in that case. This reduced the thin
link time for a large distributed build by about 7%, and the aggregate
size of the serialized indexes by over 2%.

Continue to print it when writing out the full index, since that is just
used for debugging and testing.

Update a distributed thinlto index test to contain a local and ensure
that we don't get a COMBINED_ORIGINAL_NAME record.

Differential Revision: https://reviews.llvm.org/D110296
2021-09-23 17:35:47 -07:00
Lang Hames c965fde7c2 [ORC] Shut down services in SimpleRemoteEPCServer.
This should have been included with ExecutorBootstrapService in 78b083dbb7,
but was accidentally left out. It give services a chance to release any
resources that they have acquired.
2021-09-23 16:27:28 -07:00
Craig Topper 40b230f685 [RISCV] Limit transformAddImmMulImm to prevent an infinite loop.
This fixes an issue reported in D108607.
2021-09-23 15:53:11 -07:00
Lang Hames 2ce73f13c9 [ORC] Fix file header. 2021-09-23 15:48:08 -07:00
Vang Thao 1443ba6163 [AMDGPU] Propagate defining src reg for AGPR to AGPR Copys
On targets that do not support AGPR to AGPR copying directly, try to find the
defining accvgpr_write and propagate its source vgpr register to the copies
before register allocation so the source vgpr register does not get clobbered.

The postrapseudos pass also attempt to propagate the defining accvgpr_write but
if the register to propagate is clobbered, it will give up and create new
temporary vgpr registers instead.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108830
2021-09-23 15:17:53 -07:00
Matt Arsenault 2875d3d484 RegAllocGreedy: Remove an unhelpful auto, and don't use a reference 2021-09-23 17:25:25 -04:00
Craig Topper 70f50114f3 [RISCV] Add another isel optimization for (and (shl x, c2), c1)
Turn (and (shl x, c2), c1) -> (slli (srli x, c3-c2), c3) if c1 is a
shifted mask with no leading zeros and c3 trailing zeros where c3
is greater than c2.
2021-09-23 14:18:07 -07:00
Nico Weber 3fa43da7a3 [llvm] Fix a copy-pasto
We should use IMAGE_REL_I386_SECREL in the i386 section of this file.

IMAGE_REL_I386_SECREL and IMAGE_REL_AMD64_SECREL have the same
numeric value 0xB, so this doesn't change behavior.
2021-09-23 15:34:01 -04:00
Stefan Gränitz 767b328e50 [ORC] Minor renaming and typo fixes (NFC)
Two typos, one unsused include and some leftovers from the TargetProcessControl -> ExecutorProcessControl renaming

Reviewed By: xgupta

Differential Revision: https://reviews.llvm.org/D110260
2021-09-23 21:33:34 +02:00
Fangrui Song 0bb767e7db [InlineAdvisor] Use one single quote 2021-09-23 12:16:15 -07:00
Craig Topper 4a69551d66 [RISCV] Add more isel optimizations for (and (shr x, c2), c1).
Turn (and (shr x, c2), c1) -> (slli (srli x, c2+c3), c3) if c1 is a
shifted mask with c2 leading zeros and c3 trailing zeros.

When the leading zeros is C2+32 we can use SRLIW in place of SRLI.
2021-09-23 11:29:04 -07:00
Sanjay Patel 74ba4b769a [x86] move combiner state check into convertIntLogicToFPLogic(); NFC
This function can be adapted to solve bugs like PR51245,
but it could require differentiating the combiner timing
between the existing and new transforms.
2021-09-23 14:28:22 -04:00
Thomas Lively 2f519825ba [WebAssembly] Add prototype relaxed SIMD fma/fms instructions
Add experimental clang builtins, LLVM intrinsics, and backend definitions for
the new {f32x4,f64x2}.{fma,fms} instructions in the relaxed SIMD proposal:
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md.
Do not allow these instructions to be selected without explicit user opt-in.

Differential Revision: https://reviews.llvm.org/D110295
2021-09-23 11:01:36 -07:00
Louis Dionne e6126faba0 [libc++] Remove unused macro in __config
That macro was being defined but not used anywhere in libc++, so it
must be safe to remove it.

As a fly-by fix, also remove mentions of this macro in other places
in LLVM, to make sure they were not depending on the value defined in
libc++.

Differential Revision: https://reviews.llvm.org/D110289
2021-09-23 13:09:32 -04:00
Jay Foad deb2ca566a Revert "[LiveIntervals] Fix repairOldRegInRange for simple def cases"
This reverts commit 8229cb7412.

It was failing on buildbots with expensive checks enabled.
2021-09-23 17:55:05 +01:00
Nikita Popov 1e3c6fc7cb [JumpThreading] Ignore free instructions
This is basically D108837 but for jump threading. Free instructions
should be ignored for the threading decision. JumpThreading already
skips some free instructions (like pointer bitcasts), but does not
skip various free intrinsics -- in fact, it currently gives them a
fairly large cost of 2.

Differential Revision: https://reviews.llvm.org/D110290
2021-09-23 18:28:36 +02:00
Fangrui Song 1a6e1ee42a Resolve {GlobalValue,GloalIndirectSymol}::getBaseObject confusion
While both GlobalAlias and GlobalIFunc are GlobalIndirectSymbol, their
`getIndirectSymbol()` usage is quite different (GlobalIFunc's resolver
is an entity different from GlobalIFunc itself).

As discussed on https://lists.llvm.org/pipermail/llvm-dev/2020-September/144904.html
("[IR] Modelling of GlobalIFunc"), the name `getBaseObject` is confusing when
used with GlobalIFunc.

To resolve the confusion:

* Move GloalIndirectSymol::getBaseObject to GlobalAlias:: (GlobalIFunc should use `getResolver` instead)
* Change GlobalValue::getBaseObject not to inspect GlobalIFunc. Note: the function has 7 references.
* Add GlobalIFunc::getResolverFunction to peel off potential ConstantExpr indirection
  (`strlen` in `test/LTO/Resolution/X86/ifunc.ll`)

Note: GlobalIFunc::getResolver (like GlobalAlias::getAliasee which does not peel
off ConstantExpr indirection) is kept to be used by ValueEnumerator.

Reviewed By: ibookstein

Differential Revision: https://reviews.llvm.org/D109792
2021-09-23 09:23:35 -07:00
Jay Foad 8229cb7412 [LiveIntervals] Fix repairOldRegInRange for simple def cases
The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange"
was over-zealous. It would bail out when the end of the range to be
repaired was in the middle of the first segment of the live range of
Reg, which was always the case when the range contained a single def of
Reg.

This patch fixes it as suggested by Matthias Braun in post-commit review
on the original patch, and tests it by adding -early-live-intervals to
a selection of existing lit tests that now pass.

(Note that D23303 was originally applied to fix a crash in
SILoadStoreOptimizer, but that is now moot since D23814 updated
SILoadStoreOptimizer to run before scheduling so it no longer has to
update live intervals.)

Differential Revision: https://reviews.llvm.org/D110238
2021-09-23 17:16:14 +01:00
Craig Topper d5c67bba62 [RegAlloc] Cast uint8_t to unsigned before printing it.
raw_ostream interprets uint8_t as wanting to print a character
with that ASCII value. In this case the uint8_t is an integer
that we want to print.
2021-09-23 08:49:44 -07:00
Simon Pilgrim 5f2c53bdf4 Pass some DataLayout arguments by const-ref
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-23 15:50:31 +01:00
Piotr Sobczak 2ac53fffae [AMDGPU] Avoid processing functions in amdgpu-propagate-attributes pass for shaders
The pass amdgpu-propagate-attributes ("Early/Late propagate attributes
from kernels to functions") is currently run also for shaders, where
it does nothing. Modify the check so the pass only processes functions
for kernels.

Differential Revision: https://reviews.llvm.org/D109961
2021-09-23 16:46:56 +02:00
Simon Pilgrim c931d35216 [CostModel][X86] Increase i64 mul cost from 1 to 2
Only the most recent cpus support really 1cy 64-bit multiplies, and the X64 cost table represents a realistic worst case. The 1cy value was also discouraging vectorization when most vXi64 PMULDQ expansions aren't actually slower than scalarization.

Noticed while investigating PR51436.
2021-09-23 14:48:21 +01:00
Sanjay Patel bb9333c350 [InstCombine] fold cast of right-shift if high bits are not demanded (2nd try)
The 1st try at this was reverted because it caused an infinite loop in instcombine.
That should be fixed after:
1cd6b44f26

(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C

Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.

Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF

Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb

Differential Revision: https://reviews.llvm.org/D110170
2021-09-23 09:41:37 -04:00
Florian Hahn 5ce89279c0
[DSE] Track earliest escape, use for loads in isReadClobber.
At the moment, DSE only considers whether a pointer may be captured at
all in a function. This leads to cases where we fail to remove stores to
local objects because we do not check if they escape before potential
read-clobbers or after.

Doing context-sensitive escape queries in isReadClobber has been removed
a while ago in d1a1cce5b1 to save compile-time. See PR50220 for more
context.

This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than instruction
B, if A dominates B. If 2 escapes do not dominate each other, the
terminator of the common dominator is chosen. If not all uses cannot be
analyzed, the earliest escape is set to the first instruction in the
function entry block.

If the query instruction dominates the earliest escape and is not in a
cycle, then pointer does not escape before the query instruction.

This patch uses this information when checking if a load of a loaded
underlying object may alias a write to a stack object. If the stack
object does not escape before the load, they do not alias.

I will share a follow-up patch to also use the information for call
instructions to fix PR50220.

In terms of compile-time, the impact is low in general,
    NewPM-O3: +0.05%
    NewPM-ReleaseThinLTO: +0.05%
    NewPM-ReleaseLTO-g: +0.03

with the largest change being tramp3d-v4 (+0.30%)
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Compared to always computing the capture information on demand, we get
the following benefits from the caching:
NewPM-O3: -0.03%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.04%

The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Overall there is a small, but noticeable benefit from caching. I am not
entirely sure if the speedups warrant the extra complexity of caching.
The way the caching works also means that we might miss a few cases, as
it is less precise. Also, there may be a better way to cache things.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109844
2021-09-23 12:45:05 +01:00
Jim Lin fbacf5ad38 [RISCV] Add missing op type OPERAND_UIMM2, OPERAND_UIMM3 and OPERAND_UIMM7 for verifyInstruction
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110307
2021-09-23 19:30:46 +08:00
Simon Pilgrim 2a5936faf0 [CodeGen] ProcessSDDbgValues - use const-ref value in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-23 12:23:46 +01:00
Simon Pilgrim 5cabe4d9d3 [CodeGen] RegisterCoalescer::buildVRegToDbgValueMap - use const-ref value in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-23 12:23:45 +01:00
Bjorn Pettersson 85a586501b [BasicBlockUtils] Fixup of an assumed typo in MergeBlockIntoPredecessor
The NFC commit e5692a564a changed the logic for
DomTreeUpdates to use the range [succ_begin, succ_begin) when
looking for SuccsOfPredBB rather than using [succ_begin, succ_end).

As the commit was NFC this is identified as a typo (it has been
discussed briefly in phabricator).

The typo was found when inspecting the code, so I've got no idea if
changing back to the old range has any significant impact (such as
solving any PR:s or causing some new problems). But at least this
restores the code to the originally indented behavior.
2021-09-23 13:03:26 +02:00
Fraser Cormack e7c879a69d [RISCV][VP] Add support for VP_REDUCE_* operations
This patch adds codegen support for lowering the vector-predicated
reduction intrinsics to RVV instructions. The process is similar to that
of the other reduction intrinsics, save for the fact that every VP
reduction has a start value. We reuse the existing custom "VL" nodes,
adding extra patterns where required to handle non-true masks.

To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been
given an explicit "merge" operand. This is to faciliate the VP
reductions, where we must be careful to ensure that even if no operation
is performed (when VL=0) we still produce the start value. The RVV
reductions don't update the destination register under these conditions,
so we tie the splatted start value to the output register.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107657
2021-09-23 11:11:05 +01:00
Alex Richardson 05663dc146 [InstSimplify] Don't lose inbounds when simplifying a GEP
I noticed this while working on a (ptrtoint (gep null, x)) -> x fold.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D110168
2021-09-23 09:25:06 +01:00
Jay Foad 6cef28ed2d [TII] Remove the MFI argument to convertToThreeAddress. NFC.
This simplifies the API and addresses a FIXME in
TwoAddressInstructionPass::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D110229
2021-09-23 08:58:46 +01:00
Bjorn Pettersson c3ae8ecb52 [DAGCombiner] Rename isAlias as mayAlias. NFC
Differential Revision: https://reviews.llvm.org/D110062
2021-09-23 09:54:42 +02:00
Bjorn Pettersson c5e0313e44 [ModuleInlinerWrapperPass] Do some naive printing of wrapped pipeline with -print-pipeline-passes
Bisecting and reducing opt pipelines that includes the
ModuleInlinerWrapperPass has turned out to be a bit problematic.
This is far from perfect (it still lacks information about inline
advisor params etc.), but it should give some kind of hint to what
the wrapped pipeline looks like when using -print-pipeline-passes.

Reviewed By: aeubanks, mtrofin

Differential Revision: https://reviews.llvm.org/D109878
2021-09-23 09:54:42 +02:00
Liu, Chen3 76656ec8ec [X86][FP16] Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A)
This patch is to support transform something like
_mm512_add_ph(acc, _mm512_fmadd_pch(a, b, _mm512_setzero_ph()))
to _mm512_fmadd_pch(a, b, acc).

Differential Revision: https://reviews.llvm.org/D109953
2021-09-23 15:37:08 +08:00
Mikael Holmen e7b169a8ae [AMDGPU] Fix gcc warnings about unused variables [NFC] 2021-09-23 08:08:00 +02:00
Johannes Doerfert c6457dcae8 [OpenMP][FIX] Be more deliberate about invalidating the AAKernelInfo state
This patch fixes a problem when the AAKernelInfo state was invalidated,
e.g., due to `optnone` for a kernel, but not all parts indicated the
invalidation properly. We further eliminate most full state
invalidations as they should never be necessary.

Differential Revision: https://reviews.llvm.org/D109468
2021-09-23 00:04:30 -05:00
Johannes Doerfert 0a16c56010 [OpenMP][NFC] Improve debug output 2021-09-23 00:04:29 -05:00
Usman Nadeem 3b12282b0e [AArch64][SVE][InstCombine] Eliminate redundant chains of tuple get/set
Differential Revision: https://reviews.llvm.org/D109667

Change-Id: I06a3c28e3658ecda109a3a1b73265828274ab2ea
2021-09-22 20:59:46 -07:00
Wang, Pengfei ebec077e07 [X86][FP16] Change the order of the operands in complex FMA intrinsics to allow swap between the mul operands.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D109658
2021-09-23 11:02:48 +08:00
Freddy Ye 13207a21a6 [NFC] Remove redundant setOperationAction.
[FROUND,FROUNDEVEN][f32, f64, f128] are set Expand twice.

Differential Revision: https://reviews.llvm.org/D110302
2021-09-23 10:28:21 +08:00
hyeongyu kim 10a5632550 [NFC][InstCombine] Fix inconsistent comments 2021-09-23 09:31:39 +09:00
Zhi An Ng 1552179ac0 [WebAssembly] Add relaxed-simd feature
This currently only defines a constant, but it the future will be used
to gate builtins for experimenting and prototyping relaxed-simd proposal
(https://github.com/WebAssembly/relaxed-simd/).

Differential Revision: https://reviews.llvm.org/D110111
2021-09-22 14:52:50 -07:00
Craig Topper f0a422f935 [RISCV] Add fcvt.s.w(u)/fcvt.d.w(u)/fcvt.h.w(u) to hasAllNBitUsers
These instructions only read the lower 32 bits of their input.
2021-09-22 14:24:26 -07:00
Shilei Tian 423d34f74a [OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit`
This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110279
2021-09-22 17:16:41 -04:00
Sanjay Patel 1cd6b44f26 [InstCombine] add one-use check to shift-shift transform
We don't want to create extra instructions, and this
could infinite loop with the proposed transform in D110170.
2021-09-22 16:31:12 -04:00
Sanjay Patel a85d7a56c7 [ValueTracking] fix isOnlyUsedInZeroEqualityComparison with no users
This is another problem exposed by:
https://bugs.llvm.org/PR50836
2021-09-22 15:01:53 -04:00
Sanjay Patel b05804ab4c [Analysis] reduce code for isOnlyUsedInZeroEqualityComparison; NFC
There's a bug here noted by the FIXME and visible in variations of PR50836.
2021-09-22 14:57:53 -04:00
David Green c49611f909 Mark CFG as preserved in TypePromotion and InterleaveAccess passes
Neither of these passes modify the CFG, allowing us to preserve DomTree
and LoopInfo across them by using setPreservesCFG.

Differential Revision: https://reviews.llvm.org/D110161
2021-09-22 18:58:00 +01:00
Sanjay Patel c240169ff2 [Analysis] improve function matching for strlen libcall
The return type of strlen is size_t, not just any integer.

This is a partial fix for an example based on:
https://llvm.org/PR50836

There's another bug here because we can still crash
processing a real strlen or something that looks like it.
2021-09-22 13:50:12 -04:00
Daniil Fukalov 1a7b7d7ba2 [NFCI][CodeGen, AArch64] Fix inconsistent TargetCostKind types.
The pass uses different cost kinds to estimate "old" and "interleaved" costs:
default cost kind for all targets override `getInterleavedMemoryOpCost()` is
`TCK_SizeAndLatency`. Although at the moment estimated `TCK_Latency` costs are
equal to `TCK_SizeAndLatency`, (so the change is NFC) it may change in future.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110100
2021-09-22 20:15:17 +03:00
Arthur Eubanks e7249e4acf [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest
When determining whether to fold branches to a common destination by
merging two blocks, SimplifyCFG will count the number of instructions to
be moved into the first basic block. However, there's no reason to count
free instructions like bitcasts and other similar instructions.

This resolves missed branch foldings with -fstrict-vtable-pointers in
llvm-test-suite's lambda benchmark.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D108837
2021-09-22 09:52:37 -07:00
Craig Topper b33a1cc05b [RISCV] Optimize vp.store with an all ones mask to avoid a vmset.
We can use riscv_vse intrinsic instead of riscv_vse_mask. The code here
is based on similar code for handling masked.scatter and vp.scatter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110206
2021-09-22 09:12:47 -07:00
Shilei Tian b205b3300b [NFC] clang-format -i llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 2021-09-22 12:10:20 -04:00
Hongtao Yu d9b511d8e8 [CSSPGO] Set PseudoProbeInserter as a default pass.
Currenlty PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e, through the linker or llc), where user has always to pass -pseudo-probe-for-profiling explictly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110209
2021-09-22 09:09:48 -07:00
Kazu Hirata 3c557cd7f9 [CodeGen] Remove redundant declaration MIRCanonicalizerID (NFC)
Note that MIRCanonicalizerID is declared in
llvm/include/llvm/CodeGen/Passes.h, which MIRCanonicalizerPass.cpp
includes.

Identified with readability-redundant-declaration.
2021-09-22 08:58:27 -07:00
Simon Pilgrim 8a44281f47 [SLP] getReductionCost - use explicit TTI::TCK_RecipThroughput CostKind. NFCI.
Avoid relying on the default cost kinds in TTI calls (we already do this in other places in SLP) - noticed while trying to see how much work it'd be to extend D110242 and remove all remaining uses of default CostKind arguments.
2021-09-22 16:52:22 +01:00
hyeongyu kim 98e96663f6 [InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (3/3)
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineVectorOps.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D110230
2021-09-23 00:48:24 +09:00
Shilei Tian ca999f7191 [OpenMP][Offloading] Use bitset to indicate execution mode instead of value
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics

We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.

This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)

In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110029
2021-09-22 11:40:52 -04:00
hyeongyu kim ec8311444a [InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (2/3)
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D110227
2021-09-23 00:14:50 +09:00
Simon Pilgrim b1f38a27f0 [Target][CodeGen] Remove default CostKind arguments on inner/impl TTI overrides
Based off a discussion on D110100, we should be avoiding default CostKinds whenever possible.

This initial patch removes them from the 'inner' target implementation callbacks - these should only be used by the main TTI calls, so this should guarantee that we don't cause changes in CostKind by missing it in an inner call. This exposed a few missing arguments in getGEPCost and reduction cost calls that I've cleaned up.

Differential Revision: https://reviews.llvm.org/D110242
2021-09-22 15:28:08 +01:00
hyeongyu kim e5aaf03326 [InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (1/3)
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCasts.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D110226
2021-09-22 23:18:51 +09:00
Joseph Huber 1cf86df883 [OpenMP] Make sure the Thread ID function is not removed
Summary:
The thread ID function was reintroduced in D110195, but could
potentially be removed by the optimizer. Make the function noinline to
preserve the call sites and add it to the externalization RAII so its
definition is not removed by the attributor.
2021-09-22 10:13:18 -04:00
Sander de Smalen 6375ca4059 [AArch64][SVE] Add extract_subvector patterns for unpacked fp16 and bfloat types.
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D110163
2021-09-22 14:25:17 +01:00
Sander de Smalen 3e8d2008f7 [SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR.
This code seems untested and is likely obsolete, because this case
should already be handled by the code that legalizes the result type
of EXTRACT_SUBVECTOR.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110061
2021-09-22 14:23:35 +01:00
Tim Northover 3a00e58c2f AArch64: use indivisible cmpxchg for 128-bit atomic loads at O0
Like normal atomicrmw operations, at -O0 the simple register-allocator can
insert spills into the LL/SC loop if it's expanded and visible when regalloc
runs. This can cause the operation to never succeed by repeatedly clearing the
monitor. Instead expand to a cmpxchg, which has a pseudo-instruction for -O0.
2021-09-22 14:20:43 +01:00
Sander de Smalen d5681f1d68 [SelectionDAG] Add PromoteIntOp_INSERT_SUBVECTOR.
This is required to codegen something like:
  <vscale x 8 x i16> @llvm.experimental.vector.insert(<vscale x 8 x i16> %vec,
                                                      <vscale x 2 x i16> %subvec,
                                                      i64 %idx)
where the output vector is legal, but the input vector needs promoting.

It implements this by performing the whole operation on the promoted type,
and then truncating the result.

Reviewed By: david-arm, craig.topper

Differential Revision: https://reviews.llvm.org/D110059
2021-09-22 13:32:36 +01:00
Florian Hahn a7c6471a85
[Passes] Run vector-combine early with -fenable-matrix.
IR with matrix intrinsics is likely to also contain large vector
operations, which can benefit from early simplifications.

This is the last step in a series of changes to improve code-gen for
code using matrix subscript operators with the C/C++ matrix extension in
CLang, like

    using matrix_t = double __attribute__((matrix_type(15, 15)));

    void foo(unsigned i, matrix_t &A, matrix_t &B) {
      for (unsigned j = 0; j < 4; ++j)
        for (unsigned k = 0; k < i; k++)
          B[k][j] -= A[k][j] * B[i][j];
    }

https://clang.godbolt.org/z/6dKxK1Ed7

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D102496
2021-09-22 12:48:32 +01:00
Sanjay Patel c6013f71a4 Revert "[InstCombine] fold cast of right-shift if high bits are not demanded"
This reverts commit 2f6b07316f.

This caused several bots to hit an infinite loop at stage 2,
so it needs to be reverted while figuring out how to fix that.
2021-09-22 07:45:21 -04:00
David Green 02cd8a6b91 [ARM] Allow smaller VMOVL in tail predicated loops
This allows VMOVL in tail predicated loops so long as the the vector
size the VMOVL is extending into is less than or equal to the size of
the VCTP in the tail predicated loop. These cases represent a
sign-extend-inreg (or zero-extend-inreg), which needn't block tail
predication as in https://godbolt.org/z/hdTsEbx8Y.

For this a vecsize has been added to the TSFlag bits of MVE
instructions, which stores the size of the elements that the MVE
instruction operates on. In the case of multiple size (such as a
MVE_VMOVLs8bh that extends from i8 to i16, the largest size was be
chosen). The sizes are encoded as 00 = i8, 01 = i16, 10 = i32 and 11 =
i64, which often (but not always) comes from the instruction encoding
directly. A unit test was added, and although only a subset of the
vecsizes are currently used, the rest should be useful for other cases.

Differential Revision: https://reviews.llvm.org/D109706
2021-09-22 12:07:52 +01:00
Yi Kong d0746f2e9b Don't fold (select C, (gep Ptr, Idx), Ptr) if C is vector but Idx is scalar
The folding rule (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C,
Idx, 0)) creates a malformed SELECT IR if C is a vector while Idx is scalar.

  SELECT VecC, ScalarIdx, 0

We could splat Idx to a vector but it defeats the purpose of
optimisation. Don't apply the folding rule in this case.

This fixes a regression from commit d561b6fbdb.
2021-09-22 18:11:33 +08:00
Florian Mayer 36daf074d9 [hwasan] also omit safe mem[cpy|mov|set].
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D109816
2021-09-22 11:08:27 +01:00
Sander de Smalen 4ca1fbe361 [SelectionDAG] Make WidenVecRes_Convert work for scalable vectors.
Most of the code wasn't yet scalable safe, although most of the
code conceptually just works for scalable vectors. This change
makes the algorithm work on ElementCount, where appropriate,
and leaves the fixed-width only code to use `getFixedNumElements`.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D110058
2021-09-22 10:58:38 +01:00
Florian Hahn 300870a95c
[VectorCombine] Switch to using a worklist.
This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.

Suggested in D100302.

The main use case at the moment is foldSingleElementStore and
scalarizeLoadExtract working together to improve scalarization.

Note that we now also do not run SimplifyInstructionsInBlock on the
whole function if there have been changes. This means we fail to
remove/simplify instructions not related to any of the vector combines.
IMO this is fine, as simplifying the whole function seems more like a
workaround for not tracking the changed instructions.

Compile-time impact looks neutral:
NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%

http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D110171
2021-09-22 09:54:58 +01:00
Sander de Smalen ab3607c0ed [AArch64][SVE] Add missing load/store patterns for unpacked bfloat vectors.
Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D110063
2021-09-22 09:45:33 +01:00
Jay Foad 0205806d0f [AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers
Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac
instruction, so we might as well convert it to the more flexible VOP3-
only mad/fma form.

With this change, the only way we should emit VOP3-encoded mac/fmac is
if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs
for both src0 and src1. In all other cases the mac/fmac should either be
converted to mad/fma or shrunk to VOP2 encoding.

Differential Revision: https://reviews.llvm.org/D110156
2021-09-22 09:36:34 +01:00
Jay Foad 3828ea6181 [AMDGPU] Divergence-driven instruction selection for mul i32
Differential Revision: https://reviews.llvm.org/D109881
2021-09-22 09:36:34 +01:00
Florian Hahn e08a5dc86f
[InstCombine] Move InstCombineWorklist to Utils to allow reuse (NFC).
InstCombine's worklist can be re-used by other passes like
VectorCombine. Move it to llvm/Transform/Utils and rename it to
InstructionWorklist.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110181
2021-09-22 08:47:21 +01:00
Matt Arsenault ec55dcedce AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query
Add an overload to pass the flat workgroup range in separately. This
will allow the attributor to use the assumed value for
amdgpu-flat-workgroup-sizes when inferring amdgpu-waves-per-eu.
2021-09-21 22:57:17 -04:00
Chen Zheng ffa9fa9ed2 [PowerPC] prepare for udpate form with non-const increment.
This is a follow-up of D105872. Now we are able to prepare for update
form with non-const increment.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D106032
2021-09-22 02:54:28 +00:00
Wenlei He 5f187f0afa [SamplePGO] Add switch to honor zero count on block level as accurate
Add a new LLVM switch `-profile-sample-block-accurate` to trust zero block counts for branches. Currently we leave out such zero counts when annotating branch weight metadata, which would lead to weights being considered as unknown.

Differential Revision: https://reviews.llvm.org/D110117
2021-09-21 17:06:37 -07:00
Usman Nadeem 645b8f5365 [AArch64][SVE] Add patterns to generate ADR instruction
Differential Revision: https://reviews.llvm.org/D109665

Change-Id: I9d2928688b80b804a16f52928e2057749ec2c0b2
2021-09-21 15:50:49 -07:00
Arthur Eubanks e42234383e Make DiagnosticInfoResourceLimit's limit param required
And always print it.

This makes some LLVM diagnostics match up better with Clang's diagnostics.

Updated some AMDGPU uses of DiagnosticInfoResourceLimit and now we print
better diagnostics for those.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D110204
2021-09-21 15:27:58 -07:00
Kirill Stoimenov 2649999579 [asan] Fixed a bug causing a crash when redzone optimization kicked in on X86 with -asan-optimize-callbacks flag on.
This change adds the ASan intrinsic to the list whihc are setting hasCopyImplyingStackAdjustment.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D110012
2021-09-21 22:26:03 +00:00
Craig Topper b81e26c7f4 Recommit "[X86] Clear kill flags when rewriting SETCC uses in flag copy lowering."
This time with the right bug number.

When we rewrite the setcc we replace set old setcc output register
with the new CondReg. But since CondReg can be shared by other
replacements, we don't know if the kill flags for the old register
are valid for CondReg. So be conservative and remove them.

The test case has a SETCCr and a SETCCm on the same condition so
they end up sharing the same CondReg. The SETCCr had one use with
a kill flag. This kill flag isn't valid after the replacement because
CondReg needs a live range extending to the later SETCCm replacment.

Fixes PR51903.
2021-09-21 14:59:25 -07:00
Xu Mingjie 32ab405717 [LTO] Emit DebugLoc for dead function in optimization remarks
Currently, the dead functions information getting from optimizations remarks does not contain debug location, but knowing where these dead functions locate could be useful for debugging or for detecting dead code.

Cause in `LTO::addRegularLTO()` we use `BitcodeModule::getLazyModule()` to read the bitcode module, when we pass Function F to `ore::NV()`, F is not materialized, so `F->getSubprogram()` returns nullptr, and there is no debug location information of dead functions in optimizations remarks.

This patch call `F->materialize()` before we pass Function F to `ore::NV()`, then debug location information will be emitted for dead functions in optimization remarks.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D109737
2021-09-21 14:50:21 -07:00
Craig Topper 51a82e051e Revert "[X86] Clear kill flags when rewriting SETCC uses in flag copy lowering."
This reverts commit 7550f146ff.

I botched the bug number.
2021-09-21 14:33:44 -07:00
Craig Topper 7550f146ff [X86] Clear kill flags when rewriting SETCC uses in flag copy lowering.
When we rewrite the setcc we replace set old setcc output register
with the new CondReg. But since CondReg can be shared by other
replacements, we don't know if the kill flags for the old register
are valid for CondReg. So be conservative and remove them.

The test case has a SETCCr and a SETCCm on the same condition so
they end up sharing the same CondReg. The SETCCr had one use with
a kill flag. This kill flag isn't valid after the replacement because
CondReg needs a live range extending to the later SETCCm replacment.

Fixes PR51908.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110046
2021-09-21 14:29:46 -07:00
George Burgess IV cd5f582c3d MemoryBuiltins: update comment; NFC
This comment references behavior that was removed in
ccae43a247, which is a commit from 5 years
ago. It seems safe to assume that that behavior won't be coming back
soon. If it does, we can readd this part of the comment :)
2021-09-21 13:47:26 -07:00
Sanjay Patel 2f6b07316f [InstCombine] fold cast of right-shift if high bits are not demanded
(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C

Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.

Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF

Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb

Differential Revision: https://reviews.llvm.org/D110170
2021-09-21 16:09:08 -04:00
Antonio Frighetto 43d6991c2a [IR] Look through bitcast in hasFnAttribute()
A logic incompleteness may lead MemorySSA to be too conservative
in its results. Specifically, when dealing with a call of kind
`call i32 bitcast (i1 (i1)* @test to i32 (i32)*)(i32 %1)`, where
the function `test` is declared with readonly attribute, the
bitcast is not looked through, obscuring function attributes. Hence,
some methods of CallBase (e.g., doesNotReadMemory) could provide
suboptimal results.

Differential Revision: https://reviews.llvm.org/D109888
2021-09-21 21:57:02 +02:00
Nikita Popov e4a1af3724 [MergeICmps] Remove unused NumMerged variable 2021-09-21 21:43:25 +02:00
Nikita Popov f2fa6ad047 [MergeICmps] Don't reorder unmerged comparisons
MergeICmps will currently sort (by offset) all comparisons in a chain,
including those that do not get merged. This is problematic in two ways:

 * We may end up moving the original first block into the middle of
   the chain, in which case the "extra work" instructions will also
   be in the middle of the chain, resulting in invalid IR
   (reported in https://reviews.llvm.org/D108782#3005583).
 * Reordering branches is generally not legal, because it may
   introduce branch on poison, which is UB (PR51845). The merging
   done by MergeICmps is legal as long as we assume that memcmp()
   works on frozen memory, but the reordering of unmerged comparisons
   is definitely incorrect (without inserting freeze instructions),
   so we should avoid it.

There are easier ways to fix the first issue, but I figured it was
worthwhile to do this properly to also fix the second one. What we
now do is to restore the original relative order of (potentially
merged) comparisons.

I took the liberty of dropping the MERGEICMPS_DOT_ON functionality,
because it would be more awkward to implement now (as the before and
after representation is different) and it doesn't seem terribly
useful nowadays.

Differential Revision: https://reviews.llvm.org/D110024
2021-09-21 21:22:12 +02:00
David Blaikie 49c519a848 DebugInfo: Rebuild decltype(nullptr) as 'std::nullptr_t'
Now that Clang's been changed to render nullptr types/template
parameters as 'std::nullptr_t' do the same thing down here.

(Clang commit: 131e878664 )
2021-09-21 11:37:30 -07:00
Michael Liao 2d1ffad010 [IR] Re-group AAMDNodes relevant interfaces. NFC. 2021-09-21 14:29:33 -04:00
alex-t 1a33294652 [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC
Normally, given that the DA results are kept consistent over the selection DAG, uniform comparisons get selected to S_CMP_* but divergent to V_CMP_*.  Sometimes, for the sake of efficiency,  SSA subgraphs may be converted to VALU to avoid repeatedly copying data back and forth. Hence we have to be able to sustain the correctness passing the i1 from VALU to SALU context and vice versa.

VALU operations only process the active lanes of the VGPR and ignore inactive ones.
Active lanes correspond to 1 bit in the EXEC mask register.
SALU represents i1 as just one bit but VALU as 64bits: 0/1 and 0/(0xffffffffffffffff & EXEC) respectively.
SALU uses one-bit conditional flag SCC but VALU - VCC that is a pair of 32-bit SGPRs

To expose SCC to the VALU context we need to convert the one-bit boolean value to the appropriate 64bit.
To return back to the SALU context we need to do the opposite.

To correctly convert 64bit VALU boolean to either 0 or 1 we need to filter out the bits corresponding to the inactive lanes.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D109900
2021-09-21 21:19:31 +03:00
Owen Anderson b5fbbdd202 Teach InstCombine to eliminate malloc-realloc-free triplets.
Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D109988
2021-09-21 18:07:49 +00:00
Brendon Cahoon cbdf624bb8 [AMDGPU] Correctly merge alias.scope and noalias metadata for memops
When adding alias.scope and noalias metadata to a memcpy function,
the alias.scope and noalias metadata from the operands are merged.
The rule for merging alias.scope is to take the intersection of
the domains and the union of the scopes within those domains.
The rule for merging noalias is to take the intersection.

The bug is that AMDGPULowerModuleLDS was using concatenation for
both alias.scope and noalias. For example, when f1 and f2 are added
to the LDS structure and there is a memcpy(f2, f1, sizeof(f1)).
Then, concatenation creates noalias metadata for the memcpy that
includes both {f1, f2}. That means that the memcpy is assumed
not to alias a prior load of f2, which enables the optimizer to
remove a load of f2 that occurs after mempcy.

The function MDNode::getmostGenericAliasScope defines the semantics
for alias.scope. There is a function, combineMetadata in Local.cpp,
that uses intersect for noalias.

Differential Revision: https://reviews.llvm.org/D110049
2021-09-21 13:02:01 -05:00
Craig Topper 7c975665b4 [RISCV] Make some arrays of constants 'static const'. NFC
This helps the compiler generate better code.
2021-09-21 10:52:47 -07:00
Danila Malyutin 78b51c7a2c [LSR] Make sure that Factor fits into Base type
Fixes pr42770

Differential Revision: https://reviews.llvm.org/D108772
2021-09-21 20:50:50 +03:00
Amy Kwan 2af57b6099 [PowerPC] Add prefix load pattern for fpext to v2f64
This patch adds a prefixed load pattern involving v2f32 fpext v2f64, where we
are dealing with a value with an offset that fits into a 34-bit signed immediate.
A reduced test case is also added to patch that tests the pattern, in which the
pattern is tested in the big endian CHECKs of the newly added test.

Differential Revision: https://reviews.llvm.org/D109887
2021-09-21 12:45:24 -05:00
Ayal Zaks ab6a69dfea [LV] Fix crash for reverse interleaved loads with gap under fold-tail.
This patch fixes the crash found by PR51614:
whenever doing tail folding, interleave groups must be considered under mask.

Another fix D108900 follows for targets that support masked loads and stores:
when *deciding* to vectorize with masked interleave groups, check if the access
is reverse - which is currently not supported; rather than (only) asserting when
computing cost and generating code.

Differential Revision: https://reviews.llvm.org/D108891
2021-09-21 20:13:32 +03:00
Craig Topper aeb63d464f [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor.
This requires a minor change to CodeGenPrepare to ensure that
shouldSinkOperands will be called for And.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110106
2021-09-21 10:07:29 -07:00
Dávid Bolvanský c0fdfc9af2 [InstCombine] powi(x, y) * powi(x, z) -> powi(x, y + z)
We already have pow(x, y) * pow(x, z) -> pow(x, y + z) transformation, but we are missing same transformation for powi (power is integer).

Requires reassoc.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D109954
2021-09-21 18:20:46 +02:00
Florian Hahn 5131037ea9
[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange.
isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.

The use VectorCombine is updated to pass through the DT to enable
additional scalarization.

Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110175
2021-09-21 16:54:47 +01:00
Michael Liao 5fb3ae525f [SelectionDAG] Re-calculate scoped AA metadata when merging stores.
Reviewed By: jeroen.dobbelaere

Differential Revision: https://reviews.llvm.org/D102821
2021-09-21 11:41:17 -04:00
Aleksandr Bezzubikov 624e4d087e [GlobalISel] Support ConstantAsMetadata in IRTranslator
When using instructions which have a MetadataAsValue argument
(e.g. some target-specific intrinsics) MD canonicalization strips
internal MDNodes with a single ConstantAsMetadata child. That
prevented IRTranslator from the proper translation of such a calls.
2021-09-21 11:24:56 -04:00
Dmitry Preobrazhensky 3500e7d2b0 [AMDGPU][MC][GFX7][GFX10] Corrected image_atomic_fcmpswap
Differential Revision: https://reviews.llvm.org/D109616
2021-09-21 18:06:02 +03:00
Ben Shi b3052013b4 [RISCV] Optimize (add (mul x, c0), c1)
Optimize (add (mul x, c0), c1) -> (ADDI (MUL (ADDI, c1/c0), c0), c1%c0),
if c1/c0 and c1%c0 are simm12, while c1 is not.

Optimize (add (mul x, c0), c1) -> (MUL (ADDI, c1/c0), c0),
if c1%c0 is zero, and c1/c0 is simm12 while c1 is not.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108607
2021-09-21 14:13:14 +00:00
Anna Thomas 69921f6f45 [InstCombine] Improve TryToSinkInstruction with multiple uses
This patch allows sinking an instruction which can have multiple uses in a
single user. We were previously over-restrictive by looking for exactly one use,
rather than one user.

Also added an API for retrieving a unique undroppable user.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109700
2021-09-21 10:04:04 -04:00
Dmitry Preobrazhensky b8e7f53208 [AMDGPU][MC][GFX10] Enabled dlc for FLAT and GLOBAL atomics
Differential Revision: https://reviews.llvm.org/D109614
2021-09-21 16:23:20 +03:00
hyeongyu kim 043733d677 [IR] Add the constructor of ShuffleVector for one-input-vector.
One of the two inputs of the Shufflevector is often a placeholder.
Previously, there were cases where the placeholder was undef, and there were cases where it was poison.
I added these constructors to create a placeholder consistently.

Changing to use the newly added constructor will be written in a separate patch.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D110146
2021-09-21 22:06:07 +09:00
Jonas Paulsson a48b43f981 [SystemZ] Emit EXRL target instructions before text section is ended.
SystemZ adds the EXRL target instructions in the end of each file. This must
be done before debug info emission since that may end the text section, and
therefore this is now done in emitConstantPools() (instead of in
emitEndOfAsmFile).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D109513
2021-09-21 14:32:28 +02:00
Nicholas Guy 9e4d72675f [AArch64] Improve schedule modelling on the Cortex-A55
Enables the FuseAddress feature in the Cortex-A55 scheduling model

Differential Revision: https://reviews.llvm.org/D109323
2021-09-21 13:03:34 +01:00
Simon Pilgrim fc8f1e4419 [InstCombine] foldConstantInsEltIntoShuffle - bail if we fail to find constant element (PR51824)
If getAggregateElement() returns null for any element, early out as otherwise we will assert when creating a new constant vector

Fixes PR51824 + ; OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057
2021-09-21 13:01:09 +01:00
Simon Pilgrim 20b58855e0 [CodeGen] SelectionDAGBuilder - Use const-ref iterator in for-range loops. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-21 13:01:08 +01:00
Simon Pilgrim f5d23d36de RewriteStatepointsForGC - Use const-ref iterator in for-range loops. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-21 13:01:08 +01:00
Simon Pilgrim 0f83456cf5 [CodeGen] SDDbgValue::getSDNodes() - use const-ref to avoid unnecessary copies. NFCI.
Reported by MSVC static analyzer.
2021-09-21 13:01:08 +01:00
Jay Foad 598bebeaa6 [AMDGPU] Prefer fmac over fma when selecting FMA_W_CHAIN
FMA_W_CHAIN is used when lowering fdiv f32. Prefer to select it to fmac
if there are no source modifiers, just like we do for other mad/mac and
fma/fmac cases.

Differential Revision: https://reviews.llvm.org/D110074
2021-09-21 11:57:45 +01:00
Jay Foad 86dcb59206 [AMDGPU] Prefer v_fmac over v_fma only when no source modifiers are used
v_fmac with source modifiers forces VOP3 encoding, but it is strictly
better to use the VOP3-only v_fma instead, because $dst and $src2 are
not tied so it gives the register allocator more freedom and avoids a
copy in some cases.

This is the same strategy we already use for v_mad vs v_mac and
v_fma_legacy vs v_fmac_legacy.

Differential Revision: https://reviews.llvm.org/D110070
2021-09-21 11:57:45 +01:00
Max Kazantsev cd166fb2ef [SCEV] Use isAvailableAtLoopEntry in the asserts
This is what is supposed to be there.
2021-09-21 17:11:15 +07:00
Petar Avramovic 8bc7185668 GlobalISel/Utils: Refactor constant splat match functions
Add generic helper function that matches constant splat. It has option to
match constant splat with undef (some elements can be undef but not all).
Add util function and matcher for G_FCONSTANT splat.

Differential Revision: https://reviews.llvm.org/D104410
2021-09-21 12:09:35 +02:00
Max Kazantsev 4d5d725428 [SCEV] Add some asserts on availability of arguments of isLoopEntryGuardedByCond
The logic in howManyLessThans is fishy. It first checks invariance of
RHS, and then uses OrigRHS as argument for isLoopEntryGuardedByCond, which
is, strictly saying, a different thing. We are seeing a very rare intermittent
failure of availability checks, and it looks like this precondition is
sometimes broken. Before we can figure out what's going on, adding asserts
that all involved values that may possibly to to isLoopEntryGuardedByCond
are available at loop entry.

If either of these asserts fails (OrigRHS is the most likely suspect), it
means that the logic here is flawed.
2021-09-21 17:08:52 +07:00
David Stenberg 7b4cc09b14 [LowerConstantIntrinsics] Fix heap-use-after-free bug in worklist
This fixes PR51730, a heap-use-after-free bug in
replaceConditionalBranchesOnConstant().

With the attached reproducer we were left with a function looking
something like this after replaceAndRecursivelySimplify():

  [...]

  cont2.i:
    br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i

  handler.type_mismatch3.i:
    %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ]
    unreachable

  cont4.i:
    unreachable

  [...]

with both the branch instruction and PHI node being in the worklist. As
a result of replacing the branch instruction with an unconditional
branch, the PHI node in %handler.type_mismatch3.i would be removed. This
then resulted in a heap-use-after-free bug due to accessing that removed
PHI node in the next worklist iteration.

This is solved by using a value handle worklist. I am a unsure if this
is the most idiomatic solution. Another solution could have been to
produce a worklist just containing the interesting branch instructions,
but I thought that it perhaps was a bit cleaner to keep all worklist
filtering in the loop that does the rewrites.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D109221
2021-09-21 11:33:07 +02:00
Cullen Rhodes b23d22f7d5 [PowerPC] NFC: Remove unused tblgen template args
Identified in D109359.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D109715
2021-09-21 08:24:16 +00:00
Evgeniy Brevnov 129cf33604 [DSE][NFC] Rename Later->Killing, Earlier->Dead
First (and biggest) change is to use "Killing/Dead" in place of "Later/Earlier" base for names in DSE. For example, [Maybe]DeadLoc - is a location killed by KillingI instruction. I believe such names are more descriptive and easy to understand than current ones.

Second, there are inconsistencies in naming where different names are used for the same thing. Fixed that too.

Third, reordered parameters of isPartialOverwrite, tryToMergePartialOverlappingStores, isOverwrite to make them consistent between each other. This greatly reduces potential mistakes.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D106947
2021-09-21 13:44:12 +07:00
Amara Emerson 7091a7f781 [GlobalISel][Legalizer] Don't use eraseFromParentAndMarkDBGValuesForRemoval() for some artifacts.
For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't
seem to have debug users of defs. However, in the legalizer we're always calling
MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive.
In some rare cases, this contributes significantly to unreasonably long compile
times when we have lots of artifact combiner activity.

To verify this, I added asserts to that function when it actually replaced a debug
use operand with undef for these artifacts. On CTMark with both -O0 and -Os and
debug info enabled, I didn't see a single case where it triggered.

In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0
for AArch64 with this change.

Differential Revision: https://reviews.llvm.org/D109750
2021-09-20 23:34:42 -07:00
Max Kazantsev 2c7d5fbc9e [SCEV] Generalize implication when signedness of FoundPred doesn't matter
The implication logic for two values that are both negative or non-negative
says that it doesn't matter whether their predicate is signed and unsigned,
but only flips unsigned into signed for further inference. This patch adds
support for flipping a signed predicate into unsigned as well.

Differential Revision: https://reviews.llvm.org/D109959
Reviewed By: nikic
2021-09-21 11:17:56 +07:00
Yonghong Song ea72b0319d BPF: make 32bit register spill with 64bit alignment
In llvm, for non-alu32 mode, the stack alignment is 64bit so only one
64bit spill per 64bit slot. For alu32 mode, the stack alignment
is 32bit, so it is possible to have two 32bit spills per
64bit slot.

Currently, bpf kernel verifier does not preserve register states
for 32bit spills. That is, one 32bit register may hold a constant
value or a bounded range before spill. After reload from the
stack, the information is lost and sometimes this may cause
verifier failure. For 64bit register spill, the verifier
indeed tries to preserve the register state for reloading.

The current verifier can be modestly changed to handle one
32bit spill per 64bit stack slot with state-preserving reload.
Handling two 32bit spills per 64bit stack slot will require
substantial changes.

This patch changes stack alignment for alu32 to be 64bit.
This way, for any 64bit slot in alu32 mode, only one
32bit or 64bit register values can be saved. Together
with previous-mentioned verifier enhancement, 32bit
spill can be handled with state preserving.

Note that llvm stack slot coallescing
seems only doing adjacent packing which may leave some holes
in the stack. For example,
   stack slot 8   <== 8 bytes
   stack slot 4   <== 8 bytes with 4 byte hole
   stack slot 8   <== 8 bytes
   stack slot 4   <== 4 bytes

Differential Revision: https://reviews.llvm.org/D109073
2021-09-20 21:00:25 -07:00
Max Kazantsev 073b254cff [SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block
When following a case of a switch instruction is guaranteed to lead to
UB, we can safely break these edges and redirect those cases into a newly
created unreachable block. As result, CFG will become simpler and we can
remove some of Phi inputs to make further analyzes easier.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D109428
Reviewed By: lebedev.ri
2021-09-21 10:45:19 +07:00
Max Kazantsev a06db78fd9 [NFC] Rename Context->CtxI in SCEV for uniformity reasons 2021-09-21 10:12:20 +07:00
Kazu Hirata 85b4b21c8b [llvm] Use make_early_inc_range (NFC) 2021-09-20 19:30:02 -07:00
Usman Nadeem f417d9d821 [InstCombine] Eliminate vector reverse if all inputs/outputs to an instruction are reverses
Differential Revision: https://reviews.llvm.org/D109808

Change-Id: I1a10d2bc33acbe0ea353c6cb3d077851391fe73e
2021-09-20 18:32:24 -07:00
Amara Emerson 4ceea77409 [X86] Rename the X86WinAllocaExpander pass and related symbols to "DynAlloca". NFC.
For x86 Darwin, we have a stack checking feature which re-uses some of this
machinery around stack probing on Windows. Renaming this to be more appropriate
for a generic feature.

Differential Revision: https://reviews.llvm.org/D109993
2021-09-20 16:19:28 -07:00
Jacob Lambert dc6e8dfdfe [AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor. 2021-09-20 14:48:50 -07:00
Amara Emerson f9d69a0ab0 [GlobalISel] Implement support for the "trap-func-name" attribute.
This attribute calls a function instead of emitting a trap instruction.

Differential Revision: https://reviews.llvm.org/D110098
2021-09-20 14:32:01 -07:00
Florian Mayer 16b5f4502c [NFC] [hwasan] Separate outline and inline instrumentation.
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D110067
2021-09-20 21:49:09 +01:00
Craig Topper a95ba81073 [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FMA.
If either of the multiplicands is a splat, we can sink it to use
vfmacc.vf or similar.
2021-09-20 11:49:50 -07:00
Nikita Popov dd0226561e [IR] Add helper to convert offset to GEP indices
We implement logic to convert a byte offset into a sequence of GEP
indices for that offset in a number of places. This patch adds a
DataLayout::getGEPIndicesForOffset() method, which implements the
core logic. I've updated SROA, ConstantFolding and InstCombine to
use it, and there's a few more places where it looks relevant.

Differential Revision: https://reviews.llvm.org/D110043
2021-09-20 20:18:16 +02:00
Geoffrey Martin-Noble 01b097afd0
Fix bad merge the removed switch case
When https://reviews.llvm.org/D109520 was landed, it reverted the addition of this switch
case added in https://reviews.llvm.org/D109293. This caused `-Wswitch` failures (and
presumably broke the functionality added in the latter patch).
2021-09-20 10:58:58 -07:00
Craig Topper 04ab6c85ef [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FAdd/FSub/FMul/FDiv. 2021-09-20 10:25:46 -07:00
Nikita Popov ecd52a5be9 [Verifier] Try to fix MSVC build
Some buildbots fail with:

> C:\a\llvm-clang-x86_64-expensive-checks-win\llvm-project\llvm\lib\IR\Verifier.cpp(4352): error C2678: binary '==': no operator found which takes a left-hand operand of type 'const llvm::MDOperand' (or there is no acceptable conversion)

Possibly the explicit MDOperand to Metadata* conversion will help?
2021-09-20 18:47:25 +02:00
Kazu Hirata f3cfec9c9e [MCA] Fix a warning
This patch fixes the warning

  InstructionTables.cpp:27:56: error: loop variable 'Resource' of type
  'const std::pair<const uint64_t, ResourceUsage> &' (aka 'const
  pair<const unsigned long, llvm::mca::ResourceUsage> &') binds to a
  temporary constructed from type 'const std::pair<unsigned long,
  llvm::mca::ResourceUsage> &' [-Werror,-Wrange-loop-construct]

Note that Resource is declared as:

   SmallVector<std::pair<uint64_t, ResourceUsage>, 4> Resources;

without "const" for uint64_t.
2021-09-20 09:46:38 -07:00
Craig Topper d85e347a28 [RISCV] Add a pass to recognize VLS strided loads/store from gather/scatter.
For strided accesses the loop vectorizer seems to prefer creating a
vector induction variable with a start value of the form
<i32 0, i32 1, i32 2, ...>. This value will be incremented each
loop iteration by a splat constant equal to the length of the vector.
Within the loop, arithmetic using splat values will be done on this
vector induction variable to produce indices for a vector GEP.

This pass attempts to dig through the arithmetic back to the phi
to create a new scalar induction variable and a stride. We push
all of the arithmetic out of the loop by folding it into the start,
step, and stride values. Then we create a scalar GEP to use as the
base pointer for a strided load or store using the computed stride.
Loop strength reduce will run after this pass and can do some
cleanups to the scalar GEP and induction variable.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107790
2021-09-20 09:39:44 -07:00
Nikita Popov 8700f2bd36 [Verifier] Verify scoped noalias metadata
Verify that !noalias, !alias.scope and llvm.experimental.noalias.scope
arguments have the format specified in
https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata.
I've fixed up a lot of broken metadata used by tests in advance.
Especially using a scope instead of the expected scope list is a
commonly made mistake.

Differential Revision: https://reviews.llvm.org/D110026
2021-09-20 18:27:28 +02:00
Alexey Bataev bc69dd62c0 [SLP]Improve graph reordering.
Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reordarable nodes (loads, stores,
extractelements,extractvalues) and then fully rebuilding the graph in
the best order. This was not effecient, since it required an extra
memory and time for building/rebuilding tree, double the use of the
scheduling budget, which could lead to missing vectorization due to
exausted scheduling resources.

Patch provide 2-way approach for graph reodering problem. At first, all
reordering is done in-place, it doe not required tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to bottom) rotates the whole graph, similarly to the previous
implementation. Compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle smaller subgraph when buildiong operands for the graph
nodes with lasrger vectorization factor, we can rotate just subgraph,
not the whole graph.

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reorder in the best fashion, they are reordered and their user too.
It allows to remove double shuffles to the same ordering of the operands in
many cases and just reorder the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows to remove extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-09-20 08:42:19 -07:00
David Sherwood f988f68064 [Analysis] Add support for vscale in computeKnownBitsFromOperator
In ValueTracking.cpp we use a function called
computeKnownBitsFromOperator to determine the known bits of a value.
For the vscale intrinsic if the function contains the vscale_range
attribute we can use the maximum and minimum values of vscale to
determine some known zero and one bits. This should help to improve
code quality by allowing certain optimisations to take place.

Tests added here:

  Transforms/InstCombine/icmp-vscale.ll

Differential Revision: https://reviews.llvm.org/D109883
2021-09-20 15:01:59 +01:00
Stefan Gränitz e8d81d80f6 [JITLink] Adopt forEachRelocation() helper in ELF RISCV backend (NFC)
Following D109516, this patch re-uses the new helper function for ELF relocation traversal in the RISCV backend.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D109522
2021-09-20 15:46:41 +02:00
Stefan Gränitz 68914dc990 [JITLink] Adopt forEachRelocation() helper in ELF x86-64 backend (NFC)
Following D109516, this patch re-uses the new helper function for ELF relocation traversal in the x86-64 backend.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D109520
2021-09-20 15:46:41 +02:00
David Green 3f90df22f1 [ARM] MVE reverse shuffles.
The vectorizer can sometimes make reverse shuffles from indices that
count down. In MVE, we don't have a 128bit rev instruction, but we can
select this to a VREV64 with some lane movs to swap the two halfs.

Ideally this would use VMOVD's, but only gets as far as VMOVS's at the
moment.

Differential Revision: https://reviews.llvm.org/D69510
2021-09-20 13:48:01 +01:00
Simon Pilgrim 7fc12b822c MachOObjectFile - checkOverlappingElement - use const-ref to avoid unnecessary copies. NFCI.
Reported by MSVC static analyzer.
2021-09-20 12:53:18 +01:00
Simon Pilgrim 4ab7c0d3fa [X86] X86TargetTransformInfo - remove unnecessary if-else after early exit. NFCI.
(style) Break the if-else chain as they all return.
2021-09-20 12:53:17 +01:00
Simon Pilgrim ea17b15f2d [MCA] InstructionTables::execute() - use const-ref iterator in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-20 12:53:17 +01:00
Petar Avramovic e4c46ddd91 [GlobalISel] Improve elimination of dead instructions in legalizer
Add eraseInstr(s) utility functions. Before deleting an instruction
collects its use instructions. After deletion deletes use instructions
that became trivially dead.
This patch clears all dead instructions in existing legalizer mir tests.

Differential Revision: https://reviews.llvm.org/D109154
2021-09-20 13:00:58 +02:00
Bjorn Pettersson c8cb7f611f [NewPM] Make InlinerPass (aka 'inline') a parameterized pass
In default pipelines the ModuleInlinerWrapperPass is adding the
InlinerPass to the pipeline twice, once due to MandatoryFirst (passing
true in the ctor) and then a second time with false as argument.

To make it possible to bisect and reduce opt test cases for this
part of the pipeline we need to be able to choose between the two
different variants of the InlinerPass when running opt. This patch is
changing 'inline' to a CGSCC_PASS_WITH_PARAMS in the PassRegistry,
making it possible run opt with both -passes=cgscc(inline) and
-passes=cgscc(inline<only-mandatory>).

Reviewed By: aeubanks, mtrofin

Differential Revision: https://reviews.llvm.org/D109877
2021-09-20 12:52:52 +02:00
Tim Northover 13aa102e07 AArch64: use ldp/stp for 128-bit atomic load/store in v.84 onwards
v8.4 says that normal loads/stores of 128-bytes are single-copy atomic if
they're properly aligned (which all LLVM atomics are) so we no longer need to
do a full RMW operation to guarantee we got a clean read.
2021-09-20 09:50:11 +01:00
David Spickett 92c9b28347 Revert "[AArch64][SVE] Teach cost model that masked loads/stores are cheap"
This reverts commit 734708e04f.

Due to build failures on the 2 stage SVE VLS bot.
https://lab.llvm.org/buildbot/#/builders/176/builds/908/steps/11/logs/stdio
2021-09-20 08:45:18 +00:00
Florian Hahn 7f6a4826ac
[CaptureTracking] Allow passing LI to PointerMayBeCapturedBefore (NFC).
isPotentiallyReachable can use LoopInfo to return earlier. This patch
allows passing an optional LI to PointerMayBeCapturedBefore. Used in
D109844.

Reviewed By: nikic, asbirlea

Differential Revision: https://reviews.llvm.org/D109978
2021-09-20 09:07:34 +01:00
Max Kazantsev e9d34c5429 [NFC] Add assert and test showing that revert of D109596 wasn't justified
All transforms of IndVars have prerequisite requirement of LCSSA and LoopSimplify
form and rely on it. Added test that shows that this actually stands.
2021-09-20 12:01:12 +07:00
Max Kazantsev 471217cff8 Revert "Revert "[IndVars] Replace PHIs if loop exits on 1st iteration""
This reverts commit 6fec6552f5.

The patch was reverted on incorrect claim that this patch may break LCSSA form
when the loop is not in a simplify form. All IndVars' transform insure that
the loop is in simplify and LCSSA form, so if it wasn't broken before this
transform, it will also not be broken after it.
2021-09-20 12:01:10 +07:00
Max Kazantsev def15c5fb6 [SCEV] Support negative values in signed/unsigned predicate reasoning
There is a piece of logic that uses the fact that signed and unsigned
versions of the same predicate are equivalent when both values are
non-negative. It's also true when both of them are negative.

Differential Revision: https://reviews.llvm.org/D109957
Reviewed By: nikic
2021-09-20 11:26:33 +07:00
David Blaikie cb42bb3550 llvm-dwarfdump: pretty type printing: print fully qualified names in function type parameter types 2021-09-19 18:49:15 -07:00
David Blaikie 606ea0dd2a llvm-dwarfdump: support for type printing "decltype(nullptr)" as "nullptr_t"
This should probably be rendered as "std::nullptr_t" but for now clang
uses the unqualified name (which is ambiguous with possible user defined
name in the global namespace), so match that here.
2021-09-19 17:33:56 -07:00
David Blaikie 11e0b79b05 llvm-dwarfdump: Don't print even an empty string when a type is unprintable 2021-09-19 17:03:10 -07:00
David Blaikie 5bfe5207ef llvm-dwarfdump: Pretty print names qualified/with scopes 2021-09-19 16:36:01 -07:00
Simon Pilgrim 0e89ff8195 [X86] SimplifyDemandedBits - only narrow a broadcast source if we only have one use.
Helps with the regression noted on D109065 - don't truncate a broadcast source if the source has multiple uses.
2021-09-19 22:53:30 +01:00
Kazu Hirata 84b07c9b3a [llvm] Use pop_back_val (NFC) 2021-09-19 13:44:23 -07:00
Chris Jackson 5ba8020326 [DebugInfo][LSR] Emit shorter expressions from scev-based salvaging
The scev-based salvaging for LSR can sometimes produce unnecessarily
verbose expressions. This patch adds logic to detect when the value to
be recovered and the induction variable differ by only a constant
offset. Then, the expression to derive the current iteration count can
be omitted from the dbg.value in favour of the offset.

Reviewed by: aprantl

Differential Revision: https://reviews.llvm.org/D109044
2021-09-19 21:41:44 +01:00
David Blaikie 372e2c24b6 llvm-dwarfdump: Pretty printing types including a space between const and parenthesized references/pointers to arrays 2021-09-19 13:32:53 -07:00
David Blaikie a51fb58c55 DWARFDie.cpp: Minor follow-up clang-format 2021-09-19 13:06:18 -07:00
David Blaikie f09ca5c646 DWARFDie: Improve type printing for function and array types - with qualifiers (cv/reference) and pointers to them 2021-09-19 12:59:31 -07:00
Simon Pilgrim f855ef2601 [X86][Atom] Fix FP uops + port usage
Both ports are required in most cases. Update the uops counts + port usage based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner / InstLatX64 reports as well.

Noticed while trying to improve fp costs for vectorization via the D103695 helper script.
2021-09-19 20:39:20 +01:00
Simon Pilgrim b7342e3137 [X86] Fold SHUFPS(shuffle(x),shuffle(y),mask) -> SHUFPS(x,y,mask')
We can combine unary shuffles into either of SHUFPS's inputs and adjust the shuffle mask accordingly.

Unlike general shuffle combining, we can be more aggressive and handle multiuse cases as we're not going to accidentally create additional shuffles.
2021-09-19 20:39:19 +01:00
Simon Pilgrim cf8fac7d07 [X86][Atom] Specific uops for all IMUL/IDIV instructions
Based off a mixture of llvm-exegesis captures (PR36895) and Intel AoM / Agner / InstLatX64 reports.
2021-09-19 16:58:52 +01:00
Roman Lebedev 5f2fe48d06
[X86][TLI] SimplifyDemandedVectorEltsForTargetNode(): don't break apart broadcasts from which not just the 0'th elt is demanded
Apparently this has no test coverage before D108382,
but D108382 itself shows a few regressions that this fixes.

It doesn't seem worthwhile breaking apart broadcasts,
assuming we want the broadcasted value to be preset in several elements,
not just the 0'th one.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108411
2021-09-19 17:38:32 +03:00
Roman Lebedev 07f1d8f0ca
[X86] lowerShuffleAsDecomposedShuffleMerge(): if both inputs are broadcastable/identities, canonicalize broadcasts as such
Split off from D108253.
Broadcast is simpler than any other shuffle we might produce
to do what we want to do here, so prefer it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108382
2021-09-19 17:35:37 +03:00
Roman Lebedev 0852313e47
[NFC] combineX86ShufflesRecursively(): actually address nits for previous patch 2021-09-19 17:29:18 +03:00
Roman Lebedev 1e72ca94e5
[X86] combineX86ShufflesRecursively(): call SimplifyMultipleUseDemandedVectorElts() on after finishing recursing
This was suggested in https://reviews.llvm.org/D108382#inline-1039018,
and it avoids regressions in that patch.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D109065
2021-09-19 17:24:58 +03:00
David Green 1da52ef294 [ARM] Add VGETLANEu patterns for v4f16 and v8f16
These were apparently missing, having no pattern that could convert a
VGETLANEu of a v4f16 to an i32. Added bf16 whilst here, following the
same code.
2021-09-19 14:25:21 +01:00
Simon Pilgrim e381d8b243 [X86][Atom] Fix (U)COMISS/SD uops, latency and throughput
Both ports are required, for reg and mem variants - we can also use the WriteFComX class directly and remove the unnecessary InstRW overrides. Matches what Intel AoM / Agner / InstLatX64 report as well.
2021-09-19 12:44:44 +01:00
Ben Shi dee5a8ca32 [RISCV] Optimize (add (shl x, c0), (shl y, c1)) with SH*ADD
Optimize (add (shl x, c0), (shl y, c1)) ->
         (SLLI (SH*ADD x, y), c1), if c0-c1 == 1/2/3.

Reviewed By: craig.topper, luismarques

Differential Revision: https://reviews.llvm.org/D108916
2021-09-19 16:35:12 +08:00
David Blaikie ae0873483d DWARFDie:DWARFTypePrinter: Add common utility function for checking where parentheses are required 2021-09-18 22:54:57 -07:00
David Blaikie d2373c04a7 DWARFDie.cpp: Reduce indentation with early continue 2021-09-18 22:22:25 -07:00
Roman Lebedev 6a2c2263fb
[X86] Improve i8 all-ones element insertion in pre-SSE4.1
Should avoid some regressions in D109065

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D109989
2021-09-18 22:24:06 +03:00
Alexandre Ganea 7b25fa8c7a [Support] Attempt to fix deadlock in ThreadGroup
This is an attempt to fix the situation described by https://reviews.llvm.org/D104207#2826290 and PR41508.
See sequence of operations leading to the bug in https://reviews.llvm.org/D104207#3004689

We ensure that the Latch is completely "free" before decrementing the number of TaskGroupInstances.

Differential revision: https://reviews.llvm.org/D109914
2021-09-18 13:49:10 -04:00
David Green cb5e3f7959 [ARM] Prevent large integer VQDMULH pattern crashes
Put a limit on the size of constant integers we test when looking for
VQDMULH, to prevent it from crashing from values more than 64bits.
2021-09-18 18:47:02 +01:00
Kazu Hirata 48719e3b18 [CodeGen] Use make_early_inc_range (NFC) 2021-09-18 09:29:24 -07:00
Joseph Huber c30d7730eb [OpenMP] Change debugging symbol to weak_odr linkage
The new device runtime uses an internal variable to set debugging. This
variable was originally privately linked because every module will have
a copy of it. This caused problems with merging the device bitcode
library because it would get renamed and there was not a way to refer to
an external, private symbol. This changes the symbol to weak_odr so it
can be defined multiply, but will not be renamed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109997
2021-09-17 21:25:24 -04:00
Joseph Huber 27905eeb89 [Attributor] Change AAExecutionDomain to check intrinsic edges
The AAExecutionDomain instance checks if a BB is executed by the main
thread only. Currently, this only checks the `__kmpc_kernel_init` call
for generic regions to indicate the path taken by the main thread. In
the new runtime, we want to be able to detect basic blocks even in SPMD
mode. For this we enable it to check thread-ID intrinsics being compared
to zero as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109849
2021-09-17 19:51:38 -04:00
Usman Nadeem 757384abff [AArch64][SVE][InstCombine] Fold redundant zip1/2(uzp1/2) operations
zip1(uzp1(A, B), uzp2(A, B)) --> A
    zip2(uzp1(A, B), uzp2(A, B)) --> B

Differential Revision: https://reviews.llvm.org/D109666

Change-Id: I4a6578db2fcef9ff71ad0e77b9fe08354e6dbfcd
2021-09-17 15:24:46 -07:00
Arthur Eubanks 0db9481208 [NFC] Remove FIXMEs about calling LLVMContext::yield()
Nobody has complained about this, and the documentation for
LLVMContext::yield() states that LLVM is allowed to never call it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D110008
2021-09-17 14:59:34 -07:00
Andrew Browne c533b88a6d [DFSan] Add force_zero_label abilist option to DFSan. This can be used as a work-around for overtainting.
Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D109847
2021-09-17 12:57:40 -07:00
Hongtao Yu c5fafc1e73 [CSSPGO] Tweakes to lower pseudo probe runtime overhead
A couple tweaks to

1. allow more thinlto importing by excluding probe intrinsics from IR size in module summary

2. Allow general default attributes (nofree nosync nounwind) for pseudo probe intrinsic. Without those attributes, pseudo probes will be basically treated as unknown calls which will in turn block their containing functions from annotated with those attributes.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D109976
2021-09-17 12:28:09 -07:00
Kazu Hirata e2febc2ed4 [llvm] Use drop_begin (NFC) 2021-09-17 09:16:40 -07:00
Roman Lebedev 358df06f4e
[X86] Improve `matchBinaryShuffle()`'s `BLEND` lowering with per-element all-zero/all-ones knowledge
We can use `OR` instead of `BLEND` if either the element we are not picking is zero (or masked away);
or the element we are picking overwhelms (e.g. it's all-ones) whatever the element we are not picking:
https://alive2.llvm.org/ce/z/RKejao

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D109726
2021-09-17 19:13:33 +03:00
Sanjay Patel 41ff7612b3 [InstCombine] allow splat vectors for narrowing masked fold
Mostly cosmetic diffs, but the use of m_APInt matches splat constants.
2021-09-17 11:24:16 -04:00
Simon Pilgrim 72e5786281 [DebugInfo] DWARF - Use const-ref iterator in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-17 14:04:54 +01:00
Simon Pilgrim 4af7643470 [CodeGen] LiveDebug - Use const-ref iterator in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-17 14:04:54 +01:00
Simon Pilgrim db23f27786 [X86] X86PreTileConfig - Use const-ref iterator in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-17 14:04:53 +01:00
Simon Pilgrim 77f6c0bcaa Fix Wdocumentation warnings. NFCI.
Fix parameter name typos and drop returns statements from void functions
2021-09-17 12:45:56 +01:00
Simon Pilgrim 5ebe95e256 [X86][Atom] Fix integer shuffles uops, latency and throughput
The MMX pack/unpck shuffles don't need an override - they have the same behaviour as other shuffles (Port0 only).
The SSE pslldq/psrldq shuffles don't need an override - they have the same behaviour as other shuffles (Port0 only).
The SSE pshufb shuffles use 4uops (+1 load).

Noticed the pslldq/psrldq issue while trying to improve reduction costs via the D103695 helper script, and fixed the others while reviewing. Confirmed with Intel AoM / Agner / InstLatX64.
2021-09-17 12:11:54 +01:00
Simon Pilgrim 9e70d4e5f2 [AsmPrinter] DebugLocEntry::dump() - Use const-ref iterator in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-17 12:11:54 +01:00
Simon Pilgrim e4b2f66d7f [TableGen] Record::checkRecordAssertions() - Use const-ref iterator in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-17 12:11:53 +01:00
Jonas Paulsson 1a5ab3e97c [SystemZ] Recognize .machine directive in parser.
The .machine directive can be used in assembly files to specify the ISA for
the instructions following it.

Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D109660
2021-09-17 12:03:54 +02:00
Petar Avramovic d477a7c2e7 GlobalISel/Utils: Refactor integer/float constant match functions
Rework getConstantstVRegValWithLookThrough in order to make it clear if we
are matching integer/float constant only or any constant(default).
Add helper functions that get DefVReg and APInt/APFloat from constant instr
getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT
getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT
getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT

Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal
and getIConstantVRegSExtVal. These now only match G_CONSTANT as described
in comment.

Relevant matchers now return both DefVReg and APInt/APFloat.

Replace existing uses of getConstantstVRegValWithLookThrough and
getConstantVRegVal with new helper functions. Any constant match is
only required in:
ConstantFoldBinOp: for constant argument that was bit-cast of float to int
getAArch64VectorSplat: AArch64::G_DUP operands can be any constant
amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant

In other places use integer only constant match.

Differential Revision: https://reviews.llvm.org/D104409
2021-09-17 11:22:13 +02:00
Chen Zheng 80584f0056 Revert "[PowerPC][ELF] make sure local variable space does not overlap with parameter save area"
This causes mix-compile issues on PowerPC Linux.

This reverts commit 324bd467a2.
2021-09-17 08:07:18 +00:00
Lang Hames 7e8babeb9d Revert "[examples] Fix SectionMemoryManager deconstruction error with MSVC."
This reverts commit 63838d8814, which broke tests
on some bots. See e.g. https://lab.llvm.org/buildbot#builders/109/builds/22561
2021-09-17 17:42:25 +10:00
Sjoerd Meijer 97cc678cc4 [FuncSpec] Specialising on addresses of const global values.
This introduces an option to allow specialising on the address of global
values. This option is off by default because it is likely not that profitable
to do so and needs more investigation. Before, we were specialising on addresses
and thus this changes the default behaviour.

Differential Revision: https://reviews.llvm.org/D109775
2021-09-17 08:07:05 +01:00
Lang Hames 63838d8814 [examples] Fix SectionMemoryManager deconstruction error with MSVC.
This commit fixes an order-of-initialization issue: If the default mmapper
object is destroyed while some global SectionMemoryManager is still using it
then calls to the mapper from ~SectionMemoryManager will fail. This issue was
causing failures when running the LLVM Kaleidoscope examples on windows.

Switching to a ManagedStatic solves the initialization order issue.

Patch by Justice Adams. Thanks Justice!

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D107087
2021-09-17 16:58:23 +10:00
Peter Collingbourne fc08cfb888 CodeView: static_cast result of getOffset() to size_t.
Silences a narrowing conversion warning on 32-bit platforms after D109923.
2021-09-16 23:39:04 -07:00
Christudasan Devadasan 167ff5280d [GlobalOpt] Do not shrink global to bool for an unfavorable AS
Do not call `TryToShrinkGlobalToBoolean` for address spaces
that don't allow initializers. It inserts an initializer value
while shrinking to bool. Used the target hook introduced with
D109337 to skip this call for the restricted address spaces.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D109823
2021-09-16 23:13:30 -04:00
Nuri Amari cc8229603b Extract LC_CODE_SIGNATURE related implementation out of LLD
Move the functionality in lld that handles writing of the LC_CODE_SIGNATURE load command and associated data section to a central reusable location.

This change is in preparation for another change that modifies llvm-objcopy to reproduce the LC_CODE_SIGNATURE load command and corresponding
data section to maintain the validity of signed macho object files passed through llvm-objcopy.

Reviewed By: #lld-macho, int3, oontvoo

Differential Revision: https://reviews.llvm.org/D109803
2021-09-16 17:43:39 -07:00
Lang Hames 78b083dbb7 [ORC] Add finalization & deallocation actions, SimpleExecutorMemoryManager class
Finalization and deallocation actions are a key part of the upcoming
JITLinkMemoryManager redesign: They generalize the existing finalization and
deallocate concepts (basically "copy-and-mprotect", and "munmap") to include
support for arbitrary registration and deregistration of parts of JIT linked
code. This allows us to register and deregister eh-frames, TLV sections,
language metadata, etc. using regular memory management calls with no additional
IPC/RPC overhead, which should both improve JIT performance and simplify
interactions between ORC and the ORC runtime.

The SimpleExecutorMemoryManager class provides executor-side support for memory
management operations, including finalization and deallocation actions.

This support is being added in advance of the rest of the memory manager
redesign as it will simplify the introduction of an EPC based
RuntimeDyld::MemoryManager (since eh-frame registration/deregistration will be
expressible as actions). The new RuntimeDyld::MemoryManager will in turn allow
us to remove older remote allocators that are blocking the rest of the memory
manager changes.
2021-09-17 09:55:45 +10:00
Nico Weber 646299d183 [Support] Convert BinaryStream class zoo to 64-bit offsets
Most PDB fields on disk are 32-bit but describe the file in terms of MSF
blocks, which are 4 kiB by default.

So PDB files can be a bit larger than 4 GiB, and much larger if you create them
with a block size > 4 kiB.

This is a first (necessary, but by far not not sufficient) step towards
supporting such PDB files.  Now we don't truncate in-memory file offsets (which
are in terms of bytes, not in terms of blocks).

No effective behavior change. lld-link will still error out if it were to
produce PDBs > 4 GiB.

Differential Revision: https://reviews.llvm.org/D109923
2021-09-16 19:14:52 -04:00
Daniil Suchkov 0e36288318 [LoopPredication] Report changes correctly when attempting loop exit predication
To make the IR easier to analyze, this pass makes some minor transformations.
After that, even if it doesn't decide to optimize anything, it can't report that
it changed nothing and preserved all the analyses.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D109855
2021-09-16 22:49:55 +00:00
Jon Roelofs 4b19e7dfae [LoopIdiomRecognize][Remarks] Track loop-strided store to/from blocks
Differential revision: https://reviews.llvm.org/D109929
2021-09-16 15:46:26 -07:00
Jacob Lambert 4c1023b4b7 [AMDGPU] NFC: Fixing small spelling errors in AMDGPU header files
Nonfunctional commit fixing several minor spelling errors in llvm/lib/Target/AMDGPU header files.
Testing workflow as a new contributor.

Differential Revision: https://reviews.llvm.org/D109733
2021-09-16 13:03:09 -07:00
Teresa Johnson 88cb3e2cb6 [MemProf] Don't instrument stack accesses unless requested
Skip stack accesses unless requested, as the memory profiler runtime
does not currently look at or report accesses for these addresses.

Differential Revision: https://reviews.llvm.org/D109868
2021-09-16 12:21:51 -07:00
Nikita Popov 0fc624f029 [IR] Return AAMDNodes from Instruction::getMetadata() (NFC)
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.

Differential Revision: https://reviews.llvm.org/D109852
2021-09-16 21:06:57 +02:00
Craig Topper 73e5b9ea90 [RISCV] Select (srl (sext_inreg X, i32), uimm5) to SRAIW if only lower 32 bits are used.
SimplifyDemandedBits can turn srl into sra if the bits being shifted
in aren't demanded. This patch can recover the original sra in some cases.

I've renamed the tablegen class for detecting W users since the "overflowing operator"
term I originally borrowed from Operator.h does not include srl.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D109162
2021-09-16 11:03:35 -07:00
Vang Thao 106959acc1 [AMDGPU] Inline non-kernel functions using extern lds
In https://reviews.llvm.org/D100481, forceful inline of all non-kernel
functions using lds was disabled since AMDGPULowerModuleLDS pass now handles
static lds. However that pass does not handle extern lds so non-kernel
functions using extern lds must sill be inline.

Reviewed By: hsmhsm, arsenm

Differential Revision: https://reviews.llvm.org/D109773
2021-09-16 10:58:51 -07:00
Arthur Eubanks d49cb5b303 [SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest
This makes some tests in vector-reductions-logical.ll more stable when
applying D108837.

The cost of branching is higher when vector ops are involved due to
potential SLP transformations.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D108935
2021-09-16 10:50:36 -07:00
Dávid Bolvanský a4a426c9e0 [InstCombine] Added llvm.powi optimizations
If power is even:
powi(-x, p) -> powi(x, p)
powi(fabs(x), p) -> powi(x, p)
powi(copysign(x, y), p) -> powi(x, p)
2021-09-16 19:42:21 +02:00
Kazu Hirata cfc7402419 [llvm] Use drop_begin (NFC) 2021-09-16 08:46:26 -07:00
Michael Liao ffa5c3a555 Fix warning on `llvm-else-after-return`. NFC. 2021-09-16 11:25:43 -04:00
Doug Gregor a773db7d76 Add a command-line flag to control the Swift extended async frame info.
Introduce a new command-line flag `-swift-async-fp={auto|always|never}`
that controls how code generation sets the Swift extended async frame
info bit. There are three possibilities:

* `auto`: which determines how to set the bit based on deployment target, either
statically or dynamically via `swift_async_extendedFramePointerFlags`.
* `always`: the default, always set the bit statically, regardless of deployment
target.
* `never`: never set the bit, regardless of deployment target.

Patch by Doug Gregor <dgregor@apple.com>

Reviewed By: doug.gregor

Differential Revision: https://reviews.llvm.org/D109392
2021-09-16 06:57:45 -07:00
Bjorn Pettersson d9fc3d879e [NewPM] Replace 'kasan-module' by 'asan-module<kernel>'
Change the asan-module pass into a MODULE_PASS_WITH_PARAMS in the
pass registry, and add a single parameter called 'kernel' that
can be set instead of having a special pass name 'kasan-module'
to trigger that special pass config.

Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.

This is a follow-up to D105006 and D105007.
2021-09-16 14:58:42 +02:00
Bjorn Pettersson 8f8616655c [NewPM] Use a separate struct for ModuleThreadSanitizerPass
Split ThreadSanitizerPass into ThreadSanitizerPass (as a function
pass) and ModuleThreadSanitizerPass (as a module pass).
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.

This is a follow-up to D105006 and D105007.
2021-09-16 14:58:42 +02:00
Bjorn Pettersson ab41eef9ac [NewPM] Use a separate struct for ModuleMemorySanitizerPass
Split MemorySanitizerPass into MemorySanitizerPass (as a function
pass) and ModuleMemorySanitizerPass (as a module pass).
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.

This is a follow-up to D105006 and D105007.
2021-09-16 14:58:42 +02:00